[api][routing] Pluggable in-chat LLM routing (ChatModelRouter + RoutingStrategy) by purushah · Pull Request #852 · apache/flink-agents

purushah · 2026-06-15T20:37:13Z

What is the purpose of the change

Adds a drop-in chat model that routes each request to the best underlying model, then delegates to it. The router is a CHAT_MODEL resource, so an agent points at it by name with no change to the runtime, events, or agent definition.

This is the in-chat selector (which LLM serves a single chat() call). A DataStream-level content-based agent router (branching records across agent operators) is a separate, follow-up concern.

Brief change log

RoutingStrategy — pluggable selection SPI (request -> candidate name). Selection is a pure concern; returning null means "abstain / no opinion".
ChatModelRouter — orchestrates select → (optional cache) → validate → delegate. A strategy that abstains (null) or names a non-candidate is a routing miss and degrades to the configured default candidate (validated at construction; defaults to the first candidate) rather than failing the request.
FallbackPolicy — optional: try remaining candidates on error.
CachingStrategy — optional bounded-LRU memoization of the decision per conversation, so an expensive strategy (e.g. an LLM judge) runs once per conversation, not once per tool-call round. Abstentions (null) are never cached.
Built-in strategies:
- RuleBasedRoutingStrategy — deterministic keyword/regex rules + default.
- LlmRoutingStrategy — a small "judge" model picks the candidate from each candidate's name/description (RouteLLM-style). Distinguishes a transient judge failure (abstain → retried next round, uncached) from an unparseable reply (deterministic default). Parses by whole-token match (no substring mis-routing, e.g. gpt-4o-mini won't match a gpt-4 candidate).
Bring-your-own strategies are first-class: implement RoutingStrategy and reference it by fully-qualified class name; loaded via the thread context classloader (cluster-safe). ML/learned routing is supported the same way.
Adds LlmRoutingAgentExample and unit tests.

Verifying this change

This change adds tests and can be verified as follows:

Unit tests under api/.../chat/model/routing/ covering rule selection, judge parsing (whole-token match), stickiness across tool-call rounds, fallback, caching (incl. abstain-not-cached), routing-miss degrade-to-default, and bring-your-own loading. All pass; spotless:check clean (JDK 17).

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API: yes — adds the org.apache.flink.agents.api.chat.model.routing package (additive; no existing API changed).
The serializers: no
The runtime per-record code paths: no (router is a CHAT_MODEL resource resolved by name)
Anything that affects deployment or recovery: no — preserves exactly-once / keyed-state / checkpoint semantics (no new operator, no nested invocation).

Security note

An LLM/ML routing decision is a hint, not an authority — the user's message is sent to the judge model, so a routing decision is susceptible to prompt injection. Cost/privilege/safety controls must not be gated solely on it. This is documented on LlmRoutingStrategy.

Documentation

New public package is documented via javadoc on each type. Built-in strategies, the abstain/routing-miss contract, and the bring-your-own extension point are described on the SPI.

Documentation

doc-needed
doc-not-needed
doc-included

…ngStrategy) Add a drop-in chat model that selects which underlying model serves each request, then delegates to it. The router is a CHAT_MODEL resource, so an agent points at it by name with no runtime, event, or agent-definition change. Selection is a pluggable SPI (`RoutingStrategy`), decomposed into orthogonal concerns: - RoutingStrategy — pure selection (request -> candidate name). Returning null means "abstain / no opinion". - FallbackPolicy — optional: try remaining candidates on error. - CachingStrategy — optional bounded-LRU memoization of the decision per conversation, so an expensive strategy (e.g. an LLM judge) runs once per conversation rather than once per tool-call round. Built-in strategies: - RuleBasedRoutingStrategy — deterministic keyword/regex rules + default. - LlmRoutingStrategy — a small "judge" model picks the candidate from each candidate's name/description (RouteLLM-style). Bring-your-own strategies are first-class: implement RoutingStrategy and reference it by fully-qualified class name; loaded via the thread context classloader (cluster-safe). ML/learned routing is supported the same way. Routing-miss semantics: a strategy that abstains (null) or names a non-candidate degrades to the configured `default` candidate (validated at construction; defaults to the first candidate) rather than failing the request. The LLM judge distinguishes a transient failure (abstain -> not cached, retried next round) from an unparseable reply (deterministic default). Security: an LLM/ML routing decision is a hint, not an authority — the user's message is sent to the judge, so cost/privilege/safety must not be gated solely on it (prompt-injection risk). This is documented on the strategy. Includes an example (LlmRoutingAgentExample) and unit tests covering rule selection, judge parsing (whole-token match, no substring mis-routing), stickiness, fallback, caching (incl. abstain-not-cached), and bring-your-own. Also mirror the RULE_BASED/LLM ResourceName constants on the Python side (ResourceName.RoutingStrategy.Java) and register RoutingStrategy in the cross-language ResourceName parity check.

weiqingy

Thanks for taking this on, @purushah. A few questions inline.

weiqingy · 2026-06-16T23:25:12Z

+     * which resolves its own). Override to skip the base class's connection resolution.
+     */
+    @Override
+    public void open() {


Worth calling out the design here: the three-way split into selection / resilience / caching (RoutingStrategy / FallbackPolicy / CachingStrategy) lands the right seams, and the drop-in CHAT_MODEL story holds up against the real ChatModelAction + ResourceCache path. One non-obvious correctness point stood out at this open(): the no-op is safe precisely because a routed candidate is lazily open()-ed by ResourceCache.getResource() when it's resolved, so its connection is non-null before chat() runs. Documenting that invariant in the Javadoc rather than leaving it implicit is exactly right — it's the kind of thing that bites a future maintainer if it only lives in someone's head.

weiqingy · 2026-06-16T23:25:12Z

+            // decision. Recompute each time instead.
+            return delegate.route(context);
+        }
+        String cached = cache.get(key);


The check-then-act here is non-atomic: cache.get(key) … delegate.route(context) … cache.put(key, chosen). Two async-pool threads handling the same conversation key on its first touch can both miss the cache and both invoke delegate.route() — i.e. two judge-model calls for the same conversation. The map itself is synchronizedMap, so there's no corruption, and the redundant compute is benign (last-writer-wins on the same key), so I wouldn't add locking for it — the cost isn't worth it. My only question is about the wording: the class Javadoc here and the LlmRoutingStrategy Javadoc ("the judge runs once per conversation") read as a hard guarantee, where this is really best-effort memoization. Would it be worth softening to "typically once per conversation" (or a one-line note that a concurrent first-touch can double-compute), so the contract matches the behavior? Or is the stronger phrasing intentional?

weiqingy · 2026-06-16T23:25:12Z

+
+    /** A {@link ResourceContext} backed by a fixed name → resource map. */
+    static ResourceContext context(Map<String, Resource> byName) {
+        return ResourceContext.fromGetResource(


The fakes resolve through ResourceContext.fromGetResource(...), which returns the registered resource directly — unlike the runtime ResourceCache.getResource(), which calls open() on the resolved resource before handing it back (ResourceCache.java:125). The upshot is that the load-bearing production invariant — a routed candidate gets open()-ed, so its connection is non-null before chat() — isn't exercised by any unit test; it rests on the ResourceCache path plus the e2e example. That's a structural property of resolving fakes without a real cache, not a defect in this PR. Does the e2e LlmRoutingAgentExample exercise the real-candidate (open()-then-chat()) path in CI, so the invariant is covered somewhere? If it doesn't, would a single router test backed by a real ResourceCache and a candidate that actually needs open() be worth adding, to pin the one fact the whole drop-in design depends on?

weiqingy · 2026-06-16T23:25:12Z

            # Milvus
            MILVUS_VECTOR_STORE = "org.apache.flink.agents.integrations.vectorstores.milvus.MilvusVectorStore"

+    class RoutingStrategy:


This is the only Python surface the feature gets — the mirrored RoutingStrategy.Java.{RULE_BASED, LLM} FQCN constants, with no Python ChatModelRouter, no Python RoutingStrategy SPI, and no Python tests. CLAUDE.md asks for Java/Python parity, though that's scoped to "when changing shared logic," and this PR adds a new Java resource resolved by name rather than touching shared logic. The mirrored constants read as a deliberate bridge — a Python agent points a CHAT_MODEL at the Java router by name. Is that the intended Python story for v1 ("reference the Java router by name," with a native Python SPI as a possible follow-up), or is Python routing out of scope for now? Asking mainly so the intent is on the record — the constants imply Python users are expected to reach this, so it'd help to state how far that's meant to go.

weiqingy · 2026-06-16T23:25:12Z

+    private final Map<String, Object> metadata;
+
+    public RoutingCandidate(String name, String description, Map<String, Object> metadata) {
+        this.name = Objects.requireNonNull(name, "candidate name must not be null");


The ctor null-checks name but lets an empty string through, and in LlmRoutingStrategy.parseChoice the whole-token regex for an empty quoted name over-matches almost any boundary → mis-route. It takes a pathological config (an empty candidate name) to hit, so low priority — a one-line name.isEmpty() guard alongside the requireNonNull closes it cleanly.

@weiqingy

…at test Review follow-ups from @weiqingy on the routing PR: - ChatModelRouter.open(): document the load-bearing invariant the no-op relies on — a routed candidate is lazily open()-ed by ResourceCache.getResource() on first resolution, so its connection is non-null before chat() runs. - CachingStrategy / LlmRoutingStrategy: soften "runs once per conversation" to "typically once" and document that memoization is best-effort (a concurrent first-touch on the same key can double-compute; synchronized map, last-writer- wins, benign — so no locking). - RoutingCandidate: reject an empty name (not just null) — an empty name has no resolvable resource and would make LlmRoutingStrategy.parseChoice's whole-token match over-match arbitrary boundaries (mis-route). - Tests: add ChatModelRouterTest cases pinning the open-before-chat invariant (candidate resolved through an opening ResourceContext, mirroring ResourceCache; plus the negative case proving it is load-bearing), and RoutingCandidateTest for the null/empty name guards. 39 routing tests pass; spotless:check clean under JDK 17.

github-actions Bot added doc-label-missing The Bot applies this label either because none or multiple labels were provided. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue. labels Jun 15, 2026

purushah force-pushed the routing-pr branch from b69af3b to e4e62ed Compare June 15, 2026 21:53

github-actions Bot added doc-needed Your PR changes impact docs. and removed doc-label-missing The Bot applies this label either because none or multiple labels were provided. labels Jun 15, 2026

weiqingy reviewed Jun 16, 2026

View reviewed changes

wenjin272 added fixVersion/0.4.0 and removed fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. labels Jun 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[api][routing] Pluggable in-chat LLM routing (ChatModelRouter + RoutingStrategy)#852

[api][routing] Pluggable in-chat LLM routing (ChatModelRouter + RoutingStrategy)#852
purushah wants to merge 2 commits into
apache:mainfrom
purushah:routing-pr

purushah commented Jun 15, 2026 •

edited

Loading

Uh oh!

weiqingy left a comment

Uh oh!

weiqingy Jun 16, 2026

Uh oh!

weiqingy Jun 16, 2026

Uh oh!

weiqingy Jun 16, 2026

Uh oh!

weiqingy Jun 16, 2026

Uh oh!

weiqingy Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

purushah commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What is the purpose of the change

Brief change log

Verifying this change

Does this pull request potentially affect one of the following parts:

Security note

Documentation

Documentation

Uh oh!

weiqingy left a comment

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

purushah commented Jun 15, 2026 •

edited

Loading