Address review feedback on #2896#2921
Conversation
- Fix stale docs that still reference cagent/* meta keys; rename to
docker-agent/* to match what the runtime emits. Also rename 'legacy'
to 'client-driven' in docs for the mode that does not set
--mcp-oauth-redirect-uri.
- Drop the tautological state check on the deeplink-callback path in
handleUnmanagedOAuthFlow. Synthesizing {code, state=expectedState}
for consumeUnmanagedElicitationReply meant the ConstantTimeCompare
guard was always true and could mislead future readers. The
pending-OAuth registry lookup IS the state validation for that
path; state validation now lives only in the elicitation-reply
branch where state actually arrives from the client. Extract a
small exchangeAuthorizationCode helper that both paths share for
the actual /token exchange.
- Bound the OAuth flow's wait with a 10 minute deadline. Without it,
a silent MCP-client disconnect (TCP RST, idle timeout, process
kill) leaves requestElicitation blocked forever and the per-session
streaming lock held indefinitely, returning 409 / ErrSessionBusy on
every subsequent message until a process restart. The deadline
applies via context.WithTimeout on elicCtx; an explicit case in
the outer select also fires as a defensive fallback if
requestElicitation ever ignores its ctx. New test
TestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrives
exercises the path with a 200ms override.
- Document the threat model on /api/mcp-oauth/callback in the
handler doc-comment: the registry is the per-request authorization,
but if a state value leaks (debug logs, compromised host) an
attacker could submit a code; --auth-token blocks this regardless
of state leakage and SHOULD be set when the agent listens on a
network-reachable interface.
Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
Reviewed the OAuth refactoring changes in this PR. The diff is clean — no bugs found in the added code.
What was checked:
-
exchangeAuthorizationCodeextraction: Correctly splits the token-exchange step from state validation. The elicitation path retains itssubtle.ConstantTimeComparestate check; the deeplink-callback path legitimately skips it (state was already validated by the pending-OAuth registry lookup). No tautology remains, and no validation gap is introduced. -
Timeout / deadline logic:
context.WithTimeout(ctx, unmanagedOAuthWaitTimeout)correctly bounds the wait. Thedefer elicCancel()ensures the goroutine's context is cancelled on all return paths. The elicitation goroutine writes to a buffered channel (capacity 1) so it cannot block afterhandleUnmanagedOAuthFlowreturns — no goroutine leak. -
Defensive
<-elicCtx.Done()select case: Provides a correct safety net for implementations ofrequestElicitationthat do not honour context cancellation. In practice theelicChcase fires first with a wrappedDeadlineExceeded; the defensive case only activates if the implementation ignores its ctx. -
niltoken after select: BothcallbackChandelicChcases only leavetoken == nilwhenconsumeErr != nil, so the early return onconsumeErr != nilis always taken beforetokenwould be dereferenced. -
resourceMetadataremoval: The param was previously unused (_ = resourceMetadata). TheresourceIndicatorstring (the actual token-exchange parameter derived from it) is still threaded through correctly toexchangeAuthorizationCode. -
Test:
TestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrivescorrectly exercises the deadline path. The globalunmanagedOAuthWaitTimeoutoverride is safe because the test does not callt.Parallel(). -
Docs:
cagent/*→docker-agent/*renames are consistent throughout the updated table and prose.
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The refactor is structurally sound. The exchangeAuthorizationCode extraction is correct — the elicitation path retains its ConstantTimeCompare state validation, while the callback path legitimately skips it because the pending-OAuth registry lookup (unknown states return 404) is the equivalent check. The buffered elicCh prevents goroutine leaks in all select arms. The elicCtx.Done() timeout guard correctly races with elicCh and both outcomes produce well-handled errors. No high or medium severity issues found — two minor defensive notes left inline.
| ) | ||
| if err != nil { | ||
| return err | ||
| if consumeErr != nil { |
There was a problem hiding this comment.
[LOW] Defensive nil-token guard missing after exchangeAuthorizationCode
After both select arms (callbackCh and elicCh) complete, the post-select block only checks consumeErr != nil before dereferencing token (e.g. token.ClientID = clientID a few lines below). If exchangeAuthorizationCode were ever to return (nil, nil) — a theoretical edge case if exchangeCodeForToken's contract ever changes — execution would reach token.ClientID = ... with a nil pointer and panic.
A one-line defensive check after the error guard would eliminate this latent risk:
if consumeErr != nil {
return consumeErr
}
if token == nil {
return errors.New("internal: exchangeAuthorizationCode returned nil token without error")
}This is low risk given exchangeCodeForToken's current implementation always returns an error on nil token, but the guard is cheap and makes the invariant explicit.
| // would return 409 / ErrSessionBusy until a process restart. | ||
| func TestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrives(t *testing.T) { | ||
| original := unmanagedOAuthWaitTimeout | ||
| unmanagedOAuthWaitTimeout = 200 * time.Millisecond |
There was a problem hiding this comment.
[LOW] Package-level test variable mutation is fragile if tests are ever parallelized
unmanagedOAuthWaitTimeout is a package-level var overridden directly in the test body:
unmanagedOAuthWaitTimeout = 200 * time.Millisecond
t.Cleanup(func() { unmanagedOAuthWaitTimeout = original })This is safe today because the test does not call t.Parallel(). However, if a future contributor adds t.Parallel() to this test or any sibling test that reads unmanagedOAuthWaitTimeout, a data race is introduced and the cleanup's restore ordering becomes non-deterministic.
Consider a concurrency-safe alternative — for example, an oauthWaitTimeout field on oauthTransport (defaulting to the package constant) so each test instance sets its own value without touching shared state.
…ect-uri flag (refs docker#2896, docker#2921)
Why
#2896 introduced the in-process unmanaged OAuth drive-flow and the out-of-band callback route. The PR landed before review feedback (from the bot and from @dgageot) could be folded in. This PR addresses each point.
Changes
1. Docs —
cagent/*→docker-agent/*(per @dgageot)docs/features/remote-mcp/index.mdstill referenced the legacy meta-key names (cagent/type,cagent/server_url,cagent/authorize_url,cagent/state) while the runtime emitsdocker-agent/*. Embedders following the docs would have looked up the wrong keys. Renamed in-place. Also renamed "legacy" → "client-driven" in the same file for the mode that does not set--mcp-oauth-redirect-uri, matching the PR-body terminology.2. Drop the tautological state check on the deeplink-callback path (bot, MEDIUM)
On the direct-callback path,
handleUnmanagedOAuthFlowwas synthesising the payloadconsumeUnmanagedElicitationReplyexpects:Then
consumeUnmanagedElicitationReplydid:Always true by construction — the guard was dead code on this path and gave a false sense of security: any reviewer would assume the state arriving over HTTP was re-validated here, when it was not.
Fix: extract a
exchangeAuthorizationCodehelper that performs only the/tokenexchange. The deeplink path calls it directly withcb.Code(state was already validated by the pending-OAuth registry lookup — only states the runtime generated and is currently awaiting are present, unknown ones return 404). The elicitation-reply path keeps its state check, because there state actually arrives from the client and is independent ofexpectedState.3. Bound the OAuth flow's wait with a deadline (bot, MEDIUM)
handleUnmanagedOAuthFlowselects across:ctx.Done()ctxis theWithoutCancel-detached ctx fromclientConnector.ConnectuserCancelChcallbackCh/api/mcp-oauth/callbackelicChrequestElicitationreturnsIf the MCP client disconnects silently (TCP RST, idle timeout, process kill) and
requestElicitationdoes not return promptly, the OAuth flow blocks forever. The per-session streaming lock at theSessionManagerlevel is held until the next process restart; every subsequent message from that session returns409 Conflict / ErrSessionBusy.Fix: scope the elicitation goroutine to an
elicCtxwith a 10-minute deadline (unmanagedOAuthWaitTimeout). When the deadline fires,requestElicitationreturns a wrappedcontext.DeadlineExceeded, theelicChcase picks it up, the OAuth flow returns, and the lock releases. An explicit<-elicCtx.Done()case in the outer select serves as a defensive fallback in caserequestElicitationever ignores its ctx. New testTestUnmanagedOAuthFlow_DriveFlow_TimesOutWhenNoReplyArrivesexercises this with a 200ms override.10 minutes is generous enough for a user to click through an IdP consent screen and any IdP-side prompts (MFA, recovery codes, …); small enough that a dead session can't strand the lock. Open to making it configurable via
RuntimeConfigif needed.4. Document the threat model on
POST /api/mcp-oauth/callback(bot, MEDIUM)The route inherits the same auth posture as the rest of the API: when
--auth-tokenis unset, it's open. The registry's lookup-by-state IS meaningful authorization for the legitimate request (only states the runtime generated and is awaiting are accepted), but state values do appear in transit:Metaon the session SSE stream (visible to the connected client)--debugis on)An attacker who observes a live state via any of these channels could POST here with an attacker-controlled
code; the resulting token would be bound to the attacker's account. Setting--auth-tokenblocks this regardless of state leakage.The handler's doc-comment now spells out the threat model and recommends
--auth-tokenfor deployments that listen on a network-reachable interface. No code change was made: this route is no more exposed than any other API endpoint under the same group, and operators already make the--auth-tokenchoice once for the whole API.