Skip to content

Propagate user-initiated cancellation across the WithoutCancel boundary#2916

Closed
trungutt wants to merge 4 commits into
docker:mainfrom
trungutt:propagate-ctx-cancellation-across-without-cancel
Closed

Propagate user-initiated cancellation across the WithoutCancel boundary#2916
trungutt wants to merge 4 commits into
docker:mainfrom
trungutt:propagate-ctx-cancellation-across-without-cancel

Conversation

@trungutt
Copy link
Copy Markdown
Contributor

Depends on

This PR is stacked on top of #2896 (extend-unmanaged-oauth-flow). The two predecessor commits (c97124d0f, a93ae4d3c) shown in the diff belong to #2896; review only the last commit in this PR. Will be rebased onto a clean main after #2896 (or its replacement) merges.

Problem

clientConnector.Connect detaches the caller's ctx with context.WithoutCancel before sending the MCP initialize handshake. The rationale is correct: an MCP toolset's session must outlive any single request — otherwise the underlying SSE/stdio connection tears down when the request that triggered toolset.Start returns, and a successful OAuth flow gets killed the moment its triggering request completes.

But that detachment also severs user-initiated cancellation, which the OAuth flow inside Connect DOES need to observe. Concretely:

handleUnmanagedOAuthFlow blocks on a select waiting for an elicitation reply or an out-of-band callback. If the embedder aborts the in-progress agent run while the user is in the OAuth dialog, the streamCtx the SessionManager derived from the request ctx gets cancelled — but every ctx in the chain BELOW Connect's WithoutCancel boundary stays alive. The OAuth select then blocks forever, the per-session streaming lock is never released, and the next user message returns 409 Conflict from RunSession.ErrSessionBusy.

Fix

Preserve the connection-longevity invariant and restore user-initiated cancellation by stashing the caller's original ctx as a value on the detached ctx. The detached ctx remains the primary driver of the connection's lifetime; operations that want to observe user-initiated cancellation can opt in by reading the parent back out and selecting on its Done channel alongside the local ctx.

Carrying ctx-as-value is unusual but appropriate here: the parent is metadata about how the detached ctx was constructed, not data the request is operating on. The helpers (withCancellableParent / cancellableParentFromContext) live next to the connector that creates the relationship so the rationale stays close to the code.

handleUnmanagedOAuthFlow now extends its select with a userCancelCh case keyed on the parent's Done channel. When the parent is nil (e.g. unit tests, CLI invocations that don't go through clientConnector.Connect), userCancelCh stays nil and the select silently never picks it — preserving back-compat.

Plumbing

  • pkg/tools/mcp/cancellable_parent.go — new helper file with withCancellableParent + cancellableParentFromContext, extensively commented because ctx-as-value is unusual.
  • pkg/tools/mcp/mcp.goclientConnector.Connect captures the caller's ctx before WithoutCancel and attaches it via withCancellableParent.
  • pkg/tools/mcp/oauth.gohandleUnmanagedOAuthFlow extracts the parent and adds a fourth select case watching its Done channel; returns parent.Err() so the agent loop sees a cancellable error and the SessionManager goroutine's deferred Unlock fires.

Tests

  • cancellable_parent_test.go — round-trip, absent-returns-nil, nil-parent-no-op, and the key invariant that the local ctx stays independent of the parent's cancellation (only opt-in callers see it).
  • oauth_test.goTestUnmanagedOAuthFlow_DriveFlow_AbortsOnParentCtxCancellation asserts the new branch fires under the exact boundary clientConnector.Connect sets up, returns the parent's ctx error, and skips the token exchange.

trungutt added 3 commits May 26, 2026 17:51
Adds a new sub-behavior to the unmanaged OAuth flow that lets the
runtime own PKCE / state / DCR / token exchange while the connected
client acts as a thin courier (open browser, receive deeplink,
forward {code, state}).

Activated by a new server-level flag --mcp-oauth-redirect-uri (also
exposed on the runtime as WithUnmanagedOAuthRedirectURI and on the
RuntimeConfig as MCPOAuthRedirectURI). When set, the runtime:

  - generates state + PKCE in-process
  - runs DCR if needed (or uses per-toolset explicit credentials)
  - builds the full authorize URL with the configured redirect_uri
  - emits an elicitation whose Meta carries cagent/authorize_url +
    cagent/state alongside the existing auth_server_metadata
  - accepts {code, state} as a ResumeElicitation payload, verifies
    state in constant time, and exchanges the code at the token
    endpoint using the same redirect_uri (RFC 6749 §4.1.3 binding)
  - stores the resulting token in the keychain with client_id /
    client_secret stamped on for later silent refresh

When the flag is empty, the existing client-driven contract is
preserved verbatim: the elicitation carries only metadata and the
client is expected to reply with {access_token, refresh_token, …}.
The {access_token, …} reply shape is also accepted on the new path
so a client that prefers to do the exchange itself is still free to.

A per-toolset RemoteOAuthConfig.CallbackRedirectURL overrides the
runtime-wide flag for that toolset.

Docker-agent never learns anything about the URL it advertises: it
is opaque, and what happens at the URL (HTML bouncer, OS deeplink,
universal-link claim, …) is the host's concern.

Plumbing:

  - cmd/root/flags.go: add --mcp-oauth-redirect-uri
  - pkg/config/runtime.go: MCPOAuthRedirectURI field
  - pkg/runtime/runtime.go: unmanagedOAuthRedirectURI field +
    WithUnmanagedOAuthRedirectURI Opt
  - pkg/runtime/loop.go: pass through to ConfigureHandlers
  - pkg/server/session_manager.go: forward from runtime config
  - pkg/tools/capabilities.go: extend OAuthCapable interface +
    ConfigureHandlers
  - pkg/tools/mcp/mcp.go, remote.go, session_client.go,
    builtin/mcpcatalog: implement the new SetUnmanagedOAuthRedirectURI
  - pkg/tools/mcp/oauth.go: rework handleUnmanagedOAuthFlow,
    factor out resolveClientCredentials helper and
    consumeUnmanagedElicitationReply
  - docs/features/remote-mcp/index.md: new 'Unmanaged OAuth flow'
    section documenting the meta keys and behavior

Tests: six new oauth_test.go tests covering the drive-flow happy
path (incl. asserting RFC 6749 §4.1.3 redirect_uri parity at /token),
state-mismatch rejection, legacy access-token reply on the new path,
legacy mode shape (no authorize_url emitted), legacy mode rejection
of {code, state}, and the per-toolset > runtime-wide precedence.

Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
Builds on the unmanaged-OAuth drive-flow added in this branch by
giving embedders a second, session-less way to deliver the deeplink
payload to the runtime.

Adds a new HTTP route:

  POST /api/mcp-oauth/callback?state=<opaque>
                              [&code=<opaque>]
                              [&error=<rfc6749 code>]
                              [&error_description=<text>]

The parameters mirror what an authorization server sends to a
redirect URI per RFC 6749 §4.1.2. The body is empty; everything
travels in the query string.

A process-wide pending-OAuth registry maps state -> waiting
unmanaged flow. handleUnmanagedOAuthFlow registers its waiter on
entry (drive-flow only) and unregisters on exit. The new route
looks up the waiter by state and hands it the payload through a
single-shot buffered channel. The waiting goroutine then performs
the token exchange and persists the token exactly as before.

The lookup-by-state IS the authentication: only states the runtime
has just generated are present in the registry. Unknown or replayed
states return 404. State values continue to come from GenerateState
(>=128 bits, opaque, base64).

Inside the flow, the elicitation result and the direct callback
race. Whichever arrives first wins; the other is cancelled cleanly
with no further side effects on the connected client.

Strictly additive to the existing session-keyed
POST /api/sessions/:id/elicitation path: both reply shapes
({access_token, ...} legacy and {code, state} drive-flow) still
work, and an embedder may use the elicitation path, the new
callback path, or both. The --mcp-oauth-redirect-uri flag, the
cagent/authorize_url and cagent/state elicitation meta keys, and
the runtime's redirect-URI parity check at /token are unchanged.

Why it matters for embedders:

  - No session context required. The embedder forwards an opaque
    {state, code} (or {state, error}) and is done. It never needs
    to know which session is waiting, or even that the runtime has
    sessions at all.
  - No state tracking required. All state -> waiter accounting
    lives where it belongs: inside the runtime.
  - No UI-lifecycle dependency. As long as the runtime is alive
    when the deeplink fires, the token is exchanged and stored. An
    embedder UI that died between authorize and callback is no
    longer fatal.
  - Cross-process deeplink handlers can be thin couriers: a
    system-wide URL-scheme handler or an OS launcher does not need
    a connection to the runtime's elicitation stream at all.

Plumbing:

  - pkg/tools/mcp/pending_oauth.go: new file, process-wide
    state -> chan registry, exported DeliverPendingOAuthCallback
    + ErrPendingOAuthNoWaiter for the server handler
  - pkg/tools/mcp/oauth.go: handleUnmanagedOAuthFlow registers a
    waiter, runs the elicitation in a goroutine scoped to a
    cancellable child context, and selects between elicitation
    result, direct callback, and the outer ctx
  - pkg/server/server.go: new mcpOAuthCallback handler at
    POST /api/mcp-oauth/callback

Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
clientConnector.Connect detaches the caller's ctx with
context.WithoutCancel before sending the MCP initialize handshake.
The rationale is correct: an MCP toolset's session must outlive any
single request -- otherwise the underlying SSE/stdio connection
tears down when the request that triggered toolset.Start returns,
and a successful OAuth flow gets killed the moment its triggering
request completes.

But that detachment also severs user-initiated cancellation, which
the OAuth flow inside Connect DOES need to observe. Concretely:
handleUnmanagedOAuthFlow blocks on a select waiting for an
elicitation reply or an out-of-band callback. If the embedder
aborts the in-progress agent run while the user is in the OAuth
dialog, the streamCtx the SessionManager derived from the request
ctx gets cancelled -- but every ctx in the chain BELOW Connect's
WithoutCancel boundary stays alive. The OAuth select then blocks
forever, the per-session streaming lock is never released, and the
next user message returns 409 Conflict from RunSession.ErrSessionBusy.

This commit preserves the connection-longevity invariant and
restores user-initiated cancellation by stashing the caller's
original ctx as a value on the detached ctx. The detached ctx
remains the primary driver of the connection's lifetime, and
operations that want to observe user-initiated cancellation can
opt in by reading the parent back out and selecting on its Done
channel alongside the local ctx.

Carrying ctx-as-value is unusual but appropriate here: the parent
is metadata about how the detached ctx was constructed, not data
the request is operating on. The helpers (withCancellableParent /
cancellableParentFromContext) live next to the conector that
creates the relationship so the rationale stays close to the code.

handleUnmanagedOAuthFlow now extends its select with a userCancelCh
case keyed on the parent's Done channel. When the parent is nil
(e.g. unit tests, CLI invocations that don't go through
clientConnector.Connect), userCancelCh stays nil and the select
silently never picks it -- preserving back-compat.

Plumbing:

  - pkg/tools/mcp/cancellable_parent.go: new helper file with
    withCancellableParent + cancellableParentFromContext,
    extensively commented because ctx-as-value is unusual.
  - pkg/tools/mcp/mcp.go: clientConnector.Connect captures the
    caller's ctx before WithoutCancel and attaches it via
    withCancellableParent.
  - pkg/tools/mcp/oauth.go: handleUnmanagedOAuthFlow extracts the
    parent and adds a fourth select case watching its Done channel;
    returns parent.Err() so the agent loop sees a cancellable
    error and the SessionManager goroutine's deferred Unlock fires.

Tests:

  - cancellable_parent_test.go: round-trip, absent-returns-nil,
    nil-parent-no-op, and the key invariant that the local ctx
    stays independent of the parent's cancellation (only opt-in
    callers see it).
  - oauth_test.go: TestUnmanagedOAuthFlow_DriveFlow_AbortsOnParent-
    CtxCancellation asserts the new branch fires under the exact
    boundary clientConnector.Connect sets up, returns the parent's
    ctx error, and skips the token exchange.

Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
@aheritier aheritier added area/mcp MCP protocol, MCP tool servers, integration area/security Authentication, authorization, secrets, vulnerabilities kind/fix PR fixes a bug (maps to fix: commit prefix) labels May 28, 2026
CI flags pre-existing lint issues in code introduced by c97124d
and a93ae4d (the drive-flow and callback-route commits this PR
is stacked on). Fix them here so this PR's CI is green; these
fixes are also present on extend-unmanaged-oauth-flow (docker#2896).
When that PR merges, this commit will be dropped on rebase.

- bodyclose: close response bodies in RoundTrip test goroutines
- gci/gofmt: reflow var block in pending_oauth.go
- testifylint error-is-as: use assert.ErrorIs instead of
  assert.True(errors.Is(...))
- unconvert: drop redundant tools.ElicitationAction(...) conversion
- unparam: drop unused []byte return from httpDoStatus helper

Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
@trungutt trungutt closed this May 28, 2026
@docker-agent
Copy link
Copy Markdown

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/mcp MCP protocol, MCP tool servers, integration area/security Authentication, authorization, secrets, vulnerabilities kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants