fix(oauth): narrow context.lock scope in async_auth_flow#2660
Draft
peisuke wants to merge 3 commits into
Draft
Conversation
…poll requests
Previously the entire `OAuthClientProvider.async_auth_flow` body ran under
`self.context.lock`, including the `yield request` that hands the request
off to httpx. For requests that complete quickly this is fine, but a GET
SSE long-poll holds the lock for the full SSE lifetime — which means any
concurrent POST (e.g. `tools/call`) is blocked waiting for the lock,
producing a ~16s perceived stall on lazy MCP connections that use OAuth.
This commit splits the single coarse lock into purpose-specific scopes:
Phase 1 (context.lock): initialize state, capture protocol_version, and
decide whether a refresh is needed. Short-held; no HTTP I/O.
Phase 2 (refresh_lock, new): single-flight token refresh. The refresh
request `yield` happens outside any lock. A double-check inside
`context.lock` ensures concurrent waiters do not redundantly refresh
after another coroutine completed one.
Phase 3 (no lock): add the auth header and yield the actual request.
GET SSE long-polls and other long-running requests no longer block
unrelated traffic.
Phase 4 (context.lock): 401 / 403 full OAuth re-auth path. Conservatively
kept under lock because this path is rare and its yielded sub-requests
(metadata discovery, registration, token exchange) hit the AS, not the
resource server. A future refactor can narrow this further.
Lock additions:
- `OAuthContext.refresh_lock: anyio.Lock` provides single-flight refresh
so concurrent requests trigger at most one token refresh.
Behavior changes:
- Concurrent requests through the same `OAuthClientProvider` no longer
serialize at the lock. GET SSE long-polls and POSTs now proceed in
parallel.
- Token refresh remains serialized (via `refresh_lock`), preserving the
invariant that only one refresh request is in flight at a time.
- Public API and behavior are otherwise unchanged.
Related upstream issue:
modelcontextprotocol#1326
…ests Two tests in a new TestConcurrentRequestsDoNotDeadlock class exercise the behavior the previous commit fixes: 1. ``test_concurrent_request_not_blocked_by_pending_long_running_request`` drives one async_auth_flow generator to its yield (= simulating a GET SSE long-poll suspended waiting for the next event) and then opens a second concurrent flow on the same provider. The second flow must reach its own yield within a short timeout — i.e., the lock release between Phase 1 and Phase 3 lets it through. Pre-fix, the second generator would block on context.lock indefinitely. 2. ``test_concurrent_token_refresh_is_single_flight`` exercises the refresh_lock single-flight path. A first flow performs the refresh yield; a second flow started after the refresh completes observes the freshly-updated token in Phase 1 and proceeds directly to its own request yield without issuing a second refresh. Also: tighten the refresh_request unbound-after-conditional-write pattern in async_auth_flow so pyright recognizes it as definitely assigned at the yield site (was: derived from a boolean predicate; now: typed as ``httpx.Request | None`` and checked explicitly).
After dropping the function-level `# pragma: no cover` from `_handle_refresh_response` and removing the per-line pragmas from the refactored Phase 2, the strict-no-cover audit identified covered lines still marked pragma'd and surfaced previously-untested branches. Three new tests close the coverage gaps: * `test_refresh_with_failed_status_clears_tokens` — exercises the ``response.status_code != 200`` branch in `_handle_refresh_response` and the `self._initialized = False` reset on refresh failure. * `test_refresh_with_invalid_json_clears_tokens` — exercises the ValidationError branch when the refresh body is not valid JSON. * `test_double_check_inside_refresh_lock_skips_second_refresh` — uses monkeypatch to flip `is_token_valid` between Phase 1 (False) and the double-check inside `refresh_lock` (True), exercising the branch where a second coroutine's refresh is correctly elided. Also: convert the new tests from the legacy Test* class pattern to plain top-level `test_*` functions per AGENTS.md, and drop unneeded per-line `# pragma: no cover` markers in the refactored auth_flow. Coverage report: 100.00% on `src/mcp/client/auth/oauth2.py`, strict-no-cover clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Narrows the lock scope inside
OAuthClientProvider.async_auth_flowso the actual HTTP requestyieldruns outside any lock. Concurrent requests through the same provider — in particular, a POST issued while a GET SSE long-poll is in flight — no longer serialize oncontext.lock. A newrefresh_lockpreserves single-flight semantics for token refresh so the change does not introduce a refresh race.Motivation and Context
OAuthClientProvider.async_auth_flowpreviously heldself.context.lockfor the entire flow body, including theresponse = yield requestthat hands the request off to httpx. For requests that complete quickly this is fine, but a Streamable HTTP transport opens a long-lived GET SSE stream for server-initiated messages alongside the POST stream for client-initiated RPCs. Both go through the sameOAuthClientProviderinstance, so the GET SSE long-poll'syield requestholdscontext.lockfor the entire SSE lifetime. Any concurrenttools/callPOST then blocks on the lock until the SSE returns, producing a multi-second stall in OAuth-authenticated MCP connections.This commit splits the single coarse lock into purpose-specific scopes:
context.lock): initialize state, captureprotocol_version, and decide whether a refresh is needed. Short-held; no HTTP I/O.refresh_lock, new): single-flight token refresh. The refreshyieldhappens outside any lock. A double-check insidecontext.lockensures concurrent waiters do not redundantly refresh after another coroutine completed one.context.lock): 401 / 403 full OAuth re-auth path. Conservatively kept under lock because this path is rare and its yielded sub-requests target the AS, not the resource server. A future change can narrow this further with the same pattern.refresh_lockis added toOAuthContext(alongside the existinglock) so concurrent expirations trigger at most one refresh request. The double-check pattern insiderefresh_lockmakes the second waiter observe the freshly-updated token and skip its own refresh.How Has This Been Tested?
Five new tests in
tests/client/test_auth.pyexercise the refactor:test_concurrent_request_not_blocked_by_pending_long_running_request— the core regression test. Drives oneasync_auth_flowgenerator to its yield (simulating a GET SSE long-poll), then opens a second flow on the same provider and asserts it reaches its own yield withinanyio.fail_after(5). Pre-fix this hangs indefinitely.test_refresh_lock_double_check_skips_redundant_refresh— a flow started after another finished its refresh observes the fresh token in Phase 1 and skips Phase 2 entirely; no second refresh request is sent.test_double_check_inside_refresh_lock_skips_second_refresh— monkeypatchesis_token_validto return False in Phase 1 and True in the Phase 2 double-check, exercising the branch where a second coroutine's refresh is correctly elided insiderefresh_lock.test_refresh_with_failed_status_clears_tokens— covers the failed-refresh path (status_code != 200) including the_initialized = Falsereset.test_refresh_with_invalid_json_clears_tokens— covers theValidationErrorbranch when the refresh response body is malformed../scripts/testis clean: 1177 passed / 98 skipped / 1 xfailed, 100.00% coverage onsrc/mcp/client/auth/oauth2.py,strict-no-coverreports no leftover# pragma: no coverviolations. Four previously-needed pragmas in_handle_refresh_responseand the Phase 2 yield are now covered by the new tests and removed.Breaking Changes
None. Public API is unchanged. Concurrency semantics narrow only — previously-serialized requests now proceed in parallel; token refresh remains serialized via
refresh_lock.Types of changes
Checklist
Additional context
Targets
main(v2 development). The same bug exists onv1.x; if a backport is wanted, that can land as a follow-up PR.Fixes #1326