fix: retry token refresh once on rate limit #39
Multi-pod stateless deployments without session affinity can race on token refresh — multiple pods read the same expired cookie and call authenticateWithRefreshToken simultaneously. WorkOS accepts reused refresh tokens within a grace window (~10-30s), but rate limiting can still reject some of these concurrent calls. Catch RateLimitExceededException inside the existing dedup promise and retry once after the Retry-After delay (clamped 1-10s, default 1s). All concurrent in-process waiters share the single retry result.
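A minimal sketch of that flow. It assumes the in-flight dedup is a module-level `Map` keyed by refresh token and that `retryAfter` is reported in seconds. `RateLimitError`, `refreshTokensDeduped`, and the `refresh`/`wait` parameters are hypothetical stand-ins, not the SDK's real `RateLimitExceededException` or `authenticateWithRefreshToken` signatures.

```typescript
// Hypothetical stand-in for the SDK's RateLimitExceededException.
class RateLimitError extends Error {
  readonly retryAfter: number | null; // assumed to be seconds
  constructor(retryAfter: number | null) {
    super("Rate limit exceeded");
    this.name = "RateLimitError";
    this.retryAfter = retryAfter;
  }
}

type Refresh<T> = (refreshToken: string) => Promise<T>;
type Wait = (ms: number) => Promise<void>;

// In-flight refreshes keyed by refresh token (assumed key): concurrent callers
// in this process share one promise instead of each calling the API.
const inFlight = new Map<string, Promise<unknown>>();

const sleep: Wait = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry-After (seconds) -> delay in ms, clamped to 1-10s; 1s when missing or invalid.
function retryDelayMs(retryAfter: number | null): number {
  if (retryAfter === null || !Number.isFinite(retryAfter) || retryAfter <= 0) return 1000;
  return Math.min(Math.max(retryAfter, 1), 10) * 1000;
}

function refreshTokensDeduped<T>(
  refreshToken: string,
  refresh: Refresh<T>,
  wait: Wait = sleep,
): Promise<T> {
  const existing = inFlight.get(refreshToken);
  if (existing) return existing as Promise<T>;

  const promise = (async (): Promise<T> => {
    try {
      return await refresh(refreshToken);
    } catch (err) {
      // Only rate limits are retried; every other error propagates unchanged.
      if (!(err instanceof RateLimitError)) throw err;
      // The map entry stays registered during the wait, so other callers in
      // this process keep awaiting this promise rather than starting a retry.
      await wait(retryDelayMs(err.retryAfter));
      return await refresh(refreshToken); // single retry; a second failure propagates
    }
  })().finally(() => inFlight.delete(refreshToken));

  inFlight.set(refreshToken, promise);
  return promise;
}
```

Injecting `wait` is only a testing convenience of this sketch; the PR's own tests reportedly use fake timers to pin the delay instead.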
Greptile Summary
This PR adds a one-time retry on
Confidence Score: 5/5
Safe to merge: the retry path is well isolated inside the existing dedup promise, delay clamping is correct, and the full error cause chain is preserved and test-verified. The change is narrowly scoped: one new catch branch with a single retry and a bounded delay. The in-flight dedup map correctly keeps the promise alive through the wait period, so no duplicate retries can race within a process. Eight dedicated tests (including fake-timer assertions on the exact delay boundaries and a cause-chain inspection) give high confidence that the behavior is correct for all exercised edge cases. No files require special attention.
Validate retryAfter is finite and positive, cap at 10s to prevent holding the dedup entry too long. Default to 1s for null/non-finite values. Add tests for edge cases: large values, Infinity, and non-rate-limit errors on retry. Run oxfmt.
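The validation rules in this commit fit in one small helper. This sketch assumes `retryAfter` arrives as a number of seconds (or `null`/`undefined`); the function and constant names are hypothetical.

```typescript
const MIN_RETRY_SECONDS = 1;
const MAX_RETRY_SECONDS = 10; // cap so the dedup entry is not held too long
const DEFAULT_RETRY_SECONDS = 1;

// Converts a Retry-After value (assumed seconds) into a bounded delay in ms.
function retryDelayMs(retryAfter: number | null | undefined): number {
  // null, undefined, NaN, +/-Infinity, zero and negatives all fall back to the default.
  if (retryAfter == null || !Number.isFinite(retryAfter) || retryAfter <= 0) {
    return DEFAULT_RETRY_SECONDS * 1000;
  }
  const seconds = Math.min(Math.max(retryAfter, MIN_RETRY_SECONDS), MAX_RETRY_SECONDS);
  return seconds * 1000;
}
```

For example, `retryDelayMs(3600)` returns `10000` (capped), while `retryDelayMs(Infinity)` returns `1000`: a non-finite value takes the 1s default rather than the 10s cap, matching "Default to 1s for null/non-finite values".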
When the retry also fails, attach the original RateLimitExceededException as the cause of retryError so the full sequence is visible: TokenRefreshError → retryError → RateLimitExceededException (original). Addresses Greptile review feedback on PR #39.
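A sketch of that chaining, assuming both the rate-limit error and `TokenRefreshError` are plain `Error` subclasses. `RateLimitError`, `buildRetryFailure`, and `causeOf` are hypothetical names, and `cause` is set with `Object.assign` so the sketch does not depend on ES2022 `ErrorOptions` typings. Unlike the commit text, it also declines to overwrite a `cause` the retry error already carries; that is an assumption of this sketch.

```typescript
// Hypothetical stand-in for the SDK's RateLimitExceededException.
class RateLimitError extends Error {
  readonly retryAfter: number | null;
  constructor(retryAfter: number | null) {
    super("Rate limit exceeded");
    this.name = "RateLimitError";
    this.retryAfter = retryAfter;
  }
}

class TokenRefreshError extends Error {
  constructor(message: string, cause: unknown) {
    super(message);
    this.name = "TokenRefreshError";
    // Set via Object.assign so this compiles without the ES2022 ErrorOptions lib.
    Object.assign(this, { cause });
  }
}

// Reads `cause` from any value without relying on lib typings.
function causeOf(err: unknown): unknown {
  return typeof err === "object" && err !== null
    ? (err as { cause?: unknown }).cause
    : undefined;
}

// Builds the error thrown when the single retry also fails:
// TokenRefreshError -> retryError -> original RateLimitError.
function buildRetryFailure(original: RateLimitError, retryError: unknown): TokenRefreshError {
  // Assumption: only attach when the retry error has no cause of its own.
  if (retryError instanceof Error && causeOf(retryError) === undefined) {
    Object.assign(retryError, { cause: original });
  }
  return new TokenRefreshError("Token refresh failed after rate-limit retry", retryError);
}
```

Following `causeOf` from the returned error yields the retry failure and then the original rate-limit error, which is the sequence the commit describes.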
Summary
- Catches `RateLimitExceededException` in `refreshTokens()` and retries once after honoring the `Retry-After` delay
- Clamps the delay to 1-10s (defaults to 1s when `retryAfter` is null or non-finite)

Context: A customer reported `RateLimitExceededException` during rolling deploys. Multiple pods read the same expired cookie and call `authenticateWithRefreshToken` simultaneously. The existing in-flight dedup (PR #33) handles this within a single process, but cross-pod races still trigger rate limiting. WorkOS accepts reused refresh tokens within a ~10-30s grace window, so a single retry after the rate-limit delay succeeds.

Test plan
- `pnpm build` passes
- Tests: `retryAfter` honored, null fallback, double rate limit, 10s cap, non-finite values, non-rate-limit retry error, concurrent dedup sharing