forked from sfreeman422/mocker
feat: resilient OpenAI client — timeout, retries, circuit breaker, bulkhead, metrics #261
Merged: sfreeman422 merged 7 commits into master from copilot/implement-resilient-openai-client on Jun 23, 2026.
Commits (7):
- b927549 Initial plan (Copilot)
- da1aed9 feat: add resilient OpenAI client with timeout, retries, circuit brea… (Copilot)
- 894b1c4 refactor: address code review feedback on resilient OpenAI client (Copilot)
- 8e43bdb Potential fix for pull request finding (sfreeman422)
- 750753e Potential fix for pull request finding (sfreeman422)
- ebbdb86 Potential fix for pull request finding (sfreeman422)
- 6f0c6ea fix: resolve CI lint and format failures (Copilot)
@@ -0,0 +1,212 @@
# Resilient OpenAI Client

This document describes the resilient OpenAI wrapper introduced in
`packages/backend/src/lib/resilientOpenAIClient.ts`. The wrapper prevents
transient or sustained OpenAI failures from cascading and taking down the
broader Moonbeam service.

---

## Overview

`ResilientOpenAIClient` wraps the official `openai` SDK client and adds:

| Feature | Default |
| -------------------------------------------------------- | ---------------------------------- |
| Per-request timeout | 10 s |
| Automatic retries with exponential backoff + full jitter | 3 retries |
| `Retry-After` header honored on 429 responses | yes |
| Circuit breaker | opens after 5 consecutive failures |
| Concurrency / bulkhead limiter | 10 concurrent calls |
| Graceful degradation via `ResilientOpenAIError` | yes |
| Structured logging (Winston) | yes |
| Prometheus metrics (`prom-client`) | yes |

The wrapper exposes the same narrow `responses.create` surface used by
`AIService`, so the change is a drop-in replacement.

---

## Feature flag

Set the environment variable `FEATURE_FLAG_RESILIENT_OPENAI` to `false` to
bypass all resilience logic and delegate directly to the underlying OpenAI SDK
client. This is the rollback switch. Only the exact lowercase string `false`
disables the wrapper; any other value, or leaving the variable unset, keeps it
enabled.

```
FEATURE_FLAG_RESILIENT_OPENAI=false  # bypass resilience (rollback)
FEATURE_FLAG_RESILIENT_OPENAI=true   # enable resilience (default)
```
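
The sketch below illustrates one way such a switch can stay cheap: the decision is made once, when the client is built, and the disabled path returns the SDK object untouched. It is a hypothetical sketch rather than the wrapper's code; `ResponsesApi`, `Guard`, and `wrapResponses` are illustrative names, and `guard` stands in for the timeout, retry, circuit-breaker, and bulkhead pipeline.

```typescript
// Assumed shape of the single SDK surface the wrapper exposes (hypothetical).
interface ResponsesApi<P, R> {
  create(params: P): Promise<R>;
}

// Hypothetical stand-in for the timeout, retry, circuit-breaker and bulkhead pipeline.
type Guard = <T>(call: () => Promise<T>) => Promise<T>;

// Hypothetical helper: decides once, at construction time, whether calls go through `guard`.
function wrapResponses<P, R>(
  raw: ResponsesApi<P, R>,
  guard: Guard,
  featureFlagResilient: boolean,
): ResponsesApi<P, R> {
  if (!featureFlagResilient) {
    return raw; // rollback path: delegate straight to the SDK client
  }
  return { create: (params: P) => guard(() => raw.create(params)) };
}
```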

---

## Environment variables

All variables are optional; the defaults are shown below.

| Variable | Default | Description |
| ------------------------------- | ------- | ------------------------------------------------------------- |
| `FEATURE_FLAG_RESILIENT_OPENAI` | `true` | Set to `false` to bypass the wrapper entirely. |
| `OPENAI_TIMEOUT_MS` | `10000` | Maximum ms to wait for a single request before aborting. |
| `OPENAI_RETRIES` | `3` | Maximum retry attempts on transient errors. |
| `OPENAI_BACKOFF_BASE_MS` | `500` | Base interval (ms) for exponential backoff with full jitter. |
| `CIRCUIT_BREAKER_FAILURES` | `5` | Consecutive failures needed to open the circuit. |
| `CIRCUIT_BREAKER_WINDOW_MS` | `60000` | Duration (ms) the circuit stays open before allowing a probe. |
| `CIRCUIT_BREAKER_PROBE_MS` | `30000` | Minimum interval (ms) between probe attempts while open. |
| `OPENAI_CONCURRENCY` | `10` | Maximum concurrent outbound OpenAI calls per instance. |

Configuration is loaded by `packages/backend/src/config/openai.ts`, which reads
these variables when the client is instantiated. An unset variable, or one whose
value does not begin with an integer, falls back to its default. Values are also
clamped to a minimum of 0 (`OPENAI_RETRIES`, `OPENAI_BACKOFF_BASE_MS`) or 1 (all
other numeric settings).
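
`OPENAI_TIMEOUT_MS` caps how long a single request may run. A common way to enforce such a cap in Node is to race the call against a timer and abort an `AbortController` when the timer fires. The following is a minimal sketch under that assumption; `withTimeout` and `TimeoutError` are hypothetical names, and the documented wrapper reports the same condition as `ResilientOpenAIError` with code `TIMEOUT`.

```typescript
// Hypothetical error type; the real wrapper reports ResilientOpenAIError('TIMEOUT').
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`request exceeded ${ms} ms`);
    this.name = 'TimeoutError';
  }
}

// Runs `call` and rejects with TimeoutError if it has not settled within `ms`.
// The AbortSignal lets the underlying request (e.g. fetch) stop its own work.
async function withTimeout<T>(
  call: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject before aborting so the race settles with TimeoutError rather
      // than with whatever error the aborted call produces.
      reject(new TimeoutError(ms));
      controller.abort();
    }, ms);
  });
  try {
    return await Promise.race([call(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // do not keep the timer alive after the call settles
  }
}
```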

---

## Retry behaviour

Requests are retried when the error is classified as _retriable_:

- HTTP 429 (rate limit). If the response carries a `Retry-After` header, the
  client waits that long before retrying.
- HTTP 5xx (server errors).
- Network / connection errors (`ECONNRESET`, `ETIMEDOUT`, socket hang-up,
  `fetch failed`, etc.).
- `ResilientOpenAIError` with code `TIMEOUT`.

Non-retriable errors (4xx other than 429, business-logic errors) are surfaced
immediately without retrying.
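
The classification rules above could be expressed as a single predicate, sketched below. The error shape (`status`, `code`, `message`) is an assumption about what the SDK and Node's network stack attach to errors, `isRetriable` is a hypothetical name, and only the error codes named in the list are included.

```typescript
// Assumed, loosely typed view of errors from the SDK or Node's network stack.
interface ErrorLike {
  status?: number;
  code?: string;
  message?: string;
}

// Network error codes named in the list above.
const RETRIABLE_NETWORK_CODES = new Set(['ECONNRESET', 'ETIMEDOUT']);

// Hypothetical predicate mirroring the documented retry rules.
function isRetriable(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as ErrorLike;
  // Stand-in for a ResilientOpenAIError carrying code TIMEOUT.
  if (e.code === 'TIMEOUT') return true;
  if (typeof e.status === 'number') {
    // 429 and 5xx are transient; any other HTTP status is surfaced immediately.
    return e.status === 429 || (e.status >= 500 && e.status <= 599);
  }
  if (e.code !== undefined && RETRIABLE_NETWORK_CODES.has(e.code)) return true;
  const message = String(e.message ?? '').toLowerCase();
  return message.includes('socket hang up') || message.includes('fetch failed');
}
```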

Backoff is computed as:

```
sleep = random(0, backoffBaseMs * 2^attempt)  # full jitter
```

If a `Retry-After` header is present on a 429 response, that duration (in
seconds) overrides the computed backoff.
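
A small sketch of the delay calculation follows, assuming `attempt` counts from 0 for the first retry and that only the delta-seconds form of `Retry-After` is honored (an HTTP-date value falls back to the computed backoff here). The function names are hypothetical, and `random` is injectable only so the arithmetic can be checked deterministically.

```typescript
// Full jitter: a uniform random delay in [0, backoffBaseMs * 2^attempt).
// Assumption: `attempt` counts from 0 for the first retry.
function backoffDelayMs(
  attempt: number,
  backoffBaseMs: number,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffBaseMs * 2 ** attempt);
}

// Parses the delta-seconds form of Retry-After; anything else yields undefined.
function retryAfterMs(header: string | null | undefined): number | undefined {
  if (header == null || header.trim() === '') return undefined;
  const seconds = Number(header.trim());
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : undefined;
}

// On a 429 with a usable Retry-After, the header wins; otherwise use backoff.
function nextDelayMs(
  attempt: number,
  backoffBaseMs: number,
  status: number | undefined,
  retryAfterHeader: string | null | undefined,
  random: () => number = Math.random,
): number {
  const fromHeader = status === 429 ? retryAfterMs(retryAfterHeader) : undefined;
  return fromHeader ?? backoffDelayMs(attempt, backoffBaseMs, random);
}
```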

---

## Circuit-breaker states

```
CLOSED ──(N consecutive failures)──► OPEN
  ▲                                   │
  │ (probe succeeds)                  │ (window + probe interval elapsed)
  └──────── HALF-OPEN ◄───────────────┘
                │
          (probe fails)
                │
              OPEN
```

- **CLOSED**: normal operation.
- **OPEN**: calls are short-circuited immediately with
  `ResilientOpenAIError(code: 'CIRCUIT_OPEN')`. No requests reach OpenAI.
- **HALF-OPEN**: one probe call is allowed through. If it succeeds, the circuit
  closes; if it fails, the circuit re-opens.

The current state is observable via `client.getCircuitState()`.
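
The state machine above can be sketched as a small class. This is a hypothetical illustration that assumes `CIRCUIT_BREAKER_FAILURES` maps to `failureThreshold` and `CIRCUIT_BREAKER_WINDOW_MS` to `openWindowMs`; it does not model the separate `CIRCUIT_BREAKER_PROBE_MS` interval. The clock is injectable so transitions can be tested without waiting. A caller would check `tryAcquire()` before sending a request, fail fast with `CIRCUIT_OPEN` when it returns `false`, and report the outcome through `onSuccess()` or `onFailure()`.

```typescript
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Hypothetical breaker: trips after `failureThreshold` consecutive failures,
// stays open for `openWindowMs`, then lets exactly one probe through.
class CircuitBreaker {
  private state: CircuitState = 'CLOSED';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private probeInFlight = false;
  private readonly failureThreshold: number;
  private readonly openWindowMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, openWindowMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.openWindowMs = openWindowMs;
    this.now = now; // injectable clock, e.g. for tests
  }

  getState(): CircuitState {
    return this.state;
  }

  // Returns false when the call must be short-circuited (CIRCUIT_OPEN).
  tryAcquire(): boolean {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.openWindowMs) {
      this.state = 'HALF_OPEN'; // window elapsed: allow a probe
    }
    if (this.state === 'CLOSED') return true;
    if (this.state === 'HALF_OPEN' && !this.probeInFlight) {
      this.probeInFlight = true; // only one probe at a time
      return true;
    }
    return false;
  }

  onSuccess(): void {
    if (this.state === 'OPEN') return; // late result from a call made before the trip
    this.state = 'CLOSED';
    this.consecutiveFailures = 0;
    this.probeInFlight = false;
  }

  onFailure(): void {
    if (this.state === 'OPEN') return;
    this.consecutiveFailures += 1;
    // A failed probe re-opens immediately; otherwise wait for the threshold.
    if (this.state === 'HALF_OPEN' || this.consecutiveFailures >= this.failureThreshold) {
      this.trip();
    }
  }

  private trip(): void {
    this.state = 'OPEN';
    this.openedAt = this.now();
    this.consecutiveFailures = 0;
    this.probeInFlight = false;
  }
}
```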

---

## Concurrency limiter

At most `OPENAI_CONCURRENCY` requests may be in flight simultaneously.
Additional requests are not queued; they are rejected immediately with
`ResilientOpenAIError(code: 'CONCURRENCY_REJECTED')`.

`client.getActiveRequests()` returns the current in-flight count.
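
A reject-fast bulkhead like the one described needs little more than a counter, as the hypothetical sketch below shows; `Bulkhead` and `BulkheadRejectedError` are illustrative names, with `getActiveRequests()` mirroring the documented accessor. Because Node runs JavaScript on a single thread, checking and incrementing the counter before the first `await` is enough to keep two callers from taking the last slot.

```typescript
// Hypothetical error; the real wrapper uses ResilientOpenAIError('CONCURRENCY_REJECTED').
class BulkheadRejectedError extends Error {
  constructor(limit: number) {
    super(`concurrency limit of ${limit} reached`);
    this.name = 'BulkheadRejectedError';
  }
}

// Reject-fast bulkhead: no queue, so excess callers fail immediately
// instead of piling up behind a slow upstream.
class Bulkhead {
  private active = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  getActiveRequests(): number {
    return this.active;
  }

  async run<T>(call: () => Promise<T>): Promise<T> {
    // The check and increment run synchronously, before the first await.
    if (this.active >= this.limit) throw new BulkheadRejectedError(this.limit);
    this.active += 1;
    try {
      return await call();
    } finally {
      this.active -= 1; // release the slot on success or failure
    }
  }
}
```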

---

## Error types

```typescript
import { ResilientOpenAIError } from '../lib/resilientOpenAIClient';

// `openAi` is the ResilientOpenAIClient instance, `params` the usual request
// parameters, and `handleDegradedMode` an application-defined fallback
// (all three are placeholders).
try {
  await openAi.responses.create(params);
} catch (err) {
  if (err instanceof ResilientOpenAIError) {
    // err.code is one of:
    //   'CIRCUIT_OPEN'         – circuit is open, request was not sent
    //   'TIMEOUT'              – request exceeded OPENAI_TIMEOUT_MS
    //   'CONCURRENCY_REJECTED' – too many concurrent requests
    //   'MAX_RETRIES_EXCEEDED' – reserved for future use
    handleDegradedMode(err.code);
  } else {
    throw err; // application errors propagate unchanged
  }
}
```

Upstream code (e.g. `AIService`) can check `err instanceof ResilientOpenAIError`
to distinguish transient or infrastructure failures from application errors and
return an appropriate degraded experience to users.
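
For the degraded-mode side, the sketch below pairs a stand-in error class carrying the four documented codes with a hypothetical `degradedMessage` helper that maps each code to user-facing copy. The class is not the wrapper's actual `ResilientOpenAIError`, and the messages are placeholders. Under `strict` compiler settings, the exhaustive `switch` makes TypeScript flag any code added later without a message, and the `Object.setPrototypeOf` call keeps `instanceof` working when compiling to ES5.

```typescript
type ResilientErrorCode = 'CIRCUIT_OPEN' | 'TIMEOUT' | 'CONCURRENCY_REJECTED' | 'MAX_RETRIES_EXCEEDED';

// Stand-in mirroring the documented codes; not the wrapper's real class.
class ResilientOpenAIError extends Error {
  readonly code: ResilientErrorCode;

  constructor(code: ResilientErrorCode, message?: string) {
    super(message ?? code);
    this.name = 'ResilientOpenAIError';
    this.code = code;
    // Keeps `instanceof` working if the code is compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical degraded-mode copy shown to users instead of a raw error.
function degradedMessage(code: ResilientErrorCode): string {
  switch (code) {
    case 'CIRCUIT_OPEN':
      return 'The AI service is temporarily unavailable. Please try again in a minute.';
    case 'CONCURRENCY_REJECTED':
      return 'Too many requests are in progress. Please try again shortly.';
    case 'TIMEOUT':
    case 'MAX_RETRIES_EXCEEDED':
      return 'The AI service took too long to respond. Please try again.';
  }
}
```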

---

## Metrics

The client registers the following Prometheus metrics via `prom-client`. Each
`ResilientOpenAIClient` instance uses its own `Registry`, so multiple instances
in the same process do not collide. Expose the registry on a `/metrics`
endpoint if Prometheus scraping is desired.

| Metric | Type | Description |
| --------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------ |
| `openai_requests_total` | Counter | Total requests attempted, labelled `status` (`attempted`, `success`, `circuit_open`, `concurrency_rejected`). |
| `openai_retries_total` | Counter | Total retry attempts. |
| `openai_failures_total` | Counter | Total failed requests after all retries, labelled `reason` (`error`, `circuit_open`, `concurrency_rejected`). |
| `openai_circuit_open_total` | Counter | Number of times the circuit transitioned to open. |
| `openai_latency_seconds` | Histogram | End-to-end request latency in seconds, labelled `status` (`success`, `error`). Buckets: 0.1, 0.5, 1, 2, 5, 10, 30. |

---

## Logging

The wrapper emits structured Winston logs (child logger `ResilientOpenAIClient`)
for:

- Per-retry failures (level `info`).
- Non-retriable errors (level `warn`).
- Timeout aborts (level `warn`).
- Circuit-breaker state transitions (level `info` / `warn`).
- Concurrency limit reached (level `warn`).

---

## Rollout / migration plan

### Phase 1: Behind feature flag (current state)

The wrapper is **enabled by default** (`FEATURE_FLAG_RESILIENT_OPENAI=true`).

1. **Deploy to staging**: verify smoke tests pass; inspect logs and metrics.
2. **Enable in a canary production host**: monitor `openai_failures_total`,
   `openai_circuit_open_total`, and p99 latency via `openai_latency_seconds`.
3. **Gradually roll out** across all production instances while monitoring the
   metrics above and Sentry/Datadog error rates.

### Phase 2: Stabilisation

Once the canary shows stable behaviour for 24–48 h:

- Enable for all production instances.
- Set alerting on `openai_circuit_open_total > 0` and high
  `openai_failures_total` rates.

### Rollback

If issues occur at any phase, flip the feature flag:

```
FEATURE_FLAG_RESILIENT_OPENAI=false
```

Then restart the service: the flag is read when the client is instantiated, so
changing it does not affect an already-running client. With the flag off, the
wrapper delegates directly to the underlying OpenAI SDK; no other code changes
are required.

### Phase 3: Cleanup (future)

After the feature has been stable in production for a sprint:

- Remove the `featureFlagResilient` branch and env-var check.
- Remove the `FEATURE_FLAG_RESILIENT_OPENAI` documentation references.
@@ -0,0 +1,44 @@
/**
 * Configuration for the resilient OpenAI client.
 * All values are read from environment variables with sensible defaults.
 */

export interface OpenAIClientConfig {
  /** Maximum ms to wait for a single OpenAI request before aborting. */
  timeoutMs: number;
  /** Maximum number of retry attempts on transient errors. */
  retries: number;
  /** Base backoff interval (ms) for exponential backoff with full jitter. */
  backoffBaseMs: number;
  /** Number of consecutive failures required to open the circuit breaker. */
  circuitBreakerFailures: number;
  /** Duration (ms) the circuit stays open before allowing a probe. */
  circuitBreakerWindowMs: number;
  /** Minimum interval (ms) between probe attempts while the circuit is open. */
  circuitBreakerProbeMs: number;
  /** Maximum number of concurrent outbound OpenAI calls. */
  concurrency: number;
  /**
   * When true (default), requests are wrapped with timeouts, retries,
   * the circuit breaker, and the concurrency limiter.
   * Set FEATURE_FLAG_RESILIENT_OPENAI=false to bypass all resilience logic
   * and delegate directly to the underlying OpenAI client.
   */
  featureFlagResilient: boolean;
}

const parseIntWithDefault = (value: string | undefined, defaultValue: number): number => {
  const parsed = parseInt(value ?? '', 10);
  return Number.isFinite(parsed) ? parsed : defaultValue;
};

export const getOpenAIClientConfig = (): OpenAIClientConfig => ({
  timeoutMs: Math.max(1, parseIntWithDefault(process.env.OPENAI_TIMEOUT_MS, 10_000)),
  retries: Math.max(0, parseIntWithDefault(process.env.OPENAI_RETRIES, 3)),
  backoffBaseMs: Math.max(0, parseIntWithDefault(process.env.OPENAI_BACKOFF_BASE_MS, 500)),
  circuitBreakerFailures: Math.max(1, parseIntWithDefault(process.env.CIRCUIT_BREAKER_FAILURES, 5)),
  circuitBreakerWindowMs: Math.max(1, parseIntWithDefault(process.env.CIRCUIT_BREAKER_WINDOW_MS, 60_000)),
  circuitBreakerProbeMs: Math.max(1, parseIntWithDefault(process.env.CIRCUIT_BREAKER_PROBE_MS, 30_000)),
  concurrency: Math.max(1, parseIntWithDefault(process.env.OPENAI_CONCURRENCY, 10)),
  featureFlagResilient: process.env.FEATURE_FLAG_RESILIENT_OPENAI !== 'false',
});
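
Because `parseIntWithDefault` relies on `parseInt`, a few inputs behave in ways that are easy to miss. The sketch below copies that helper, plus the clamping and flag expressions, from the module above; a hypothetical `env` object stands in for `process.env` so it runs on its own.

```typescript
// Copied from the config module above so this example is self-contained.
const parseIntWithDefault = (value: string | undefined, defaultValue: number): number => {
  const parsed = parseInt(value ?? '', 10);
  return Number.isFinite(parsed) ? parsed : defaultValue;
};

// Hypothetical stand-in for process.env with some easy-to-miss values.
const env: Record<string, string | undefined> = {
  OPENAI_TIMEOUT_MS: '15s', // parseInt stops at 's': 15 (ms), not 15 seconds
  OPENAI_RETRIES: '-2', // parses as -2, then Math.max clamps it to 0
  OPENAI_CONCURRENCY: 'ten', // no leading integer: falls back to the default 10
  FEATURE_FLAG_RESILIENT_OPENAI: 'FALSE', // only the exact string 'false' disables
};

const timeoutMs = Math.max(1, parseIntWithDefault(env.OPENAI_TIMEOUT_MS, 10_000));
const retries = Math.max(0, parseIntWithDefault(env.OPENAI_RETRIES, 3));
const concurrency = Math.max(1, parseIntWithDefault(env.OPENAI_CONCURRENCY, 10));
const featureFlagResilient = env.FEATURE_FLAG_RESILIENT_OPENAI !== 'false';

console.log({ timeoutMs, retries, concurrency, featureFlagResilient });
```

In particular, a unit suffix such as `15s` silently becomes a 15 ms timeout, so these variables should hold plain integers in milliseconds.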