Add controller error-path tests, channel error reporting, and document OpenAI resilience config by Copilot · Pull Request #262 · dev-chat/mocker

Copilot · 2026-06-23T15:57:20Z

The app lacked test coverage proving OpenAI failures don't crash the /image and /prompt-with-history endpoints, AI errors were not being reported to the #muzzlefeedback channel for non-429 failures, and the resilience configuration env vars were undocumented.

Changes

Error Reporting (`ai.controller.ts`)

All three AI endpoints (/text, /image, /prompt-with-history) now post to #muzzlefeedback whenever the underlying service rejects, ensuring every AI failure is visible in the error channel — not just 429 rate-limit errors

Tests (`ai.controller.spec.ts`)

Added sendMessage to the WebService mock
Updated error-path tests for all three endpoints to assert both the ephemeral user message and the #muzzlefeedback channel notification are sent on failure
Added error-path tests for /image and /prompt-with-history — each verifies the endpoint still returns 200 when the underlying service rejects

Documentation (`README.md`)

Added all 8 OpenAI resilience env vars to the .env example block with inline defaults
Added an OpenAI Error Handling subsection under AI Features describing retry/circuit-breaker/timeout/bulkhead behavior and a full config-variable reference table:

Variable	Default	Description
`OPENAI_TIMEOUT_MS`	`10000`	Per-request abort deadline
`OPENAI_RETRIES`	`3`	Max retries on 429/5xx/timeout
`OPENAI_BACKOFF_BASE_MS`	`500`	Exponential backoff base with full jitter
`CIRCUIT_BREAKER_FAILURES`	`5`	Consecutive failures to open the circuit
`CIRCUIT_BREAKER_WINDOW_MS`	`60000`	Open window before a probe is allowed
`CIRCUIT_BREAKER_PROBE_MS`	`30000`	Min interval between probes
`OPENAI_CONCURRENCY`	`10`	Max in-flight calls (bulkhead)
`FEATURE_FLAG_RESILIENT_OPENAI`	`true`	Bypass all resilience logic when `false`

Original prompt

Goal: Prevent the application from crashing when OpenAI API calls fail. Find all usages of OpenAI (official SDKs, direct fetch/http calls to api.openai.com, or wrappers referencing 'openai', 'OpenAI', 'ChatCompletion', 'Completions', 'chat.completions', 'gpt', etc.) and harden them so transient or permanent OpenAI errors do not crash the app.

Tasks for the coding agent:

Discover OpenAI integration points

Search the repo for references to common OpenAI call patterns (Node.js 'openai' package, python 'openai', fetch/XHR to api.openai.com, axios/fetch wrappers, and any custom client code).

Add robust error handling

Wrap all OpenAI API calls with try/catch (or Promise.catch) so thrown exceptions or rejected promises never bubble up uncaught and crash the process.

For server-side code, ensure errors are translated into well-formed HTTP responses (e.g., 4xx/5xx) with a sanitized, user-friendly error message and an internal log that contains the full error for debugging.

For client-side code (if present), show a non-blocking UI error/toast rather than letting the app throw.

Implement retry for transient errors

For transient errors (HTTP 429, 502, 503, 504, or network timeouts), add exponential backoff retry (configurable max retries, default 2-3 retries).

Do not retry for 4xx errors indicating bad requests or invalid authentication.

Logging and observability

Add or reuse existing logging to record OpenAI errors with error class/type, status code, and request id (if present in response headers). Keep sensitive data out of logs (no API keys or user content).

Tests

Add unit tests or integration tests that simulate OpenAI errors (mock the OpenAI client or HTTP responses) and verify: (a) the app does not crash, (b) proper error response or UI message is produced, (c) retry behavior for transient errors.

Documentation

Update README or relevant docs with a short note about the new error handling and configuration (env vars for max retries, backoff base, etc.).

Acceptance criteria:

Running the app and inducing OpenAI failures (rate limit or 500-level errors) should not crash the process.

Errors should be logged for debugging and surfaced to callers/UI as a friendly message.

A PR is opened with the changes, clear commit messages, and tests demonstrating the fix.

Notes and constraints for the agent:

Do not change API keys or secrets. Use config/environment variables for retry limits/backoff settings.

Keep changes minimal and targeted to OpenAI call sites. If multiple languages/frameworks are present, prioritize the primary server-side language found in the repository.

If anything is ambiguous (e.g., which branch to base the PR on), open the PR against the repository default branch.

Make the code idiomatic for the project's language and follow existing style patterns.

Please create a PR with these changes and include a concise PR description summarizing the problem and the fix. Do not assign reviewers or DRI in the PR text.

This pull request was created from Copilot chat.

sfreeman422 · 2026-06-23T15:59:45Z

@copilot i am only concerned with two things: not crashing the app and reporting openAI errors to <#CKCL8UMJA> channel

… config

sfreeman422 · 2026-06-23T16:08:16Z

@copilot this does not achieve the goal - ensure that errors from openai result in a slack post to <#CKCL8UMJA>, and that errors are not thrown/uncaught in such a way that the service crashes

…andlers

Copilot

Pull request overview

Adds missing error-path coverage for AI controller endpoints to ensure OpenAI/service failures don’t crash Slack slash-command handlers, and documents the resilience-related environment variables/behavior for the OpenAI wrapper.

Changes:

Added controller tests verifying /image and /prompt-with-history still return 200 and send an ephemeral message when the underlying service rejects.
Tightened the existing /text error-path test to assert sendEphemeral arguments.
Expanded README with the OpenAI resilience env vars and an “OpenAI Error Handling” subsection describing retry/circuit-breaker/timeout/bulkhead behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
README.md	Documents OpenAI resilience env vars and explains error-handling behavior.
packages/backend/src/ai/ai.controller.spec.ts	Adds/strengthens controller-level tests for error paths across AI endpoints.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-06-23T16:14:47Z

@copilot i am only concerned with two things: not crashing the app and reporting openAI errors to <#CKCL8UMJA> channel

Done. The /text, /image, and /prompt-with-history error handlers in ai.controller.ts now each call webService.sendMessage('#muzzlefeedback', ...) alongside the ephemeral user message — so all AI failures are reported to that channel. Tests updated to assert both calls. (commit Report all AI errors to #muzzlefeedback channel in controller error handlers)

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

+    void webService.sendMessage(
+      '#muzzlefeedback',
+      `AI /image error for <@${user_id}>: ${e instanceof Error ? e.message : String(e)}`,
+    );


+    void webService.sendMessage(
+      '#muzzlefeedback',
+      `AI /prompt-with-history error for <@${request.user_id}>: ${e instanceof Error ? e.message : String(e)}`,
+    );


+| **Timeout**             | Each individual request is aborted after `OPENAI_TIMEOUT_MS` milliseconds.                                                                                          |
+| **Concurrency limiter** | At most `OPENAI_CONCURRENCY` calls are in-flight at once; excess requests are rejected immediately.                                                                 |
+| **Prometheus metrics**  | Counters and histograms for requests, retries, failures, circuit opens, and latency are registered via `prom-client` (expose the registry on a `/metrics` endpoint if desired). |
+| **Slack alert on 429**  | When the rate-limit error reaches the application layer, an alert is posted to `#muzzlefeedback`.                                                                   |


    await Promise.resolve();
-    expect(sendEphemeral).toHaveBeenCalled();
+    expect(sendEphemeral).toHaveBeenCalledWith('C1', expect.stringContaining('hello'), 'U1');
+    expect(sendMessage).toHaveBeenCalledWith('#muzzlefeedback', expect.stringContaining('boom'));


+
+    await Promise.resolve();
+    expect(sendEphemeral).toHaveBeenCalledWith('C1', expect.stringContaining('draw a cat'), 'U1');
+    expect(sendMessage).toHaveBeenCalledWith('#muzzlefeedback', expect.stringContaining('image boom'));


+
+    await Promise.resolve();
+    expect(sendEphemeral).toHaveBeenCalledWith('C1', expect.stringContaining('summarize'), 'U1');
+    expect(sendMessage).toHaveBeenCalledWith('#muzzlefeedback', expect.stringContaining('history boom'));


+    void webService.sendMessage(
+      '#muzzlefeedback',
+      `AI /text error for <@${user_id}>: ${e instanceof Error ? e.message : String(e)}`,
+    );


Initial plan

e57c64f

Copilot AI assigned Copilot and sfreeman422 Jun 23, 2026

Copilot started work on behalf of sfreeman422 June 23, 2026 15:57 View session

Add controller error-path tests and README docs for OpenAI resilience…

2f08b43

… config

Copilot AI changed the title ~~[WIP] Add robust error handling for OpenAI API calls~~ Add controller error-path tests and document OpenAI resilience config Jun 23, 2026

Copilot finished work on behalf of sfreeman422 June 23, 2026 16:06

Copilot AI requested a review from sfreeman422 June 23, 2026 16:06

Copilot started work on behalf of sfreeman422 June 23, 2026 16:07 View session

sfreeman422 marked this pull request as ready for review June 23, 2026 16:11

Copilot AI review requested due to automatic review settings June 23, 2026 16:11

Copilot started reviewing on behalf of sfreeman422 June 23, 2026 16:11 View session

Report all AI errors to #muzzlefeedback channel in controller error h…

d096c85

…andlers

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Comment thread README.md Outdated

Potential fix for pull request finding

4193567

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI changed the title ~~Add controller error-path tests and document OpenAI resilience config~~ Add controller error-path tests, channel error reporting, and document OpenAI resilience config Jun 23, 2026

sfreeman422 requested a review from Copilot June 23, 2026 16:15

Copilot finished work on behalf of sfreeman422 June 23, 2026 16:15

Copilot started work on behalf of sfreeman422 June 23, 2026 16:15 View session

sfreeman422 closed this Jun 23, 2026

Copilot started reviewing on behalf of sfreeman422 June 23, 2026 16:16 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Copilot stopped work on behalf of sfreeman422 due to an error June 23, 2026 16:35
Authorization error, you may need to run /login (Request ID: 0C02:206EFC:67F3268:6E87444:6A3AB5B8)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add controller error-path tests, channel error reporting, and document OpenAI resilience config#262

Add controller error-path tests, channel error reporting, and document OpenAI resilience config#262
Copilot wants to merge 4 commits into
masterfrom
copilot/harden-openai-api-error-handling

Copilot AI commented Jun 23, 2026 •

edited

Loading

Uh oh!

sfreeman422 commented Jun 23, 2026

Uh oh!

sfreeman422 commented Jun 23, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI commented Jun 23, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Copilot AI commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Error Reporting (ai.controller.ts)

Tests (ai.controller.spec.ts)

Documentation (README.md)

Uh oh!

sfreeman422 commented Jun 23, 2026

Uh oh!

sfreeman422 commented Jun 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI commented Jun 23, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jun 23, 2026 •

edited

Loading

Error Reporting (`ai.controller.ts`)

Tests (`ai.controller.spec.ts`)

Documentation (`README.md`)