Skip to content

fix(sandbox-run): preserve partial events on abort via typed SandboxRunAbortError#359

Merged
drewstone merged 1 commit into
mainfrom
fix/sandbox-run-preserve-partial-events
Jun 22, 2026
Merged

fix(sandbox-run): preserve partial events on abort via typed SandboxRunAbortError#359
drewstone merged 1 commit into
mainfrom
fix/sandbox-run-preserve-partial-events

Conversation

@drewstone

Copy link
Copy Markdown
Contributor

Problem (rt#325)

settle() in src/runtime/sandbox-run.ts collected stream events into a local array. When the for-await drain threw on abort — or the pre-read / read-retry abort guards fired — that array was lost: start()/resume() rethrew the bare AbortError, which carried no partial state. An aborted or timed-out run was therefore undiagnosable — you couldn't tell never-started (no events) from looped (many events, no terminal result) from produced-nothing-then-cancelled.

Fix

Define and throw a typed SandboxRunAbortError that carries { events, readError? } at every abort point inside settle():

  • the stream-drain try/catch (the stream itself throws AbortError on cancel),
  • the pre-read guard,
  • the per-attempt guard inside the artifact read-retry loop (also carries the last readError).

name === 'AbortError', so the existing cross-kernel err.name === 'AbortError' matchers (run-loop, scope, supervise runtime) keep matching it unchanged — no behavior change for callers that only branch on abort-ness; the partial trace is now available to any caller that wants to diagnose. Exported from the runtime barrel alongside openSandboxRun/TurnResult.

Also adds a shared isAbortError() to src/runtime/util.ts next to the other abort primitives (abortError/throwAbort/throwIfAborted).

Tests

tests/runtime/sandbox-run.test.ts previously asserted the lossy behavior (only that an abort threw and no read happened). Replaced with assertions that the partial trace is preserved:

  • events drained before a pre-read abort ride out on the thrown error,
  • only the events emitted before a mid-stream abort are carried (never-started stays empty),
  • events plus the last readError are carried when the read-retry loop is aborted mid-retry.

The fake streamPrompt now honors the abort signal (models the real backend throwing AbortError on the next pull) and can emit a scripted event sequence.

Verification

  • pnpm run build — green (examples need dist)
  • pnpm run typecheck (incl. typecheck:examples) — clean
  • pnpm test — 1047 passed, 1 skipped (sandbox-run file: 16 tests)
  • pnpm run lint — clean (302 files)
  • merges cleanly into origin/main

tangletools
tangletools previously approved these changes Jun 22, 2026

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — ba9845fd

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:33:02Z

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Value Audit — sound-with-nits

Verdict sound-with-nits
Concerns 2 (2 weak-concern)
Heuristic 0.0s
Duplication 0.0s
Interrogation 197.4s (2 bridge agents)
Total 197.4s

💰 Value — sound-with-nits

Preserves partial event traces on sandbox-run abort by throwing a typed SandboxRunAbortError carrying drained events and the last readError, while keeping err.name === 'AbortError' backward-compatible for existing callers; a shared isAbortError helper is added but not used to consolidate two existin

  • What it does: In src/runtime/sandbox-run.ts:205-247, the settle() function previously collected stream events into a local array and lost them on abort. Now it throws a new SandboxRunAbortError (src/runtime/sandbox-run.ts:76-87) that carries the partial events array and optional readError at every abort point: mid-stream drain catch, pre-read signal guard, and per-attempt artifact read-retry guard. The error's
  • Goals it achieves: Makes aborted/timed-out sandbox runs diagnosable: a caller can now distinguish never-started (events: []), looped-but-no-terminal-result (many events), and produced-then-cancelled cases. It also prevents silently reading a stale workspace artifact after cancel and preserves the last readError when abort interrupts the retry loop.
  • Assessment: Good change. The goal is coherent, the implementation is minimal and stays within the existing sandbox-run facade, and the backward-compatibility story (name === 'AbortError') is sound. Tests in tests/runtime/sandbox-run.test.ts:293-389 were updated to assert the new partial-trace behavior across the three abort paths.
  • Better / existing approach: none for the core feature — there is no existing typed abort error that carries partial sandbox events. However, the new shared isAbortError in src/runtime/util.ts duplicates local helpers that already exist at src/runtime/run-loop.ts:755 and src/runtime/supervise/scope.ts:776. run-loop.ts already imports from './util' and could trivially use the shared one; scope.ts has a slightly looser object-s
  • Model: opencode/kimi-for-coding/k2p7
  • Bridge attempts: 1

🎯 Usefulness — sound-with-nits

A backward-compatible typed abort error that carries partial events/readError out of settle() — fixes a real diagnosability gap at the producer seam; natural consumers exist and can adopt incrementally.

  • Integration: Strong. Exported from the runtime barrel next to openSandboxRun/TurnResult (src/runtime/index.ts:233), thrown at every abort site inside settle() (sandbox-run.ts:215 pre-read stream catch, :221 pre-read guard, :231 retry-loop guard), and preserves name==='AbortError' so the existing cross-kernel matchers in run-loop.ts:744,755-756 and supervise/scope.ts:631,776-783 keep matching unchanged. The new
  • Fit with existing patterns: Fits the codebase grain. The pattern — a typed Error subclass carrying diagnostic payload, with name kept as 'AbortError' for cross-kernel duck-typing — mirrors how ValidationError is used as a typed sentinel in run-loop.ts:752. The claim in the PR body about backward compatibility is verifiable: all three existing matchers check err.name==='AbortError' (or duck-type on .name only), and the new cl
  • Real-world viability: Solid on the abort paths it covers. All three conversion sites sit at well-defined await boundaries (for-await pull, pre-read guard, retry-loop guard) — no races. The try/catch at sandbox-run.ts:212-217 correctly re-throws non-abort stream errors unchanged and only converts AbortError-shaped throws. events-deliverable mid-stream abort, artifact-deliverable pre-read abort, and artifact-deliverable
  • Model: opencode/zai-coding-plan/glm-5.2
  • Bridge attempts: 1

💰 Value Audit

🟡 New shared isAbortError duplicates existing local helpers [duplication] ``

The PR adds isAbortError to src/runtime/util.ts:55-58, but identical/semantically-equivalent helpers already exist at src/runtime/run-loop.ts:755 (exact same implementation) and src/runtime/supervise/scope.ts:776 (object-shape variant). run-loop.ts already imports from './util' so the switch is one line; scope.ts would need a new import. Consolidating would prevent the helpers from drifting and justify adding the shared primitive.

🎯 Usefulness Audit

🟡 isAbortError now exists in three places — consolidate the two stale local copies [ergonomics] ``

This PR adds the canonical shared isAbortError at src/runtime/util.ts:55-58 (correct home, next to abortError/throwAbort/throwIfAborted). But two pre-existing local copies remain: run-loop.ts:755-756 and supervise/scope.ts:776-783 (the scope.ts copy is duck-typed without instanceof Error, slightly looser). The codebase now has three implementations of the same abort predicate. Not a blocker for this PR — but a follow-up should delete the two locals and import from util.ts so the contract stays s


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass What it asks
Heuristic Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication Do added function/class names already exist elsewhere in the repo?
Value Audit What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260622T133808Z

@tangletools

Copy link
Copy Markdown
Contributor

✅ No Blockers — ba9845fd

Readiness 77/100 · Confidence 70/100 · 12 findings (1 medium, 11 low)

opencode-kimi glm deepseek aggregate
Readiness 79 77 86 77
Confidence 70 70 70 70
Correctness 79 77 86 77
Security 79 77 86 77
Testing 79 77 86 77
Architecture 79 77 86 77

Full multi-shot audit completed 2/2 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 4 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM isAbortError ignores DOMException aborts already handled elsewhere — src/runtime/util.ts

Line 56-58: isAbortError returns true only for err instanceof Error && err.name === 'AbortError'. src/runtime/supervise/runtime.ts:382-383 and :956-957 explicitly detect e instanceof DOMException && e.name === 'AbortError', and tests/loops/inbox.test.ts:114 / tests/mcp/task-queue.test.ts:130 model DOMException aborts. DOMException is not an instanceof Error, so a DOMException-shaped abort thrown by a lower layer would fail the new isAbortError check in sandbox-run.ts:215 and the partial event trace would be lost, contradicting the PR's stated purpose. src/runtime/supervise/scope.ts:776-782 already has a more permissive implementation (`typeof

🟡 LOW Final-read-abort returns TurnResult instead of throwing SandboxRunAbortError — src/runtime/sandbox-run.ts

In the retry loop (lines 230-241), the abort guard runs ONLY at the start of each iteration. If the 4th box.fs.read() itself aborts synchronously (signal trips mid-call), the catch at line 236 records readError='aborted', attempt < readAttempts-1 is false so no sleep, and the loop falls through to return { out, events, readError } at line 242. The caller sees a successful TurnResult carrying a readE

🟡 LOW Partial events on abort are not surfaced into the runtime hook payload — src/runtime/sandbox-run.ts

The motivation for SandboxRunAbortError is diagnosability of aborted turns (per the class JSDoc lines 67-75). But when a turn throws it, the error-path hook emission at lines 299-315 / 342-358 builds its payload via turnPayload(...) which only captures error: errorMessage(error) = 'aborted' (line 200). The carried events and readError on the typed error never reach the runtime hook event, so a ho

🟡 LOW SandboxRunAbortError drops the original abort cause/stack — src/runtime/sandbox-run.ts

Lines 82-86 and 215: when an abort error is caught and wrapped, the original error is discarded instead of passed as { cause: err }. The partial events are preserved (the PR's goal), but the original stack and any SDK-specific abort metadata are lost, making post-mortem debugging harder when an abort originates from a lower layer. Fix: super('aborted', { cause: err }) in the constructor.

🟡 LOW SandboxRunAbortError loses DOMException identity vs supervise/runtime's instanceof check — src/runtime/sandbox-run.ts

The class JSDoc (lines 72-74) claims existing callers in 'the loop kernel, scope, supervise runtime' keep matching unchanged. True for the name-based matchers (run-loop.ts:744, scope.ts:631). But supervise/runtime.ts:381-385 ALSO gates on e instanceof DOMException && e.name === 'AbortError'. The SDK stream surfaces aborts as DOMException; wrapping it in a plain Error subclass (super('aborted') at sandbox-run.ts:83) drops DOMException identity, so that branch would miss the wrapped error if openSandboxRun is ever routed through the supervise runtime. No current consumer breaks (openSandboxRun today feeds only bench/src/{worker,fleet,cloud-loop,search-bench,co

🟡 LOW Duplicate isAbortError implementations — src/runtime/util.ts

A new isAbortError is added in util.ts while near-identical private helpers already exist in src/runtime/run-loop.ts:755-757 and a broader one in src/runtime/supervise/scope.ts:776-782. This is a missed opportunity to canonicalize the helper and reduce drift. Not blocking for this shot but worth a follow-up cleanup.

🟡 LOW isAbortError now duplicated in 3 files — src/runtime/util.ts

This PR exports a new isAbortError at util.ts:56-58, but two local copies remain: run-loop.ts:755-757 and supervise/scope.ts:776 (same body, err instanceof Error && err.name === 'AbortError'). The PR adds a third source of truth rather than consolidating. No behavioral divergence today (identical predicates), but it's exactly the kind of seam the PR is supposedly centralizing. Fix: delete the two local copies and import from ./util in both files. Trivial, low-risk follow-up.

🟡 LOW isAbortError pattern inconsistency with scope.ts — src/runtime/util.ts

util.ts:57 uses err instanceof Error && err.name === 'AbortError'. scope.ts:776-781 uses a more permissive check that doesn't require instanceof Error (handles plain objects with .name). This is pre-existing (run-loop.ts:755 mirrors util.ts's stricter pattern) and Node >=20 guarantees DOMException instanceof Error, so not a practical bug. But if a sandbox SDK or fetch polyfill throws a non-Error object with name='AbortError', the stream catch at sandbox-run.ts:214 would re-throw it raw instead of wrapping it in SandboxRunAbortError, losing partial events. Fix: align isAbortError implementations across the codebase in a follow-up.

🟡 LOW No explicit abort test coverage for resume() — tests/runtime/sandbox-run.test.ts

Lines 293-388: All three new abort tests only exercise run.start(). The resume() method calls the same settle() closure and is subject to identical abort behavior. While settle is not parameterized differently for resume, a future refactor could accidentally diverge the call paths without a test catching it. Add one test with start()resume() → abort-on-resume to close the gap.

🟡 LOW No test for abort before any events are yielded — tests/runtime/sandbox-run.test.ts

Lines 327-355: The mid-stream abort test covers 1-event-then-abort, but the scenario where the signal is already aborted before the stream generator's first yield is untested. In that case collected would be [] and a SandboxRunAbortError with empty events should still be thrown (diagnosable as 'never started' per the class JSDoc at sandbox-run.ts:70). The fake client's per-iteration signal check supports this but it's not exercised.

🟡 LOW Test 2 name overpromises 'never-started stays empty' but tests mid-stream abort — tests/runtime/sandbox-run.test.ts

Lines 327-355: the test title says 'carries ONLY the events drained before a mid-stream abort (never-started stays empty)' but the body aborts after the FIRST event drains (onEvent i===1), so it verifies the 1-event case, not the 0-event/never-started case. A pre-aborted signal (controller.abort() before run.start) with an empty carried-events array is never asserted. Impact: the 'never-started' branch of the diagnostic claim (events: [] distinguishes never-started from looped) is untested. Fix: either rename to drop the 'never-started' clause, or add a 4th test that pre-aborts and asserts err.events).toEqual([]).

🟡 LOW Tests 2 and 3 skip the err.name === 'AbortError' cross-kernel contract assertion — tests/runtime/sandbox-run.test.ts

Lines 353-354 and 383-385: these tests assert instanceof SandboxRunAbortError and .events/.readError but do NOT assert (err as Error).name === 'AbortError'. Test 1 (line 321) does. The class hardcodes override readonly name = 'AbortError' so it's correct today, but the contract that lets existing run-loop/scope matchers keep working unmodified is the PR's central compatibility claim — worth asserting in every abort test, not just one. Fix: add expect((err as Error).name).toBe('AbortError') to tests 2 and 3.


tangletools · 2026-06-22T13:42:53Z · trace

…unAbortError

settle() collected stream events into a local array that was lost when an
abort fired mid-turn — the bare AbortError rethrown by start()/resume()
carried no partial state, so an aborted/timed-out run was undiagnosable
(never-started vs looped vs produced-nothing).

Throw a typed SandboxRunAbortError carrying { events, readError? } at every
abort point in settle(): the stream-drain catch, the pre-read guard, and the
read-retry guard. name === 'AbortError' so the existing cross-kernel
err.name === 'AbortError' matchers (run-loop, scope, supervise runtime) keep
matching it unchanged. Add a shared isAbortError() to util alongside the
other abort primitives.

Tests now assert the partial trace is preserved: events drained before a
pre-read abort, only-events-before a mid-stream abort, and events + last
readError when the read-retry loop is aborted.
@drewstone drewstone force-pushed the fix/sandbox-run-preserve-partial-events branch from ba9845f to 1f2e149 Compare June 22, 2026 13:47

@tangletools tangletools left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Auto-approved PR — 1f2e1490

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:47:56Z

@drewstone drewstone merged commit 7eb662c into main Jun 22, 2026
1 check passed
@drewstone drewstone mentioned this pull request Jun 22, 2026
drewstone added a commit that referenced this pull request Jun 22, 2026
…provenance + artifact lifecycle (#363)

Bundles the build-phase PRs landed since 0.72.0:
- #360 max-live-workers concurrency cap + explicit worker metering opt-in
- #362 mounted-resource manifest + caller selection receipts on LoopResult
- #361 artifact registry + marginal-lift ablation (rt#267 phase 1, ./lifecycle)
- #359 preserve partial events on abort via typed SandboxRunAbortError

Bumps package.json to 0.73.0, pins docs/canonical-api.md to 0.73.0, and adds
decision-table rows for run provenance (result.provenance.mounts/selectionReceipts)
and the artifact-lifecycle ablation (measureMarginalLift / ArtifactRegistry, /lifecycle).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants