feat(supervise): max-live-workers concurrency cap + explicit worker metering opt-in #360

Merged: drewstone merged 1 commit into main from fix/rt329-worker-cap-metering on Jun 22, 2026

@drewstone

Contributor

Closes rt#329. Two equal-k fences on the supervisor spawn path; both default to the prior behavior so every existing supervise() caller is unchanged.

(a) max-live-workers concurrency cap

spawn_agent (src/mcp/tools/coordination.ts) was gated only on the conserved budget pool. That bounds total work, not simultaneous work, so a driver could flood real infra with N concurrent boxes/sandboxes.

  • New CoordinationToolsOptions.maxLiveWorkers: counts the scope's non-terminal nodes (pending/acquiring/running) and fails closed with error: 'max-live-workers' before reserving from the pool when the cap is met.
  • Threaded end-to-end: supervise() → supervisorAgent (both the router and sandbox arms) → driverAgent / serveCoordinationMcp → createCoordinationTools.
  • Omitting it or setting it <= 0 means no cap (prior behavior; the pool stays the only fence).
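
The ordering described above (cap check first, pool reservation second) can be sketched roughly as follows. This is a minimal illustration, not the code in coordination.ts: `SketchScope`, `spawnAgentSketch`, `SpawnResult`, the terminal states `done`/`failed`, and the `'budget-exhausted'` error string are all hypothetical. Only `maxLiveWorkers`, the `'max-live-workers'` error, and the pending/acquiring/running states come from the description.

```typescript
// Hypothetical shapes for illustration; not the real types in coordination.ts.
type NodeState = "pending" | "acquiring" | "running" | "done" | "failed";

interface WorkerNode {
  id: string;
  state: NodeState;
}

type SpawnResult =
  | { ok: true; node: WorkerNode }
  | { ok: false; error: "max-live-workers" | "budget-exhausted" };

// Non-terminal states named in the PR description.
const LIVE_STATES: ReadonlySet<NodeState> = new Set<NodeState>([
  "pending",
  "acquiring",
  "running",
]);

class SketchScope {
  readonly nodes: WorkerNode[] = [];
  reservations = 0; // how many times the pool was asked for budget
  poolRemaining: number;

  constructor(poolRemaining: number) {
    this.poolRemaining = poolRemaining;
  }

  // Count workers that have not settled yet.
  liveCount(): number {
    return this.nodes.filter((n) => LIVE_STATES.has(n.state)).length;
  }

  // Try to take `cost` from the conserved pool.
  reserve(cost: number): boolean {
    this.reservations += 1;
    if (cost > this.poolRemaining) return false;
    this.poolRemaining -= cost;
    return true;
  }
}

function spawnAgentSketch(
  scope: SketchScope,
  cost: number,
  maxLiveWorkers?: number,
): SpawnResult {
  // Cap check comes first, so a rejected spawn never touches the pool.
  // Omitted or <= 0 means no cap.
  if (
    maxLiveWorkers !== undefined &&
    maxLiveWorkers > 0 &&
    scope.liveCount() >= maxLiveWorkers
  ) {
    return { ok: false, error: "max-live-workers" };
  }
  if (!scope.reserve(cost)) {
    return { ok: false, error: "budget-exhausted" };
  }
  const node: WorkerNode = {
    id: `worker-${scope.nodes.length + 1}`,
    state: "pending",
  };
  scope.nodes.push(node);
  return { ok: true, node };
}
```

In this sketch, a scope with a cap of 2 admits two spawns, rejects the third without incrementing `reservations`, and admits again once one of the earlier nodes moves to a terminal state.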

(b) Explicit worker metering opt-in

createWorktreeCliExecutor hardcoded budgetExempt: true — a silent equal-k hole.

  • Promoted to an explicit, documented WorktreeCliExecutorOptions.budgetExempt. It defaults to true because a coding-harness CLI surfaces no token/usd usage, so metering it would record a fabricated zero (the no-silent-zeros rule forbids that). Set false only for a harness that surfaces real usage worth metering.
  • The exemption is now a visible, overridable field with a one-line rationale instead of a buried constant — a new caller who needs metering opts in deliberately.
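
One way to picture the opt-in is the sketch below. It assumes a simplified executor and ledger: `createExecutorSketch`, `recordUsage`, `Usage`, and the throw on missing usage are hypothetical, and only the `budgetExempt` field name and its `true` default come from the description.

```typescript
// Hypothetical shapes; only `budgetExempt` and its default are from the PR.
interface WorktreeCliExecutorOptionsSketch {
  /**
   * Defaults to true: a coding-harness CLI surfaces no token/usd usage,
   * so metering it would record a fabricated zero.
   */
  budgetExempt?: boolean;
}

interface Usage {
  tokens: number;
  usd: number;
}

interface ExecutorSketch {
  budgetExempt: boolean;
}

function createExecutorSketch(
  options: WorktreeCliExecutorOptionsSketch = {},
): ExecutorSketch {
  // Explicit, overridable default instead of a buried constant.
  return { budgetExempt: options.budgetExempt ?? true };
}

function recordUsage(
  ledger: Usage[],
  executor: ExecutorSketch,
  usage: Usage | undefined,
): void {
  // Exempt leaf: write nothing rather than a made-up zero.
  if (executor.budgetExempt) return;
  // Assumption of this sketch: a metered leaf must report real usage.
  if (usage === undefined) {
    throw new Error("metered executor reported no usage");
  }
  ledger.push(usage);
}
```

A caller whose harness does surface usage would pass `{ budgetExempt: false }` so that its usage lands in the ledger; everyone else keeps the exempt default.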

Verification (all green, in order)

  • pnpm run build (examples need dist) — ✓
  • pnpm run typecheck (project + examples) — ✓
  • pnpm test — 1047 passed, 1 skipped, 0 failed
  • pnpm run lint — ✓
  • pnpm run docs:check (regenerated docs/api, freshness OK) — ✓

New tests:

  • spawn_agent fails closed at the maxLiveWorkers cap WITHOUT touching the pool — admits to the cap, rejects the next without calling scope.spawn, frees a slot on settle, and the uncapped tools admit past the prior cap.
  • "budgetExempt: false opts the leaf into metering", plus a renamed test, "is budgetExempt by default".

…etering opt-in

Two equal-k fences on the supervisor spawn path.

(a) maxLiveWorkers cap: spawn_agent counted only the conserved budget pool
    (total work), so a driver could flood real infra with N simultaneous
    boxes. Add a configurable cap on not-yet-settled workers, enforced at
    spawn BEFORE the pool reservation (fail closed: error 'max-live-workers').
    Threaded supervise() -> supervisorAgent (both arms) -> driverAgent /
    serveCoordinationMcp -> createCoordinationTools. Omit/<=0 = no cap, so
    existing callers are unchanged.

(b) Worker metering: createWorktreeCliExecutor hardcoded budgetExempt:true,
    a silent equal-k hole. Promote it to an explicit, documented
    WorktreeCliExecutorOptions.budgetExempt (defaults true — a harness CLI
    surfaces no usage, so metering it would record a fabricated zero the
    no-silent-zeros rule forbids). Set false to meter a real-usage harness.

Tests: cap fails closed at the limit without touching the pool and frees a
slot on settle; uncapped admits past it; budgetExempt:false flips metering.

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — d30039ae

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:35:16Z

@drewstone drewstone merged commit ef3d132 into main Jun 22, 2026
1 check passed
@drewstone drewstone mentioned this pull request Jun 22, 2026
drewstone added a commit that referenced this pull request Jun 22, 2026
…provenance + artifact lifecycle (#363)

Bundles the build-phase PRs landed since 0.72.0:
- #360 max-live-workers concurrency cap + explicit worker metering opt-in
- #362 mounted-resource manifest + caller selection receipts on LoopResult
- #361 artifact registry + marginal-lift ablation (rt#267 phase 1, ./lifecycle)
- #359 preserve partial events on abort via typed SandboxRunAbortError

Bumps package.json to 0.73.0, pins docs/canonical-api.md to 0.73.0, and adds
decision-table rows for run provenance (result.provenance.mounts/selectionReceipts)
and the artifact-lifecycle ablation (measureMarginalLift / ArtifactRegistry, /lifecycle).