feat(lifecycle): artifact-lifecycle loop generate→measure→promote→compose — closes #267 by drewstone · Pull Request #364 · tangle-network/agent-runtime

drewstone · 2026-06-22T17:31:52Z

What this closes

This PR closes the artifact-lifecycle loop, end to end, on top of the phase-1 foundation (ArtifactRegistry + measureMarginalLift + applyArtifact). The binding problem: an empty profile has no skills, and nothing creates one, measures its value, and then folds it back in through a gated, provenance-tracked path. This PR wires that loop and proves it closes.

Plain-language frame

We make an agent self-improve a piece of its profile. The loop creates a candidate piece, measures how many extra problems it solves on a held-back exam (fresh problems it never tuned on), promotes it only if it clears that exam, stores it with the score as a receipt, and folds the winners back into the agent's profile.
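To make "extra problems solved on a held-back exam" concrete, here is a minimal sketch of a marginal-lift calculation. It is not the repository's measureMarginalLift: the names ExamTask, Solver, passRate and marginalLift, and the pass/fail scoring, are assumptions chosen for illustration.

```typescript
// Hypothetical stand-ins: a task is an input with one expected answer, and a
// "solver" is whatever runs the agent with a given profile on one input.
interface ExamTask {
  id: string
  input: string
  expected: string
}

type Solver = (input: string) => string

// Fraction of held-back tasks the solver answers correctly.
function passRate(solve: Solver, tasks: readonly ExamTask[]): number {
  if (tasks.length === 0) throw new Error("held-back split is empty")
  const passed = tasks.filter((t) => solve(t.input) === t.expected).length
  return passed / tasks.length
}

// Marginal lift = pass rate with the candidate minus pass rate without it,
// both measured on the same held-back tasks.
function marginalLift(
  baseline: Solver,
  withCandidate: Solver,
  heldBack: readonly ExamTask[],
): number {
  return passRate(withCandidate, heldBack) - passRate(baseline, heldBack)
}

// Usage: the baseline always answers "8"; the candidate actually adds.
const heldBack: ExamTask[] = [
  { id: "t1", input: "2+2", expected: "4" },
  { id: "t2", input: "3+5", expected: "8" },
  { id: "t3", input: "7+1", expected: "8" },
  { id: "t4", input: "9+9", expected: "18" },
]
const baselineSolver: Solver = () => "8"
const candidateSolver: Solver = (input) =>
  String(input.split("+").map(Number).reduce((a, b) => a + b, 0))

// Baseline passes 2 of 4, candidate passes 4 of 4, so the lift is 0.5.
const lift = marginalLift(baselineSolver, candidateSolver, heldBack)
```

Measuring both sides on the same held-back tasks is what makes the difference attributable to the candidate rather than to an easier exam.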

What's new (wiring existing engines, not rebuilding them)

runLifecycle — the ONE surface-agnostic orchestrator: GENERATE (per-surface CandidateGenerator) → MEASURE each via measureMarginalLift on the held-back split → PROMOTE via a pluggable PromotionGate → STORE in ArtifactRegistry with provenance (domain, generation, generator kind, gate verdict) + the lift score.
CandidateGenerator (generator.ts) — the thin per-surface seam, the ONLY per-surface code. The interface the next stages implement.
PromotionGate (gate.ts) — thresholdPromotionGate (scalar lift) and heldOutPromotionGate, which delegates to agent-eval's HeldOutGate (paired-bootstrap CI on per-task holdout records). The held-out gate fails loud if the eval produced no per-task records — a significance claim with no data behind it is forbidden.
Registry invariant — promoteWithLift records the measured lift; liftOf returns it. An artifact is active IFF it carries a finite lift. composeProfile folds the top-k active artifacts ranked by lift back into a profile; a status flag without a lift receipt is invisible.
skillGenerator (skill-generator.ts) — DISTILL (create a skill from traces — the step skillOpt cannot do) then REFINE (optimize it). Both are injected seams (§1.5: author the profile, don't embed a loop). This is the literal answer to "empty profile has no skills".
lifecycles field on defineAgent — declarative per-surface config the loop reads (surface + generator + gate + compose-k).
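The gate and the registry invariant described above can be sketched with a small in-memory model. This is illustrative only and is not the repository's ArtifactRegistry: the class SketchRegistry, the record shape, the flagActive helper, the strict "greater than" threshold, and the signatures used here are all assumptions. Only the behavior (active IFF a finite lift is recorded, compose takes the top-k by lift) follows the description.

```typescript
// Illustrative in-memory model; every name and signature here is an assumption.
interface ArtifactRecord {
  id: string
  status: "candidate" | "active"
  lift?: number // the receipt: measured held-back lift
}

// Scalar gate: promote only when the measured lift is finite and above minLift.
function thresholdGate(minLift: number): (lift: number) => boolean {
  return (lift) => Number.isFinite(lift) && lift > minLift
}

class SketchRegistry {
  private records = new Map<string, ArtifactRecord>()

  register(id: string): void {
    this.records.set(id, { id, status: "candidate" })
  }

  // Models a stray status flag written without a lift receipt.
  flagActive(id: string): void {
    const rec = this.records.get(id)
    if (rec) rec.status = "active"
  }

  // Promotion and the lift receipt are written together, or not at all.
  promoteWithLift(id: string, lift: number): void {
    const rec = this.records.get(id)
    if (!rec) throw new Error(`unknown artifact: ${id}`)
    if (!Number.isFinite(lift)) throw new Error(`non-finite lift for ${id}`)
    rec.status = "active"
    rec.lift = lift
  }

  liftOf(id: string): number | undefined {
    return this.records.get(id)?.lift
  }

  // Active IFF a finite lift is recorded; the status flag alone is not consulted.
  composeTopK(k: number): string[] {
    return Array.from(this.records.values())
      .filter((r) => Number.isFinite(r.lift))
      .sort((a, b) => (b.lift as number) - (a.lift as number))
      .slice(0, k)
      .map((r) => r.id)
  }
}

// Usage: three measured candidates, gated at a lift above 0.1.
const registry = new SketchRegistry()
const gate = thresholdGate(0.1)
const measured: Array<[string, number]> = [
  ["skill-a", 0.4],
  ["skill-b", 0.0],
  ["skill-c", 0.25],
]
for (const [id, lift] of measured) {
  registry.register(id)
  if (gate(lift)) registry.promoteWithLift(id, lift)
}
registry.flagActive("skill-b") // flagged, but it still has no receipt

const top1 = registry.composeTopK(1) // ["skill-a"]
const top5 = registry.composeTopK(5) // ["skill-a", "skill-c"]
```

Keying "active" on the presence of the lift rather than on the flag means a promotion that skipped measurement cannot reach the composed profile.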

The end-to-end proof (the keystone)

src/lifecycle/closed-loop.test.ts — deterministic, no live model:

EMPTY profile (no skills) → runLifecycle distills a skill from seeded traces → measures its held-back lift (0 → 1) → the gate promotes it → composeProfile folds it back → the composed profile beats the empty one on the same held-back exam.

A second case proves a worthless distilled skill earns zero lift, fails the gate, and never composes in.
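The shape of both cases can be illustrated with a toy, fully deterministic loop. This is not the contents of closed-loop.test.ts: the Skill, Profile and Example shapes, the fixed OPERATIONS list, and the functions answer, score, distill and runLoop are stand-ins invented so the example runs without a model.

```typescript
// Toy model: a skill is a named string transform and a profile is a list of
// skills. All shapes and names here are hypothetical stand-ins.
type Skill = { name: string; apply: (input: string) => string }
type Profile = { skills: Skill[] }
type Example = { input: string; expected: string }

// An agent with no skills echoes its input; otherwise skills apply in order.
function answer(profile: Profile, input: string): string {
  return profile.skills.reduce((text, skill) => skill.apply(text), input)
}

// Fraction of exam items the profile answers correctly.
function score(profile: Profile, exam: readonly Example[]): number {
  const passed = exam.filter((e) => answer(profile, e.input) === e.expected).length
  return passed / exam.length
}

// Candidate operations the toy distiller may pick from.
const OPERATIONS: Skill[] = [
  { name: "uppercase", apply: (s) => s.toUpperCase() },
  { name: "reverse", apply: (s) => s.split("").reverse().join("") },
]

// DISTILL: pick the first operation that explains every seeded trace;
// fall back to a no-op skill when nothing fits.
function distill(traces: readonly Example[]): Skill {
  const fit = OPERATIONS.find((op) => traces.every((t) => op.apply(t.input) === t.expected))
  return fit ?? { name: "noop", apply: (s) => s }
}

function runLoop(
  base: Profile,
  traces: readonly Example[],
  heldBack: readonly Example[],
  minLift: number,
) {
  const candidate = distill(traces) // GENERATE
  const trial: Profile = { skills: [...base.skills, candidate] }
  const lift = score(trial, heldBack) - score(base, heldBack) // MEASURE
  const promoted = lift > minLift // PROMOTE
  return { candidate, lift, promoted, composed: promoted ? trial : base } // COMPOSE
}

const empty: Profile = { skills: [] }
// Held-back exam: inputs that never appear in the seeded traces.
const heldBack: Example[] = [
  { input: "tangle", expected: "TANGLE" },
  { input: "agent", expected: "AGENT" },
]

// Case 1: consistent traces distill to "uppercase"; lift goes 0 → 1.
const good = runLoop(empty, [{ input: "abc", expected: "ABC" }], heldBack, 0)
// Case 2: traces no operation explains distill to a no-op with zero lift.
const bad = runLoop(empty, [{ input: "abc", expected: "xyz" }], heldBack, 0)
```

In the second case the composed profile is the unchanged empty profile, which mirrors the claim that a zero-lift skill never composes in.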

Verification

pnpm run build — clean; new exports present in dist/lifecycle.d.ts + dist/agent.d.ts.
pnpm run typecheck (incl. examples) — clean.
pnpm test — 110 files / 1085 pass, 1 pre-existing skip. Lifecycle suite: 32 pass (21 phase-1 + 2 closed-loop proof + 9 gate/compose).
pnpm run lint — clean.
Merges cleanly into origin/main.

CandidateGenerator interface the next stages implement

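// The per-surface seam: proposes unmeasured candidates for one artifact kind.
export interface CandidateGenerator<K extends ArtifactKind = ArtifactKind> {
  kind: K
  // Returns candidates only; runLifecycle owns register → measure → gate → store.
  generate(ctx: GenerateContext): Promise<ArtifactInput<K>[]>
}

// Inputs handed to a generator on each run.
export interface GenerateContext {
  baseline: AgentProfile
  domain: string
  findings: ReadonlyArray<AnalystFinding>
  traces?: unknown // optional, e.g. the traces a skill is distilled from
  signal?: AbortSignal // optional cancellation
}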

A generator proposes UNMEASURED candidate artifacts for one surface; runLifecycle owns register → measure → gate → store. skillGenerator is the reference implementation. The next stages add toolGenerator, promptGenerator, mcpGenerator against this same interface.
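As a rough idea of what a later-stage generator might look like, here is a hypothetical sketch written against a local copy of the interface. The stand-in types (the ArtifactKind union, the ArtifactInput fields, AgentProfile, AnalystFinding) are guesses made so the block compiles on its own, signal is narrowed to the single field the sketch reads, and promptGeneratorSketch is not the promptGenerator the PR refers to.

```typescript
// Stand-in types: the real definitions live in the repository and are not
// shown in this PR description, so these shapes are assumptions.
type ArtifactKind = "skill" | "tool" | "prompt" | "mcp"
interface ArtifactInput<K extends ArtifactKind = ArtifactKind> {
  kind: K
  name: string
  body: string
}
interface AgentProfile {
  skills: string[]
}
interface AnalystFinding {
  summary: string
}

// Local mirror of the interface quoted above.
interface CandidateGenerator<K extends ArtifactKind = ArtifactKind> {
  kind: K
  generate(ctx: GenerateContext): Promise<ArtifactInput<K>[]>
}
interface GenerateContext {
  baseline: AgentProfile
  domain: string
  findings: ReadonlyArray<AnalystFinding>
  traces?: unknown
  signal?: { readonly aborted: boolean } // AbortSignal in the quoted interface
}

// Hypothetical generator: one unmeasured prompt candidate per analyst finding.
const promptGeneratorSketch: CandidateGenerator<"prompt"> = {
  kind: "prompt",
  async generate(ctx) {
    // Honor cancellation before doing any work.
    if (ctx.signal?.aborted) throw new Error("generation aborted")
    return ctx.findings.map((finding, index) => ({
      kind: "prompt" as const,
      name: `${ctx.domain}-prompt-${index}`,
      body: `Address this finding: ${finding.summary}`,
    }))
  },
}

// Usage: the generator only proposes; measuring and gating happen elsewhere.
const candidatesPromise = promptGeneratorSketch.generate({
  baseline: { skills: [] },
  domain: "math",
  findings: [{ summary: "misses unit conversions" }],
})
```

The generator returns candidates without scores or status, which matches the split where the orchestrator, not the generator, owns measurement and promotion.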

🤖 Generated with Claude Code

…pose

Close the artifact-lifecycle loop on top of the phase-1 foundation (ArtifactRegistry + measureMarginalLift + applyArtifact):

- runLifecycle: the one surface-agnostic orchestrator. It runs generate (per-surface CandidateGenerator) → measure each via measureMarginalLift on the held-back split → promote via a pluggable PromotionGate → store with provenance + lift.
- CandidateGenerator: the thin per-surface seam (the only per-surface code); generator.ts is the interface the next stages implement.
- PromotionGate: thresholdPromotionGate (scalar) + heldOutPromotionGate (delegates to agent-eval HeldOutGate, paired-bootstrap on per-task holdout records; fails loud without them, so no fabricated significance).
- Registry invariant: promoteWithLift stamps the measured lift; an artifact is active IFF liftOf returns a finite number. composeProfile folds the top-k active artifacts (ranked by lift) back into a profile.
- skillGenerator: distill (create a skill from traces, the step skillOpt cannot do) then refine (optimize it). This is the answer to "empty profile has no skills". Both steps are injected seams.
- lifecycles field on defineAgent: declarative per-surface config the loop reads.
- closed-loop.test.ts: the deterministic end-to-end proof. Empty profile → distill → measure → promote → compose beats the empty profile on a held-back exam.

The loop is closed end-to-end. Closes #267.

tangletools