feat(contract): evalReportingSuite — one call from runs (or a run dir) to analysis.json by drewstone · Pull Request #272 · tangle-network/agent-eval

drewstone · 2026-06-22T13:34:54Z

What

Adds evalReportingSuite(input, opts) to the /contract public surface: the one-call path from a set of runs — RunRecord[] in memory, or a .json / .jsonl file, or a directory of them — to a single analysis.json.

It is a thin wrapper, not a new analysis engine. All distributions, paired stats/lift, and the findings rollup come from the existing analyzeRuns primitive verbatim; the suite only resolves the input into validated records, calls analyzeRuns with the options you'd pass it directly, wraps the result in a small provenance envelope, and optionally writes the artifact.

```ts
// From a directory of run files, write ./runs/analysis.json:
const fromDir = await evalReportingSuite('./runs', { write: true })

// From records already in memory:
const fromMemory = await evalReportingSuite(records, { analyze: { decisionThreshold: 0.03 } })
fromMemory.report // the InsightReport: distributions, paired lift, findings rollup
```
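The composition described above (resolve, analyze, wrap) fits in a few lines. This is a minimal illustration under stated assumptions, not the code in this PR: `RunRecord`, `InsightReport`, the stubbed `analyzeRuns`, the `SuiteResult` envelope fields, and `evalReportingSuiteSketch` are hypothetical stand-ins. For brevity the sketch is synchronous and handles in-memory records only.

```ts
// Hypothetical stand-ins for the real contract types.
interface RunRecord {
  id: string
  score: number
}

interface InsightReport {
  runCount: number
  meanScore: number
}

interface AnalyzeOptions {
  decisionThreshold?: number
}

// Stub for the existing analyzeRuns primitive. The real one computes
// distributions, paired-bootstrap lift, and the findings rollup.
function analyzeRuns(records: RunRecord[], _opts: AnalyzeOptions = {}): InsightReport {
  const total = records.reduce((sum, r) => sum + r.score, 0)
  return {
    runCount: records.length,
    meanScore: records.length > 0 ? total / records.length : 0,
  }
}

// Hypothetical provenance envelope around the untouched report.
interface SuiteResult {
  schemaVersion: string
  recordCount: number
  report: InsightReport
}

// The wrapper adds no analysis: records in, analyzeRuns's report out, verbatim.
function evalReportingSuiteSketch(
  records: RunRecord[],
  opts: { analyze?: AnalyzeOptions } = {},
): SuiteResult {
  const report = analyzeRuns(records, opts.analyze)
  return { schemaVersion: 'sketch-1', recordCount: records.length, report }
}

const records: RunRecord[] = [
  { id: 'a', score: 0.5 },
  { id: 'b', score: 0.75 },
]
const suite = evalReportingSuiteSketch(records, { analyze: { decisionThreshold: 0.03 } })

// Drift guard: the wrapped report serializes identically to a direct call.
const direct = analyzeRuns(records, { decisionThreshold: 0.03 })
console.log(JSON.stringify(suite.report) === JSON.stringify(direct)) // true
```

The final check mirrors the PR's drift guard. The wrapper passes the `analyzeRuns` output through untouched, so serializing both reports yields identical strings.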

New surface (all additive, `/contract`)

- `evalReportingSuite` + `EvalReportingSuiteInput` / `EvalReportingSuiteOptions` / `EvalReportingSuiteResult`.
- `fromRunRecordDir` (a new intake adapter, alongside the existing `from*` adapters) + `FromRunRecordDirOptions` / `FromRunRecordDirResult` / `RunRecordRejection`. Loads a `.json` (array) or `.jsonl` (one record per line) file, or a directory of them, validating each record at the boundary via `parseRunRecordSafe`. Fails loud on an invalid record by default; `onInvalid: 'collect'` keeps the valid records and returns the rejects.
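The per-file intake step described above can be sketched as follows, assuming a toy two-field record. `parseRecordSafe` stands in for `parseRunRecordSafe`, and `'throw'` is a hypothetical name for the fail-loud default (only `'collect'` appears in the PR). Treating a malformed JSONL line as an invalid record is also an assumption about the real adapter.

```ts
// Hypothetical minimal record shape; the real schema is richer.
interface RunRecord {
  id: string
  score: number
}

interface RunRecordRejection {
  source: string // e.g. "runs.jsonl:2" or "runs.json[1]"
  reason: string
}

type SafeParse = { ok: true; value: RunRecord } | { ok: false; reason: string }

// Stand-in for parseRunRecordSafe: validate at the boundary and never throw.
function parseRecordSafe(raw: unknown): SafeParse {
  if (typeof raw !== 'object' || raw === null) return { ok: false, reason: 'not an object' }
  const { id, score } = raw as { id?: unknown; score?: unknown }
  if (typeof id !== 'string') return { ok: false, reason: 'missing id' }
  if (typeof score !== 'number') return { ok: false, reason: 'missing score' }
  return { ok: true, value: { id, score } }
}

// 'throw' is a hypothetical name for the fail-loud default.
type OnInvalid = 'throw' | 'collect'

function parseRunFile(fileName: string, text: string, onInvalid: OnInvalid = 'throw') {
  const entries: Array<{ source: string; raw: unknown }> = []
  if (fileName.endsWith('.jsonl')) {
    // One record per non-blank line; a malformed line becomes undefined and is rejected below.
    text.split('\n').forEach((line, i) => {
      if (line.trim() === '') return
      let raw: unknown
      try {
        raw = JSON.parse(line)
      } catch {
        raw = undefined
      }
      entries.push({ source: `${fileName}:${i + 1}`, raw })
    })
  } else {
    // A .json file must hold a single top-level array of records.
    const parsed: unknown = JSON.parse(text)
    if (!Array.isArray(parsed)) throw new Error(`${fileName}: expected a JSON array`)
    parsed.forEach((raw, i) => entries.push({ source: `${fileName}[${i}]`, raw }))
  }

  const records: RunRecord[] = []
  const rejected: RunRecordRejection[] = []
  for (const { source, raw } of entries) {
    const result = parseRecordSafe(raw)
    if (result.ok) records.push(result.value)
    else if (onInvalid === 'collect') rejected.push({ source, reason: result.reason })
    else throw new Error(`invalid run record at ${source}: ${result.reason}`)
  }
  return { records, rejected }
}

const jsonl = '{"id":"a","score":0.9}\n{"id":"b"}\n\n{"id":"c","score":0.4}\n'
const { records, rejected } = parseRunFile('runs.jsonl', jsonl, 'collect')
console.log(records.map((r) => r.id)) // [ 'a', 'c' ]
console.log(rejected.map((x) => x.source)) // [ 'runs.jsonl:2' ]
```

With the default mode, the same input throws at `runs.jsonl:2` instead of returning a partial result.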

Design notes

- Reuses `analyzeRuns` (composite/cost distributions, per-dimension stats, paired-bootstrap lift, failure-mode + cluster rollup, recommendations). The test asserts the wrapped report is byte-identical to calling `analyzeRuns` directly, so the wrapper can't drift into reimplementing analysis.
- The JSON/JSONL parsing mirrors the proven `loadRunRecords` path already used by the `analyze_runs` eval tool, now promoted into the public intake family where it belongs.
- A re-run over a directory ignores its own `analysis.json` output, so it never ingests its own artifact.
- `write: true` on in-memory records fails loud: there's no directory to anchor the artifact to, so pass an explicit path instead.
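The two directory rules above reduce to small pure helpers. Both are hypothetical: `selectRunFiles`, `artifactPath`, and the `SuiteInput` union are not names from the PR. The sketch assumes a flat directory and treats a string passed to `artifactPath` as a directory path, not a single file.

```ts
// Hypothetical input union mirroring EvalReportingSuiteInput: records in memory, or a path.
type SuiteInput = unknown[] | string

const ARTIFACT_NAME = 'analysis.json'

// Pick run files from a flat directory listing, skipping the suite's own artifact
// so a re-run never ingests a previous analysis.json.
function selectRunFiles(dirEntries: string[]): string[] {
  return dirEntries
    .filter((entry) => entry.endsWith('.json') || entry.endsWith('.jsonl'))
    .filter((entry) => entry !== ARTIFACT_NAME)
    .sort()
}

// Resolve where write: true would place the artifact. Assumes a string input is a
// directory path; in-memory records have no directory, so this fails loud.
function artifactPath(input: SuiteInput): string {
  if (Array.isArray(input)) {
    throw new Error('write: true needs a directory to anchor analysis.json; pass an explicit path instead')
  }
  return `${input.replace(/\/+$/, '')}/${ARTIFACT_NAME}`
}

console.log(selectRunFiles(['b.jsonl', 'analysis.json', 'a.json', 'notes.md'])) // [ 'a.json', 'b.jsonl' ]
console.log(artifactPath('./runs/')) // ./runs/analysis.json
```

Excluding the artifact by exact file name is the simplest reading of "never ingests its own artifact"; the real adapter may match it differently.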

Verification

- `pnpm run lint`: 0 errors (pre-existing warnings/infos in unrelated files untouched)
- `pnpm run typecheck`: clean
- `pnpm run build`: clean (incl. OpenAPI spec emit)
- `pnpm test`: 247 files / 2512 tests pass (2 pre-existing skips), incl. 11 new suite tests

Version trio bumped together: 0.95.1 → 0.96.0 (`package.json` + `clients/python/pyproject.toml` + `__init__.py`).
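Keeping the version trio in lockstep is easy to check mechanically. Here is a hypothetical sketch that takes the three files' contents as strings and throws if they disagree. It assumes `pyproject.toml` carries a `version = "..."` line and `__init__.py` a `__version__ = "..."` assignment.

```ts
const MISSING = '<missing>'

// Hypothetical check: file contents are passed in as strings so the sketch stays I/O-free.
// Assumes pyproject.toml has a `version = "..."` line (the first one wins) and
// __init__.py a `__version__ = "..."` assignment.
function readVersions(packageJson: string, pyproject: string, initPy: string): [string, string, string] {
  const npm = (JSON.parse(packageJson) as { version?: string }).version
  const py = /^version\s*=\s*"([^"]+)"/m.exec(pyproject)?.[1]
  const init = /^__version__\s*=\s*["']([^"']+)["']/m.exec(initPy)?.[1]
  return [npm ?? MISSING, py ?? MISSING, init ?? MISSING]
}

// Throws unless all three files agree on one version; returns that version.
function assertVersionTrio(packageJson: string, pyproject: string, initPy: string): string {
  const versions = readVersions(packageJson, pyproject, initPy)
  if (versions.includes(MISSING) || new Set(versions).size !== 1) {
    throw new Error(`version trio out of sync: ${versions.join(' / ')}`)
  }
  return versions[0]
}

const synced = assertVersionTrio(
  '{"name":"example","version":"0.96.0"}',
  '[project]\nname = "example"\nversion = "0.96.0"\n',
  '__version__ = "0.96.0"\n',
)
console.log(synced) // 0.96.0
```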

…) to analysis.json

Thin wrapper over `analyzeRuns` + a new `fromRunRecordDir` intake adapter. Resolves a `RunRecord[]`, a `.json`/`.jsonl` file, or a directory into validated records, runs `analyzeRuns` (distributions, paired lift, findings rollup), and optionally writes a single `analysis.json`. No analysis logic of its own: pure composition + I/O over the existing reporting primitives.

tangletools

✅ Auto-approved PR — `18b015b2`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:35:02Z}

tangletools approved these changes Jun 22, 2026


drewstone merged commit 646ad9e into main Jun 22, 2026
1 check passed

This was referenced Jun 22, 2026:

- chore(api): prune unused public surface #77 (closed)
- feat(reporting): add evalReportingSuite wrapper #76 (closed)

drewstone merged 1 commit into main from feat/eval-reporting-suite
