refactor(statistics): consolidate divergent pearsonR copies into one shared helper by drewstone · Pull Request #273 · tangle-network/agent-eval

drewstone · 2026-06-22T13:39:55Z

Closes the pearsonR-consolidation item from #77 (the named post-#263 follow-up).

Problem

Seven source files each carried a private Pearson-correlation implementation, with divergent edge-case handling:

| file | `n < 2` | both constant | one constant | length guard |
|---|---|---|---|---|
| `rl/reward-hacking.ts` | `0` | `0` | `0` | yes |
| `pipelines/judge-agreement.ts` | `NaN` | `1` | `0` | yes |
| `judge-calibration.ts` | `NaN` | `1` | `0` | yes |
| `builder-eval/correlation.ts` | (none → NaN) | `1` | `0` | no |
| `contract/analyze-runs.ts` | `0` (allows n=1) | `0` | `0` | n===0 only |
| `meta-eval/rubric-predictive-validity.ts` | `NaN` | `1` | `0` | yes |
| `meta-eval/correlation-study.ts` | `NaN` | `1` | `0` | yes |

The rank-with-ties and Spearman helpers were duplicated alongside them (4 ranks/rankWithTies copies).

Change

A single pearsonR / spearmanR / ranks implementation is added to src/statistics.ts, already the home of the project's stat primitives (pairedBootstrap, cohensD, mcnemar, …), and every call site is routed through it. All three are newly exported from the root entry.

Edge-case contract (explicit, documented in-file):

length mismatch or n < 2 → NaN (correlation is undefined; distinct from a measured 0)
both series constant → 1 (degenerate perfect agreement)
exactly one series constant → 0 (no covariation to detect)

This matches the dominant (5/7) convention and is the statistically defensible one.
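The contract above can be illustrated with a minimal sketch. This is not the code that landed in src/statistics.ts; the signature and the way constant series are detected are assumptions made for illustration only.

```typescript
// Hypothetical sketch of the documented edge-case contract.
// Not the actual implementation in src/statistics.ts.
function pearsonR(xs: readonly number[], ys: readonly number[]): number {
  const n = xs.length;
  // Length mismatch or fewer than two points: correlation is undefined.
  if (n !== ys.length || n < 2) return NaN;

  // Assumption: detect constant series by direct comparison, so rounding
  // in the mean cannot hide a zero-variance input.
  const xConstant = xs.every((v) => v === xs[0]);
  const yConstant = ys.every((v) => v === ys[0]);
  if (xConstant && yConstant) return 1; // degenerate perfect agreement
  if (xConstant || yConstant) return 0; // no covariation to detect

  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;

  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i]! - meanX;
    const dy = ys[i]! - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```

Comparing elements directly, rather than testing the accumulated variance against zero, is a choice of this sketch: a series such as `[0.1, 0.1, 0.1]` can produce a tiny non-zero variance once the mean is rounded.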

analyze-runs inter-rater kappa now averages only the finite pair correlations, so a degenerate pair (a single jointly-rated run, or a constant rater) yields NaN and is dropped rather than dragging the mean toward an artificial 0 — the same pattern judge-calibration's avgPairwise and correlation-study's bootstrap already use.
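A small sketch of the finite-only averaging described above. The helper name `meanOfFinite` is hypothetical, and returning NaN when no finite pair remains is an assumption; the PR description does not say what analyze-runs does in that case.

```typescript
// Hypothetical helper: average only the finite pair correlations so that
// an undefined (NaN) pair is dropped instead of being counted as 0.
function meanOfFinite(values: readonly number[]): number {
  const finite = values.filter((v) => Number.isFinite(v));
  // Assumption: with no usable pairs the mean itself is undefined.
  if (finite.length === 0) return NaN;
  return finite.reduce((s, v) => s + v, 0) / finite.length;
}

// One degenerate pair (NaN) among three pairwise correlations.
const pairCorrelations = [0.5, NaN, 0.75];
const droppedMean = meanOfFinite(pairCorrelations); // 0.625
// Treating the degenerate pair as a measured 0 would instead give
// (0.5 + 0 + 0.75) / 3, which pulls the mean toward 0.
```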

All 7 local copies (pearson/pearsonR/spearman/spearmanR/ranks/rank/rankWithTies) deleted. Net -52 LOC.

Verification

pnpm run typecheck — clean
pnpm run lint — exit 0 (pre-existing warnings only, none in touched files)
pnpm run build — clean (tsup + openapi)
pnpm test — 246 files, 2515 pass, 2 skipped
New describe blocks for pearsonR/ranks/spearmanR cover the full edge-case contract (n<2 → NaN, both/one constant, length mismatch, ±1 linear, non-linear monotone Spearman, average ranks for ties).
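The rank and Spearman cases named in the test list can be illustrated with a sketch of average ranking and of Spearman as Pearson applied to ranks. The 1-based rank convention, the compact `pearson` helper, and the assumption of finite inputs are all illustrative choices, not details confirmed by the PR.

```typescript
// Hypothetical sketch: average ranks for ties, 1-based (an assumption).
function ranks(values: readonly number[]): number[] {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const out = new Array<number>(values.length).fill(0);
  let start = 0;
  while (start < order.length) {
    // Extend `end` across the run of values tied with order[start].
    let end = start;
    while (end + 1 < order.length && order[end + 1]!.v === order[start]!.v) end++;
    // Mean of the 1-based positions start+1 .. end+1.
    const avg = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) out[order[k]!.i] = avg;
    start = end + 1;
  }
  return out;
}

// Compact Pearson helper following the documented contract (illustrative).
function pearson(xs: readonly number[], ys: readonly number[]): number {
  const n = xs.length;
  if (n !== ys.length || n < 2) return NaN;
  const xConst = xs.every((v) => v === xs[0]);
  const yConst = ys.every((v) => v === ys[0]);
  if (xConst && yConst) return 1;
  if (xConst || yConst) return 0;
  const mx = xs.reduce((s, v) => s + v, 0) / n;
  const my = ys.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i]! - mx;
    const dy = ys[i]! - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman correlation: Pearson correlation of the average ranks.
function spearmanR(xs: readonly number[], ys: readonly number[]): number {
  if (xs.length !== ys.length) return NaN;
  return pearson(ranks(xs), ranks(ys));
}
```

With these definitions, `ranks([10, 20, 20, 30])` gives `[1, 2.5, 2.5, 4]`, and a non-linear but monotone pair such as `[1, 2, 3, 4]` against `[1, 8, 27, 64]` has a Spearman correlation of 1 even though the relationship is not linear.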

…e shared helper

Seven files each carried a private Pearson-correlation implementation with divergent edge-case handling: n<2 returned 0 in some copies and NaN in others; zero-variance returned 0 in two copies and 1 in five; one copy had no length guard at all and one allowed n=1. The same divergence existed for the rank-with-ties and Spearman helpers.

Add a single edge-cased pearsonR/spearmanR/ranks to the canonical statistics module (already the home of the project's stat primitives) and route every call site through it. Contract: length-mismatch or n<2 -> NaN (undefined, not a measured 0); both series constant -> 1; exactly one constant -> 0.

analyze-runs' inter-rater kappa now averages only the finite pair correlations, so a degenerate pair (single jointly-rated run, or a constant rater) no longer drags the mean toward an artificial 0.

Net -52 LOC; pearsonR/spearmanR/ranks newly exported from the root entry.

tangletools

✅ Auto-approved PR — `4b72e2ae`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:40:02Z}

tangletools

✅ Auto-approved PR — `e7ca9d1c`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:42:38Z}

tangletools previously approved these changes Jun 22, 2026

chore(release): 0.96.1 — consolidated pearsonR helper

e7ca9d1

Version trio bumped together (npm package.json + clients/python/pyproject.toml + agent_eval_rpc.__version__) per the npm<->PyPI lock.

drewstone dismissed tangletools’s stale review via e7ca9d1 June 22, 2026 13:42

tangletools approved these changes Jun 22, 2026

drewstone merged commit e94d0ae into main Jun 22, 2026
1 check passed

drewstone mentioned this pull request Jun 22, 2026

chore(api): prune unused public surface #77

Closed

drewstone merged 2 commits into main from chore/consolidate-pearson-77
