
refactor(statistics): consolidate divergent pearsonR copies into one shared helper#273

Merged
drewstone merged 2 commits into main from chore/consolidate-pearson-77 on Jun 22, 2026


@drewstone

Contributor

Closes the pearsonR-consolidation item from #77 (the named post-#263 follow-up).

Problem

Seven source files each carried a private Pearson-correlation implementation, with divergent edge-case handling:

| file | n < 2 returns | both series constant | exactly one series constant | length guard |
|---|---|---|---|---|
| rl/reward-hacking.ts | 0 | 0 | 0 | yes |
| pipelines/judge-agreement.ts | NaN | 1 | 0 | yes |
| judge-calibration.ts | NaN | 1 | 0 | yes |
| builder-eval/correlation.ts | (none → NaN) | 1 | 0 | no |
| contract/analyze-runs.ts | 0 (allows n=1) | 0 | 0 | n===0 only |
| meta-eval/rubric-predictive-validity.ts | NaN | 1 | 0 | yes |
| meta-eval/correlation-study.ts | NaN | 1 | 0 | yes |

The rank-with-ties and Spearman helpers were duplicated alongside them (4 ranks/rankWithTies copies).

Change

A single pearsonR / spearmanR / ranks implementation is added to src/statistics.ts, already the home of the project's stat primitives (pairedBootstrap, cohensD, mcnemar, …), and every call site is routed through it. All three are newly exported from the root entry.

Edge-case contract (explicit, documented in-file):

  • length mismatch or n < 2 → NaN (correlation is undefined; distinct from a measured 0)
  • both series constant → 1 (degenerate perfect agreement)
  • exactly one series constant → 0 (no covariation to detect)

This matches the dominant (5/7) convention and is the statistically defensible one.
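A minimal TypeScript sketch of helpers that honor this contract is shown below. It is an illustration, not the code merged into src/statistics.ts: the signatures (readonly number[] in, number out), the exact-equality test for a constant series, and the assumption of finite inputs are choices made here. spearmanR is written as Pearson over average ranks, so it inherits the same edge cases.

```typescript
// Illustrative sketch only: not the merged src/statistics.ts code.
// Assumes finite numeric inputs (NaN values would break the rank sort).

/** True when every value equals the first one (exact comparison). */
function isConstant(values: readonly number[]): boolean {
  return values.every((v) => v === values[0]);
}

/**
 * Pearson correlation with the edge-case contract:
 * length mismatch or n < 2 -> NaN, both constant -> 1, one constant -> 0.
 */
export function pearsonR(x: readonly number[], y: readonly number[]): number {
  if (x.length !== y.length || x.length < 2) return NaN; // undefined, not a measured 0

  const xConst = isConstant(x);
  const yConst = isConstant(y);
  if (xConst && yConst) return 1; // degenerate perfect agreement
  if (xConst || yConst) return 0; // no covariation to detect

  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;

  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  const r = sxy / Math.sqrt(sxx * syy);
  // Clamp so rounding cannot push |r| slightly past 1.
  return Math.max(-1, Math.min(1, r));
}

/** 1-based ranks; tied values share the average of the ranks they span. */
export function ranks(values: readonly number[]): number[] {
  // Sort indices by value so that tied values become adjacent.
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
  const result = new Array<number>(values.length).fill(0);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && values[order[end + 1]] === values[order[start]]) {
      end++;
    }
    // Sorted positions start..end hold 1-based ranks start+1..end+1; average them.
    const averageRank = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) result[order[k]] = averageRank;
    start = end + 1;
  }
  return result;
}

/** Spearman's rho: Pearson on average ranks, so it inherits the same contract. */
export function spearmanR(x: readonly number[], y: readonly number[]): number {
  return pearsonR(ranks(x), ranks(y));
}
```

For example, pearsonR([1, 2, 3], [2, 4, 6]) is 1, and spearmanR([1, 2, 3, 4], [1, 4, 9, 16]) is also 1 because the relationship is monotone even though it is not linear. Testing constancy by exact equality, rather than by a zero sum of squares, sidesteps floating-point residue: for a series such as [0.1, 0.1, 0.1] the computed mean may not equal 0.1 exactly, which would leave a tiny nonzero variance.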

analyze-runs inter-rater kappa now averages only the finite pair correlations, so a degenerate pair (a single jointly-rated run, or a constant rater) yields NaN and is dropped rather than dragging the mean toward an artificial 0 — the same pattern judge-calibration's avgPairwise and correlation-study's bootstrap already use.
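The finite-only averaging can be sketched as follows. This is a hypothetical helper, not the analyze-runs code: the name averageFinitePairwise and the injected corr callback (which in practice would be something like pearsonR over the runs both raters scored) are assumptions made here for illustration.

```typescript
// Hypothetical sketch: averageFinitePairwise is an illustrative name, not the project's API.

/**
 * Average a correlation over every unordered pair of raters,
 * keeping only finite results so degenerate pairs are dropped.
 */
export function averageFinitePairwise<T>(
  raters: readonly T[],
  corr: (a: T, b: T) => number,
): number {
  const finite: number[] = [];
  for (let i = 0; i < raters.length; i++) {
    for (let j = i + 1; j < raters.length; j++) {
      const r = corr(raters[i], raters[j]);
      // A NaN (undefined correlation) is skipped rather than counted as 0.
      if (Number.isFinite(r)) finite.push(r);
    }
  }
  // No usable pair at all: the average itself is undefined.
  if (finite.length === 0) return NaN;
  return finite.reduce((sum, r) => sum + r, 0) / finite.length;
}
```

For instance, with three raters whose pair correlations are 0.5, NaN and 1, the sketch returns 0.75; scoring the degenerate pair as 0 would instead give 0.5, the artificial pull toward 0 described above.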

All 7 local copies (pearson/pearsonR/spearman/spearmanR/ranks/rank/rankWithTies) deleted. Net -52 LOC.

Verification

  • pnpm run typecheck — clean
  • pnpm run lint — exit 0 (pre-existing warnings only, none in touched files)
  • pnpm run build — clean (tsup + openapi)
  • pnpm test — 246 files, 2515 pass, 2 skipped
  • New describe blocks for pearsonR/ranks/spearmanR cover the full edge-case contract (n<2 → NaN, both/one constant, length mismatch, ±1 linear, non-linear monotone Spearman, average ranks for ties).


Seven files each carried a private Pearson-correlation implementation with
divergent edge-case handling: n<2 returned 0 in some copies and NaN in
others; zero-variance returned 0 in two copies and 1 in five; one copy had
no length guard at all and one allowed n=1. The same divergence existed for
the rank-with-ties and Spearman helpers.

Add a single edge-cased pearsonR/spearmanR/ranks to the canonical statistics
module (already the home of the project's stat primitives) and route every
call site through it. Contract: length-mismatch or n<2 -> NaN (undefined, not
a measured 0); both series constant -> 1; exactly one constant -> 0.

analyze-runs' inter-rater kappa now averages only the finite pair
correlations, so a degenerate pair (single jointly-rated run, or a constant
rater) no longer drags the mean toward an artificial 0.

Net -52 LOC; pearsonR/spearmanR/ranks newly exported from the root entry.
tangletools previously approved these changes Jun 22, 2026

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — 4b72e2ae

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:40:02Z

Version trio bumped together (npm package.json + clients/python/pyproject.toml
+ agent_eval_rpc.__version__) per the npm<->PyPI lock.

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — e7ca9d1c

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:42:38Z

@drewstone drewstone merged commit e94d0ae into main Jun 22, 2026
1 check passed
