Testing overhaul: CMake-driven test helpers & equivalence harnesses by caitlinross · Pull Request #253 · codes-org/codes

caitlinross · 2026-07-01T17:34:23Z

The CODES test suite was ~30 hand-written, copy-and-edited shell scripts, one per scenario, each re-implementing the same run/grep/diff boilerplate. This PR replaces them with a small set of CMake test helpers backed by one shared runner, migrates the existing tests onto those helpers, and adds the equivalence harnesses that protect the upcoming config-format and model refactors. The most important of these is the lp-io config-equivalence check, which the upcoming YAML config work will use to ensure the new format configures simulations correctly.

What changed

Test helpers + runner (tests/CMakeLists.txt, tests/equivalence-run.sh, tests/run-test.sh.in)

codes_add_run_test(): single-run smoke/unit tests; adding one is a single CMake line.
codes_add_equivalence_test(): run a model N times and diff a marker line (Net Events Processed). Supports SETUP (sourced per-run for config generation), REQUIRE (extra presence checks), REPEAT/VARIANTS.
codes_add_lpio_equivalence_test(): run a model with two configs and diff the per-LP lp-io output (identifier-file set + a sorted diff of each file). This is the safety net for swapping a legacy .conf for an equivalent new YAML config and proving the result is byte-identical per LP. The helper is config-format-agnostic, so for now tests like this just compare one .conf run against another .conf run.
equivalence-run.sh: per-run subdir isolation, optional setup/marker/require/lp-io comparison. MPI launch comes from MPIEXEC_* instead of a hardcoded mpirun.

Migration and fixes

~22 per-scenario .sh scripts replaced by one-line helper calls.
Fixed along the way: rc-stack-test was running the wrong binary (modelnet-simplep2p-test) — now runs the rc-stack-test binary; modelnet-prio-sched-test split into -seq/-opt.
Dead packet-count check in example-ping-pong-surrogate-{1,2,3}.sh: the check piped a bare glob into wc (*.txt | wc -l, with no command in front), so it always compared 0 to 0; added the missing cat so it actually compares packet counts.

Note: Now that the Union tests are fixed, they take 30+ minutes to complete. I have disabled them on PRs for now, but they still run nightly. The full CI build still runs on PRs and runs all the other tests.

Introduce a declarative way to register equivalence/determinism tests so that adding one is a single CMake call rather than a copy-and-edit shell script.

- codes_add_equivalence_test() registers a test that runs a model binary two or more times and asserts a marker line ("Net Events Processed" by default) is identical across runs. REPEAT covers reproducibility; VARIANTS covers comparisons such as seq vs optimistic (--sync=1 vs --sync=3).
- equivalence-run.sh is the generic runner behind it: each run executes in its own run-N/ subdir (so fixed relative output paths don't collide), greps the marker from each run, and diffs them.
- run-test.sh.in now accepts a full command with arguments, not just a single legacy per-scenario script.

Migrate example-ping-pong-determinism.sh as the first user of the helper and drop the standalone script.
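The run/grep/diff flow described above can be sketched in a few lines of bash. This is a minimal illustration, not the actual equivalence-run.sh: `run_equivalence`, `fake_model`, and the file names `stdout.log` and `marker.txt` are hypothetical names chosen for the example, and the real runner launches the binary under MPI rather than calling a shell function.

```bash
#!/usr/bin/env bash
# Sketch of an equivalence check: run a command N times, each in its own
# run-N/ subdirectory, extract a marker line, and compare it across runs.
# All names below are hypothetical; this is not the CODES runner itself.

MARKER="Net Events Processed"

fake_model() {
    # Stand-in for a model binary that writes to a fixed relative path.
    # Per-run subdirectories keep these outputs from colliding.
    echo "trace data" > output.txt
    echo "Running simulation"
    echo "Net Events Processed: 12345"
}

# run_equivalence WORKDIR REPEAT CMD [ARGS...]
# Returns 0 if the marker line is identical in every run, 1 otherwise.
run_equivalence() {
    local workdir=$1 repeat=$2 i
    shift 2
    for ((i = 1; i <= repeat; i++)); do
        mkdir -p "$workdir/run-$i"
        # Subshell so the cd does not leak into the caller.
        ( cd "$workdir/run-$i" && "$@" > stdout.log 2>&1 ) || return 1
        # A missing marker is a failure, not a vacuous match.
        grep -- "$MARKER" "$workdir/run-$i/stdout.log" \
            > "$workdir/run-$i/marker.txt" || return 1
    done
    for ((i = 2; i <= repeat; i++)); do
        diff -q "$workdir/run-1/marker.txt" "$workdir/run-$i/marker.txt" \
            > /dev/null || return 1
    done
    return 0
}

workdir=$(mktemp -d)
if run_equivalence "$workdir" 3 fake_model; then
    equivalence_status=pass
else
    equivalence_status=fail
fi
echo "equivalence: $equivalence_status"
```

Treating a missing marker as a failure matters here: if grep found nothing in any run, the extracted marker files would all be empty and the diff would pass vacuously.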

Add codes_add_run_test() for single-run smoke/unit tests: run a binary once under MPI and pass on clean exit. The caller supplies the full post-binary argument list, which absorbs the various ways CODES binaries take their config (positional, "-- <conf>", "--codes-config=", "--conf=", or none).

Convert the per-scenario shell scripts to declarative calls and delete them: lp-io, jobmap, map-ctx, resource, lsm, rc-stack, the modelnet-* topology family, the synthetic-traffic binaries, and the workload test.

Two fixes fall out of the migration:

- rc-stack-test ran the wrong binary (modelnet-simplep2p-test) due to a copy-paste error; it now runs rc-stack-test.
- modelnet-prio-sched ran both schedulers in one script; split into separate seq (--sync=1) and opt (--sync=3) tests.

Tests with custom logic (mapping_test) or held back for review stay as shell scripts for now.

Generalize equivalence-run.sh into a staged runner that also drives single-run smoke tests. New options, threaded through both CMake helpers:

- --setup sources a script inside each run dir before the run, so it can export env vars and generate the config there; CONFIG/ARGS then reference the generated file by bare name.
- --require asserts a line is present in every run's output (e.g. "Network switch completed", proving the surrogate actually engaged).
- codes_add_run_test() gains SETUP/MARKER and routes through the runner when either is set.

Add the setup scripts (no-logging, surrogate-determinism freeze and no-freeze) and migrate the no-logging and surrogate-determinism tests to the helpers, dropping their standalone scripts.

The surrogate-1/2/3 scripts stay as-is: they are cross-config comparisons (surrogate vs high-fidelity, freeze vs non-freeze) with custom sed-normalized diffs, not determinism checks. Their final packet-count check is a no-op (it execs the *.txt files instead of listing them); fixing that is left to its own commit.
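The setup and require stages can be illustrated with a small bash sketch. It assumes the setup script is sourced (not executed) from inside the run directory, as the description says; everything else is hypothetical: `run_staged`, `fake_model`, the `DEMO_MODE` variable, and the `generated.conf` and `stdout.log` file names are invented for the example and are not taken from the CODES scripts.

```bash
#!/usr/bin/env bash
# Sketch of a staged run: source a setup script inside the run directory,
# run the command, then require a line in its output.
# All names below are hypothetical; this is not the CODES runner itself.

# run_staged RUN_DIR SETUP_SCRIPT REQUIRED_LINE CMD [ARGS...]
# SETUP_SCRIPT must be an absolute path (or empty to skip the stage).
run_staged() {
    local run_dir=$1 setup=$2 required=$3
    shift 3
    mkdir -p "$run_dir"
    (
        cd "$run_dir" || exit 1
        # Sourced rather than executed, so exported variables and files
        # generated here are visible to the command that follows.
        [ -z "$setup" ] || . "$setup" || exit 1
        "$@" > stdout.log 2>&1
    ) || return 1
    # Optional presence check on the captured output.
    [ -z "$required" ] || grep -q -- "$required" "$run_dir/stdout.log"
}

fake_model() {
    # Stand-in for a model binary; takes its config by bare file name.
    [ -f "$1" ] || { echo "missing config $1"; return 1; }
    echo "loaded $(cat "$1")"
    [ "$DEMO_MODE" = "surrogate" ] && echo "Network switch completed"
    echo "Net Events Processed: 42"
}

demo=$(mktemp -d)
cat > "$demo/setup.sh" <<'EOF'
export DEMO_MODE=surrogate
echo "mode=surrogate" > generated.conf
EOF

if run_staged "$demo/run-1" "$demo/setup.sh" "Network switch completed" \
        fake_model generated.conf; then
    staged_status=pass
else
    staged_status=fail
fi
echo "staged run: $staged_status"
```

Because the setup script runs after the `cd`, the config it generates lands in the run directory, which is why the command can refer to it by bare name.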

The final check globbed `packet-latency-*/*.txt | wc -l` with no command, so bash tried to execute the trace files, wc counted 0 on both sides, and the diff always passed. Add `cat` so it actually compares packet counts.
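The failure mode is easy to reproduce in isolation. The sketch below uses a made-up directory and trace file (`packet-latency-a/trace.txt` is a hypothetical name) to show that a pipeline whose first stage is a bare glob feeds nothing to wc, while the same pipeline with `cat` counts the lines.

```bash
#!/usr/bin/env bash
# Reproduces the dead check: a bare glob as the first pipeline stage is
# treated as a command to execute, so wc receives no input.
# The directory and file names are hypothetical stand-ins.

demo=$(mktemp -d)
mkdir -p "$demo/packet-latency-a"
printf 'pkt 1\npkt 2\npkt 3\n' > "$demo/packet-latency-a/trace.txt"

# Broken form: bash expands the glob and tries to run the trace file.
# The exec fails (the file is not executable), stdout stays empty,
# and wc counts zero lines. tr strips padding some wc builds add.
broken=$( (cd "$demo" && packet-latency-*/*.txt 2>/dev/null | wc -l) \
          | tr -d '[:space:]')

# Fixed form: cat actually streams the file contents into wc.
fixed=$( (cd "$demo" && cat packet-latency-*/*.txt | wc -l) \
         | tr -d '[:space:]')

echo "broken count: $broken"
echo "fixed count:  $fixed"
```

Since the broken form yields 0 for any input, comparing two such counts always passes, which is why the original check could never fail.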

codes_add_lpio_equivalence_test() runs a model with two configs (each with its own --lp-io-dir) and diffs the per-LP lp-io output -- the identifier-file set plus a sorted diff of each file. Two configs are equivalent iff every run produces identical lp-io output. equivalence-run.sh gains a --lp-io comparison mode to back it. This is the safety net for swapping a legacy .conf for an equivalent new config (e.g. a compiled YAML) and proving the simulation is byte-identical per LP. Includes a synthetic-dragonfly proof-of-concept (same config twice) that also serves as an lp-io determinism check.
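The two-part comparison (same set of identifier files, then a sorted diff of each file) can be sketched as follows. This is an illustration under assumptions, not the runner's actual --lp-io mode: `compare_lpio` and the `model-stats` file name are hypothetical, and the reason given for sorting (line order may differ between runs) is a guess at the motivation rather than something the PR states.

```bash
#!/usr/bin/env bash
# Sketch of an lp-io directory comparison: both directories must hold the
# same set of files, and each file must match after sorting its lines.
# All names below are hypothetical; this is not the CODES runner itself.

# compare_lpio DIR_A DIR_B
# Returns 0 if the two lp-io directories are equivalent, 1 otherwise.
compare_lpio() {
    local a=$1 b=$2 f
    # 1. The set of identifier files must be identical.
    diff <(cd "$a" && ls -1 | sort) <(cd "$b" && ls -1 | sort) \
        > /dev/null || return 1
    # 2. Each file must have the same lines, ignoring line order
    #    (assumed here to vary harmlessly between runs).
    for f in "$a"/*; do
        [ -f "$f" ] || continue
        diff <(sort "$f") <(sort "$b/$(basename "$f")") \
            > /dev/null || return 1
    done
    return 0
}

demo=$(mktemp -d)
mkdir -p "$demo/conf-a" "$demo/conf-b" "$demo/conf-c"
printf 'lp 0 sent 10\nlp 1 sent 12\n' > "$demo/conf-a/model-stats"
printf 'lp 1 sent 12\nlp 0 sent 10\n' > "$demo/conf-b/model-stats"  # same lines, reordered
printf 'lp 0 sent 10\nlp 1 sent 99\n' > "$demo/conf-c/model-stats"  # one value differs

compare_lpio "$demo/conf-a" "$demo/conf-b" && same_result=equivalent || same_result=different
compare_lpio "$demo/conf-a" "$demo/conf-c" && diff_result=equivalent || diff_result=different
echo "a vs b: $same_result"
echo "a vs c: $diff_result"
```

Checking the file set first catches the case where one config silently produces no output for some LP type, which a per-file diff alone would miss.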

codecov · 2026-07-01T17:46:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


The union-surrogate suite runs a heavy MILC+Jacobi UNION workload (~30 min for the set) -- too long to gate every PR on. Label those tests "nightly" (plus a per-test TIMEOUT) and have the full-lane job skip them on push/PR via `ctest --label-exclude nightly`, running the complete set including them only on the scheduled nightly build.

caitlinross added 6 commits July 1, 2026 12:34

ci: correct misleading comment

d648242

tests: fix dead packet-count check in ping-pong surrogate tests

14baa0a


caitlinross force-pushed the testing-overhaul branch from d5c0163 to 8a6c13f Compare July 1, 2026 17:42

caitlinross merged commit 1aacd2b into codes-org:master Jul 1, 2026
14 checks passed

caitlinross deleted the testing-overhaul branch July 1, 2026 18:30

caitlinross merged 7 commits into codes-org:master from caitlinross:testing-overhaul
