
Testing overhaul: CMake-driven test helpers & equivalence harnesses #253

Merged: caitlinross merged 7 commits into codes-org:master from caitlinross:testing-overhaul on Jul 1, 2026

@caitlinross caitlinross commented Jul 1, 2026

Member

The CODES test suite was ~30 hand-written, copy-and-edited shell scripts, one per scenario, each re-implementing the same run/grep/diff boilerplate. This PR replaces them with a small set of CMake test helpers backed by one shared runner, migrates the existing tests onto those helpers, and adds the equivalence harnesses that protect the upcoming config-format and model refactors. The most important of these is the lp-io config-equivalence check, which lets the upcoming YAML config work verify that the new format configures simulations correctly.

What changed

Test helpers + runner (tests/CMakeLists.txt, tests/equivalence-run.sh, tests/run-test.sh.in)

  • codes_add_run_test(): single-run smoke/unit tests; adding one is a single CMake line.
  • codes_add_equivalence_test(): run a model N times and diff a marker line (Net Events Processed). Supports SETUP (sourced per-run for config generation), REQUIRE (extra presence checks), REPEAT/VARIANTS.
  • codes_add_lpio_equivalence_test(): run a model with two configs and diff the per-LP lp-io output (the identifier-file set plus a sorted diff of each file). This is the safety net for swapping a legacy .conf for an equivalent new YAML config and proving the result is byte-identical per LP. The helper is config-format-agnostic, so for now its tests simply compare one .conf run against another .conf run.
  • equivalence-run.sh: per-run subdir isolation, optional setup/marker/require/lp-io comparison. MPI launch comes from MPIEXEC_* instead of a hardcoded mpirun.
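As a rough illustration of the last point, here is a minimal sketch of a launcher that builds its command line from MPIEXEC_* values rather than a hardcoded mpirun. The variable names and their ordering follow CMake's FindMPI module; the `mpi_launch` function, and the assumption that the values reach the script as shell variables, are hypothetical and not taken from the PR's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the PR's code): launch a binary under MPI using
# MPIEXEC_* values. Assumption: the values are available as shell variables;
# the real scripts may instead have them substituted at configure time.
# Usage: mpi_launch <nprocs> <binary> [binary args...]
mpi_launch() {
    local nprocs="$1" binary="$2"
    shift 2
    # PREFLAGS/POSTFLAGS are left unquoted on purpose so that several
    # space-separated flags split into separate words.
    # shellcheck disable=SC2086
    "${MPIEXEC_EXECUTABLE:-mpiexec}" "${MPIEXEC_NUMPROC_FLAG:--n}" "$nprocs" \
        ${MPIEXEC_PREFLAGS:-} "$binary" ${MPIEXEC_POSTFLAGS:-} "$@"
}
```

With this shape, switching launchers (for example to a batch-system wrapper) only means changing the MPIEXEC_* values, not editing each test.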

Migration and fixes

  • ~22 per-scenario .sh scripts replaced by one-line helper calls.
  • Fixed along the way: rc-stack-test was running the wrong binary (modelnet-simplep2p-test) — now runs the rc-stack-test binary; modelnet-prio-sched-test split into -seq/-opt.
  • Revived a dead packet-count check in example-ping-pong-surrogate-{1,2,3}.sh: the pipeline was *.txt | wc -l with no command in front of the glob, so it always compared 0 to 0. Added the missing cat so it actually compares packet counts.

Note: Now that the Union tests are fixed, they take 30+ minutes to complete. I have disabled them on PRs for now, but they still run nightly. The full CI build still runs on PRs and runs all the other tests.

Introduce a declarative way to register equivalence/determinism tests so
that adding one is a single CMake call rather than a copy-and-edit shell
script.

- codes_add_equivalence_test() registers a test that runs a model binary
  two or more times and asserts a marker line ("Net Events Processed" by
  default) is identical across runs. REPEAT covers reproducibility;
  VARIANTS covers comparisons such as seq vs optimistic (--sync=1 vs
  --sync=3).
- equivalence-run.sh is the generic runner behind it: each run executes
  in its own run-N/ subdir (so fixed relative output paths don't
  collide), greps the marker from each run, and diffs them.
- run-test.sh.in now accepts a full command with arguments, not just a
  single legacy per-scenario script.
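The run/grep/diff flow described above can be sketched roughly as follows. This is a simplified stand-in written for illustration, not the contents of equivalence-run.sh; the function name `compare_marker` and its argument order are assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch: run a command several times, each in its own run-N/
# directory, and require the marker line(s) to be identical across runs.
# Usage: compare_marker <marker> <repeats> <command> [args...]
# Returns 0 if all runs agree, 1 if a run differs, 2 on a run/marker failure.
compare_marker() {
    local marker="$1" repeats="$2"
    shift 2
    local workdir rundir i status=0
    workdir="$(mktemp -d)" || return 2
    for i in $(seq 1 "$repeats"); do
        rundir="$workdir/run-$i"
        mkdir -p "$rundir"
        # Separate directories keep fixed relative output paths from colliding.
        if ! ( cd "$rundir" && "$@" > output.log 2>&1 ); then
            echo "run $i failed" >&2; status=2; break
        fi
        # Keep only the marker line(s); everything else may legitimately vary.
        if ! grep -- "$marker" "$rundir/output.log" > "$rundir/marker.txt"; then
            echo "run $i: marker '$marker' not found" >&2; status=2; break
        fi
        if [ "$i" -gt 1 ] && ! diff "$workdir/run-1/marker.txt" "$rundir/marker.txt"; then
            echo "run $i differs from run 1" >&2; status=1; break
        fi
    done
    rm -rf "$workdir"
    return "$status"
}
```

In this sketch a REPEAT-style test simply reruns the same command, while a VARIANTS-style test would change the arguments per run (for example --sync=1 vs --sync=3) before comparing the same marker. Because each run changes directory, the command must be given by absolute path or be on PATH.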

Migrate example-ping-pong-determinism.sh as the first user of the helper
and drop the standalone script.

Add codes_add_run_test() for single-run smoke/unit tests: run a binary
once under MPI and pass on clean exit. The caller supplies the full
post-binary argument list, which absorbs the various ways CODES binaries
take their config (positional, "-- <conf>", "--codes-config=",
"--conf=", or none).

Convert the per-scenario shell scripts to declarative calls and delete
them: lp-io, jobmap, map-ctx, resource, lsm, rc-stack, the modelnet-*
topology family, the synthetic-traffic binaries, and the workload test.
Two fixes fall out of the migration:

- rc-stack-test ran the wrong binary (modelnet-simplep2p-test) due to a
  copy-paste error; it now runs rc-stack-test.
- modelnet-prio-sched ran both schedulers in one script; split into
  separate seq (--sync=1) and opt (--sync=3) tests.

Tests with custom logic (mapping_test) or held back for review stay as
shell scripts for now.

Generalize equivalence-run.sh into a staged runner that also drives
single-run smoke tests. New options, threaded through both CMake helpers:

- --setup sources a script inside each run dir before the run, so it can
  export env vars and generate the config there; CONFIG/ARGS then
  reference the generated file by bare name.
- --require asserts a line is present in every run's output (e.g.
  "Network switch completed", proving the surrogate actually engaged).
- codes_add_run_test() gains SETUP/MARKER and routes through the runner
  when either is set.
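To make the --setup and --require stages concrete, here is a small sketch of one run that sources a setup script inside the run directory and then checks for a required line. The function `run_with_setup` and its parameters are hypothetical; only the behaviour (source in the run dir, then assert a line is present) is taken from the description above.

```shell
#!/usr/bin/env bash
# Illustrative sketch of one "staged" run, not the PR's runner.
# Usage: run_with_setup <abs path to setup script> <required line> <run dir> <command> [args...]
# Returns 0 on success, 1 if the required line is missing, 2 on any other failure.
run_with_setup() {
    local setup_script="$1" required="$2" rundir="$3"
    shift 3
    mkdir -p "$rundir" || return 2
    (
        cd "$rundir" || exit 2
        # Sourcing (rather than executing) lets the setup script export
        # variables and write config files that the command below can see.
        # shellcheck disable=SC1090
        . "$setup_script" || exit 2
        "$@" > output.log 2>&1 || exit 2
        # -F treats the required text as a fixed string, -q suppresses output.
        grep -Fq -- "$required" output.log || exit 1
    )
}
```

Because the setup script runs inside the run directory, a config it generates there can be passed to the command by bare file name, which matches how CONFIG/ARGS are described above.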

Add the setup scripts (no-logging, surrogate-determinism freeze and
no-freeze) and migrate the no-logging and surrogate-determinism tests to
the helpers, dropping their standalone scripts.

The surrogate-1/2/3 scripts stay as-is: they are cross-config
comparisons (surrogate vs high-fidelity, freeze vs non-freeze) with
custom sed-normalized diffs, not determinism checks. Their final
packet-count check is a no-op (it tries to execute the *.txt files instead of
reading them); fixing that is left to its own commit.

The final check globbed `packet-latency-*/*.txt | wc -l` with no command,
so bash tried to execute the trace files, wc counted 0 on both sides, and
the diff always passed. Add `cat` so it actually compares packet counts.
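The failure mode is easy to reproduce outside the test suite. The snippet below is a self-contained demonstration using throwaway files; the directory and file names are made up for the example and only mimic the packet-latency-*/*.txt layout.

```shell
#!/usr/bin/env bash
# Reproduce the no-op check with temporary files (illustrative names only).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/packet-latency-a"
printf 'pkt1\npkt2\n' > "$demo_dir/packet-latency-a/0.txt"
printf 'pkt3\n'       > "$demo_dir/packet-latency-a/1.txt"

# Buggy form: the glob expands in command position, so the shell tries to
# execute the first trace file. That fails, nothing reaches wc, and the
# count is 0 no matter what the files contain.
buggy_count="$(cd "$demo_dir" && packet-latency-*/*.txt 2>/dev/null | wc -l | tr -d ' ')"

# Fixed form: cat feeds the file contents to wc, so lines are really counted.
fixed_count="$(cd "$demo_dir" && cat packet-latency-*/*.txt | wc -l | tr -d ' ')"

echo "buggy=$buggy_count fixed=$fixed_count"   # prints: buggy=0 fixed=3
rm -rf "$demo_dir"
```

Since both sides of the original comparison used the buggy form, both produced 0 and the diff could never fail.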

codes_add_lpio_equivalence_test() runs a model with two configs (each with its
own --lp-io-dir) and diffs the per-LP lp-io output -- the identifier-file set
plus a sorted diff of each file. Two configs are equivalent iff every run
produces identical lp-io output. equivalence-run.sh gains a --lp-io comparison
mode to back it.

This is the safety net for swapping a legacy .conf for an equivalent new config
(e.g. a compiled YAML) and proving the simulation is byte-identical per LP.
Includes a synthetic-dragonfly proof-of-concept (same config twice) that also
serves as an lp-io determinism check.
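A comparison of this kind could look roughly like the sketch below: first check that both lp-io directories contain the same set of files, then compare each file after sorting its lines. The function name `compare_lpio_dirs` is hypothetical, and this is an approximation of the idea rather than the --lp-io mode itself.

```shell
#!/usr/bin/env bash
# Illustrative sketch: compare two lp-io output directories.
# Usage: compare_lpio_dirs <dir_a> <dir_b>
# Returns 0 if the file sets match and every file matches after sorting.
compare_lpio_dirs() {
    local dir_a="$1" dir_b="$2" tmp name status=0
    tmp="$(mktemp -d)" || return 2
    # 1) Both runs must have produced the same set of files.
    ( cd "$dir_a" && ls -1 | LC_ALL=C sort ) > "$tmp/names_a"
    ( cd "$dir_b" && ls -1 | LC_ALL=C sort ) > "$tmp/names_b"
    if ! cmp -s "$tmp/names_a" "$tmp/names_b"; then
        echo "lp-io file sets differ" >&2
        status=1
    else
        # 2) Each file must be identical once its lines are sorted.
        while IFS= read -r name; do
            LC_ALL=C sort "$dir_a/$name" > "$tmp/a"
            LC_ALL=C sort "$dir_b/$name" > "$tmp/b"
            if ! cmp -s "$tmp/a" "$tmp/b"; then
                echo "lp-io contents differ: $name" >&2
                status=1
            fi
        done < "$tmp/names_a"
    fi
    rm -rf "$tmp"
    return "$status"
}
```

Pointing both arguments at outputs of the same config, as the proof-of-concept does, turns the same check into a determinism test.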

codecov Bot commented Jul 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


The union-surrogate suite runs a heavy MILC+Jacobi UNION workload (~30 min for
the set) -- too long to gate every PR on. Label those tests "nightly" (plus a
per-test TIMEOUT) and have the full-lane job skip them on push/PR via
`ctest --label-exclude nightly`, running the complete set including them only
on the scheduled nightly build.
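One way to express that split in a CI script is sketched below. The `ctest_args_for_event` function and the event names passed to it are assumptions made for illustration (the actual workflow may select the mode differently); `--label-exclude` and `--output-on-failure` are standard ctest options.

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: choose ctest arguments from the trigger type.
# Assumption: the caller passes an event name such as "push",
# "pull_request", or "schedule".
ctest_args_for_event() {
    case "$1" in
        schedule)
            # Nightly build: run everything, including tests labeled "nightly".
            echo "--output-on-failure"
            ;;
        *)
            # Push/PR: skip the long-running tests labeled "nightly".
            echo "--output-on-failure --label-exclude nightly"
            ;;
    esac
}

# Example use inside a build directory (left commented out here):
#   ctest $(ctest_args_for_event "$CI_EVENT")
```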
@caitlinross caitlinross merged commit 1aacd2b into codes-org:master Jul 1, 2026
14 checks passed
@caitlinross caitlinross deleted the testing-overhaul branch July 1, 2026 18:30