Testing overhaul: CMake-driven test helpers & equivalence harnesses #253
Merged
Introduce a declarative way to register equivalence/determinism tests so
that adding one is a single CMake call rather than a copy-and-edit shell
script.
- codes_add_equivalence_test() registers a test that runs a model binary
two or more times and asserts a marker line ("Net Events Processed" by
default) is identical across runs. REPEAT covers reproducibility;
VARIANTS covers comparisons such as seq vs optimistic (--sync=1 vs
--sync=3).
- equivalence-run.sh is the generic runner behind it: each run executes
in its own run-N/ subdir (so fixed relative output paths don't
collide), greps the marker from each run, and diffs them.
- run-test.sh.in now accepts a full command with arguments, not just a
single legacy per-scenario script.
Migrate example-ping-pong-determinism.sh as the first user of the helper
and drop the standalone script.
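Below is a minimal bash sketch of the per-run-subdir and marker-diff idea described above. It assumes a hypothetical `equivalence_check MARKER REPEAT CMD [ARGS...]` function. It is not the real `equivalence-run.sh`, whose options and internals may differ, and it runs the command directly rather than under MPI:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-run-subdir + marker-diff idea; not the real
# tests/equivalence-run.sh. Runs CMD directly instead of under MPI.
set -euo pipefail

# equivalence_check MARKER REPEAT CMD [ARGS...]
# Runs CMD REPEAT times, each inside its own run-N/ directory. It passes only
# if every run exits cleanly and the first line containing MARKER is
# identical across all runs.
equivalence_check() {
  local marker=$1 repeat=$2
  shift 2
  local i line first=""
  for ((i = 1; i <= repeat; i++)); do
    mkdir -p "run-$i"
    # Subshell keeps the cd local; fixed relative output paths land in run-$i/.
    if ! (cd "run-$i" && "$@" > output.log 2>&1); then
      echo "run $i: command failed" >&2
      return 1
    fi
    line=$(grep -m1 -F -- "$marker" "run-$i/output.log" || true)
    if [[ -z $line ]]; then
      echo "run $i: marker '$marker' not found" >&2
      return 1
    fi
    if ((i == 1)); then
      first=$line
    elif [[ $line != "$first" ]]; then
      echo "run $i differs from run 1:" >&2
      diff <(printf '%s\n' "$first") <(printf '%s\n' "$line") >&2 || true
      return 1
    fi
  done
  echo "PASS: '$first' identical across $repeat runs"
}
```

Each repeat runs in its own `run-N/` directory. This lets a binary with fixed relative output paths run several times without one run clobbering another's files.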
Add codes_add_run_test() for single-run smoke/unit tests: run a binary once under MPI and pass on clean exit. The caller supplies the full post-binary argument list, which absorbs the various ways CODES binaries take their config (positional, "-- <conf>", "--codes-config=", "--conf=", or none).

Convert the per-scenario shell scripts to declarative calls and delete them: lp-io, jobmap, map-ctx, resource, lsm, rc-stack, the modelnet-* topology family, the synthetic-traffic binaries, and the workload test.

Two fixes fall out of the migration:
- rc-stack-test ran the wrong binary (modelnet-simplep2p-test) due to a copy-paste error; it now runs rc-stack-test.
- modelnet-prio-sched ran both schedulers in one script; split into separate seq (--sync=1) and opt (--sync=3) tests.

Tests with custom logic (mapping_test) or held back for review stay as shell scripts for now.
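The following bash sketch roughly illustrates the "run once under MPI, pass on clean exit" behaviour with a caller-supplied argument list. The function name `run_smoke_test` is hypothetical. Reading `MPIEXEC_EXECUTABLE` and `MPIEXEC_NUMPROC_FLAG` from the environment is also an assumption: they are CMake FindMPI variables, and how the real `run-test.sh.in` receives them is not shown here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_smoke_test NPROCS BINARY [ARGS...]
# Hypothetical: MPIEXEC_EXECUTABLE / MPIEXEC_NUMPROC_FLAG mirror the CMake
# FindMPI variables of the same name, assumed here to arrive as env vars.
run_smoke_test() {
  local nprocs=$1
  shift
  local launcher=${MPIEXEC_EXECUTABLE:-mpiexec}
  local np_flag=${MPIEXEC_NUMPROC_FLAG:--n}
  # Everything after BINARY is passed through untouched, so positional
  # configs, "-- <conf>", "--codes-config=..." etc. all work the same way.
  if "$launcher" "$np_flag" "$nprocs" "$@"; then
    echo "PASS: $1"
  else
    local rc=$?  # exit status of the launcher command above
    echo "FAIL: $1 exited with status $rc" >&2
    return "$rc"
  fi
}
```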
Generalize equivalence-run.sh into a staged runner that also drives single-run smoke tests. New options, threaded through both CMake helpers:
- --setup sources a script inside each run dir before the run, so it can export env vars and generate the config there; CONFIG/ARGS then reference the generated file by bare name.
- --require asserts a line is present in every run's output (e.g. "Network switch completed", proving the surrogate actually engaged).
- codes_add_run_test() gains SETUP/MARKER and routes through the runner when either is set.

Add the setup scripts (no-logging, surrogate-determinism freeze and no-freeze) and migrate the no-logging and surrogate-determinism tests to the helpers, dropping their standalone scripts.

The surrogate-1/2/3 scripts stay as-is: they are cross-config comparisons (surrogate vs high-fidelity, freeze vs non-freeze) with custom sed-normalized diffs, not determinism checks. Their final packet-count check is a no-op (it execs the *.txt files instead of reading them); fixing that is left to its own commit.
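The setup and require stages can be sketched the same way. The function below (`staged_run SETUP REQUIRED_LINE REPEAT CMD [ARGS...]`) is hypothetical and does not reflect the runner's actual interface. The point it shows is that the setup script is *sourced* inside each run directory, so its exports and generated files are visible to the run:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the --setup / --require stages; the function name and
# argument order are illustrative, not equivalence-run.sh's real interface.
set -euo pipefail

# staged_run SETUP_SCRIPT REQUIRED_LINE REPEAT CMD [ARGS...]
staged_run() {
  local setup=$1 required=$2 repeat=$3
  shift 3
  local i setup_abs
  # Resolve the setup path before cd-ing into each run directory.
  setup_abs="$(cd "$(dirname "$setup")" && pwd)/$(basename "$setup")"
  for ((i = 1; i <= repeat; i++)); do
    mkdir -p "run-$i"
    # One subshell per run: the setup is sourced, so its exports and any
    # config it generates (referenced by bare name in CMD) are visible to
    # the run, but cannot leak into the next run.
    if ! (cd "run-$i" && source "$setup_abs" && "$@" > output.log 2>&1); then
      echo "run $i: setup or command failed" >&2
      return 1
    fi
    # --require: the line must appear in every run's output.
    if ! grep -qF -- "$required" "run-$i/output.log"; then
      echo "run $i: required line '$required' missing" >&2
      return 1
    fi
  done
  echo "PASS: '$required' present in all $repeat runs"
}
```

Each setup runs in its own subshell inside the run directory. As a result, environment changes made by one run's setup stay isolated from the next run and from the calling shell.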
The final check globbed `packet-latency-*/*.txt | wc -l` with no command, so bash tried to execute the trace files, wc counted 0 on both sides, and the diff always passed. Add `cat` so it actually compares packet counts.
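The following bash sketch shows the corrected count and comparison. The helper names `count_packets` and `same_packet_count` are hypothetical, as is the assumption that each trace line records one packet. Only the `packet-latency-*/*.txt` glob comes from the scripts.

```shell
#!/usr/bin/env bash
# Sketch of the corrected packet-count comparison. Helper names are
# hypothetical; each trace line is assumed to record one packet.
set -euo pipefail

# The buggy line had no command in front of the glob:
#   packet-latency-*/*.txt | wc -l
# so bash tried to *execute* the first trace file, nothing reached wc on
# stdout, and both sides printed 0. Piping through cat counts trace lines.
count_packets() {
  cat "$1"/packet-latency-*/*.txt | wc -l
}

# Fails if two run directories recorded a different number of packets.
same_packet_count() {
  local a b
  a=$(( $(count_packets "$1") ))  # $(( )) strips wc's padding on BSD/macOS
  b=$(( $(count_packets "$2") ))
  if (( a != b )); then
    echo "packet counts differ: $a vs $b" >&2
    return 1
  fi
}
```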
codes_add_lpio_equivalence_test() runs a model with two configs (each with its own --lp-io-dir) and diffs the per-LP lp-io output -- the identifier-file set plus a sorted diff of each file. Two configs are equivalent iff every run produces identical lp-io output. equivalence-run.sh gains a --lp-io comparison mode to back it. This is the safety net for swapping a legacy .conf for an equivalent new config (e.g. a compiled YAML) and proving the simulation is byte-identical per LP. Includes a synthetic-dragonfly proof-of-concept (same config twice) that also serves as an lp-io determinism check.
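The two checks behind the lp-io comparison mode can be sketched in bash as follows. This assumes that each run's lp-io output is a directory of per-identifier files. The name `lpio_equivalent` is hypothetical. Comparing sorted contents is also an assumption about why the runner sorts: it makes the check insensitive to line order across runs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an lp-io comparison: (1) both lp-io dirs contain the
# same set of identifier files, (2) each file matches after sorting.
set -euo pipefail

# lpio_equivalent DIR_A DIR_B
lpio_equivalent() {
  local a=$1 b=$2 f rc=0
  # 1. Same set of files, by path relative to each lp-io dir.
  if ! diff <(cd "$a" && find . -type f | sort) \
            <(cd "$b" && find . -type f | sort) >&2; then
    echo "lp-io file sets differ" >&2
    return 1
  fi
  # 2. Per-file diff of sorted contents; report every mismatching file.
  while IFS= read -r f; do
    if ! diff <(sort "$a/$f") <(sort "$b/$f") > /dev/null; then
      echo "lp-io contents differ: $f" >&2
      rc=1
    fi
  done < <(cd "$a" && find . -type f | sort)
  return "$rc"
}
```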
Codecov Report: ✅ All modified and coverable lines are covered by tests.
The union-surrogate suite runs a heavy MILC+Jacobi UNION workload (~30 min for the set) -- too long to gate every PR on. Label those tests "nightly" (plus a per-test TIMEOUT) and have the full-lane job skip them on push/PR via `ctest --label-exclude nightly`, running the complete set including them only on the scheduled nightly build.
The CODES test suite was ~30 hand-written, copy-and-edited shell scripts, one per scenario, each re-implementing the same run/grep/diff boilerplate. This PR replaces them with a small set of CMake test helpers backed by one shared runner and migrates the existing tests onto them. It also adds equivalence harnesses that protect the upcoming config-format and model refactors. The most important of these is the lp-io config-equivalence check: for the YAML config work, it verifies that the new format configures simulations correctly.
What changed
Test helpers + runner (`tests/CMakeLists.txt`, `tests/equivalence-run.sh`, `tests/run-test.sh.in`)
- `codes_add_run_test()`: single-run smoke/unit tests; adding one is a single CMake line.
- `codes_add_equivalence_test()`: run a model N times and diff a marker line (`Net Events Processed`). Supports `SETUP` (sourced per run for config generation), `REQUIRE` (extra presence checks), and `REPEAT`/`VARIANTS`.
- `codes_add_lpio_equivalence_test()`: run a model with two configs and diff the per-LP lp-io output (identifier-file set + a sorted diff of each file). This is the safety net for swapping a legacy `.conf` for an equivalent new YAML config and proving the result is byte-identical per LP. It is config-format-agnostic, so for now tests like this just compare a `.conf` sim against a `.conf` sim.
- `equivalence-run.sh`: per-run subdir isolation, plus optional setup/marker/require/lp-io comparison. The MPI launch comes from `MPIEXEC_*` instead of a hardcoded `mpirun`.

Migration and fixes
- `rc-stack-test` was running the wrong binary (`modelnet-simplep2p-test`); it now runs the `rc-stack-test` binary.
- `modelnet-prio-sched-test` is split into `-seq`/`-opt`.
- `example-ping-pong-surrogate-{1,2,3}.sh`: the packet-count check globbed `*.txt | wc -l` with no command, so it always compared `0` to `0`; added the missing `cat` so it actually compares packet counts.

Note: Now that the Union tests are fixed, they take 30+ minutes to complete, so I have disabled them on PRs for now; they will run nightly instead. The full CI build still runs on PRs and runs all the other tests.