Add UB and address sanitizers to CI#252
Merged
Merged
Conversation
packet_arrive_rc applied qhash_entry() (a container_of) directly to the result of qhash_search(), which returns NULL on a miss. container_of on NULL forms an out-of-bounds pointer (NULL - offsetof) — undefined behavior even before any dereference. UBSan flags it during optimistic rollback: dragonfly-dally.cxx:6123: runtime error: applying non-zero offset 18446744073709551576 to null pointer (in packet_arrive_rc) Guard the qhash_entry on a non-NULL hash_link, matching the forward handler (packet_arrive) and the sibling reverse handler, which already do. tmp stays NULL on a miss, which the downstream branches already handle.
Add a `sanitizers` job (ubuntu-24.04 / gcc / MPICH) that runs CODES under ASan and UBSan as separate matrix legs, plus `asan`/`ubsan` CMake presets and a CODES_SANITIZER knob that instruments every CODES target (the library, executables, and tests). ROSS is a prebuilt imported target and is not reinstrumented, so this is CODES-only for now. Separate legs so each reports and gates independently. UBSan gates now: -fno-sanitize-recover (baked into the preset) makes any UB abort, so a real finding fails the job. ASan is informational to start (continue-on-error) with LeakSanitizer disabled (detect_leaks=0) so the lane delivers the high-value detections (heap-buffer-overflow, use-after-free, stack overflow) without MPICH and teardown leak noise; enabling leaks with a suppression file and flipping ASan to gating is a follow-up. log_path captures rank-level sanitizer reports past the test harnesses' per-rank stderr redirection (e.g. the ping-pong determinism test's `2> model-output-error.txt`) and uploads them as artifacts on failure.
dragonfly_sample_fin called fwrite(s->sample_stat, ..., s->op_arr_size) unconditionally at finalize. A terminal that recorded no compute-node samples has op_arr_size == 0 and a NULL sample_stat (never allocated). fwrite declares its buffer nonnull, so passing NULL is undefined behavior even for a zero-element write; gcc's UBSan flags it: dragonfly.c:2379: runtime error: null pointer passed as argument 1, which is declared to never be null (in dragonfly_sample_fin) Guard the write on op_arr_size > 0. The router fin (dragonfly_rsample_fin) and the mid-run flush already avoid this — the former writes inside a for (i < op_arr_size) loop, the latter runs after sample_stat is malloc'd.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
ASan at Debug's -O0 made the suite ~59 min on CI — the six ping-pong tests ran 10-15 min each. ASan docs recommend -O1, which brings the full suite to ~5 min, fast enough to run on every PR. Scoped to the asan preset; UBSan stays -O0. With the lane green and fast, drop continue-on-error so an ASan finding blocks merge like UBSan. LeakSanitizer stays off (detect_leaks=0) pending a suppression-file follow-up.
6d3985c to
5728834
Compare
doc/dev/ci.md predated the asan/ubsan lanes. Document how they work (separate gating legs, CODES-only instrumentation, ASan at -O1, leaks off, log_path capture, local usage) and record the deferred follow-ups (enable LeakSanitizer + suppressions) so they're tracked in-repo rather than forgotten.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adding UB and address sanitizers to CI. UB sanitizer already uncovered some issues that are also fixed here. I didn't turn on the memory leak check because we'll likely need a supression file.