feat(#374): lower memory.copy/memory.fill (bulk-memory), bounds-checked, trap-correct + v0.11.49 #376
Merged
19 falcon sites (11 fill + 8 copy). Same _ => None class as #369/#372; loud-skips on v0.11.47. No WasmOp::MemoryCopy/MemoryFill, no decoder arm, no lowering — a real lowering to build (decode + stack-effect + bounds-checked copy/fill loop with memmove-overlap + OOB trap). Repro only; fix = next block, own release. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ked, trap-correct

memory.copy/memory.fill fell through the decoder `_ => None` and (since v0.11.46/GI-FPU-001) loud-skipped the whole function: 19 falcon sites, the largest remaining bulk-mem gap. Unlike #372 the lowering did not exist at all.

- WasmOp::MemoryCopy / MemoryFill + decoder arms (memory 0 only; non-zero memory index loud-skips, preserving the GI-FPU-001 honesty contract)
- stack effect: pop 3, push 0 (wasm_stack_check)
- optimizer decline -> direct selector fallback (#120/#188/#372 pattern)
- select_with_stack lowering: fill = STRB byte loop (low byte of val); copy = memmove byte loop with direction by dst/src order (dst>src copies backward). Bounds (Software mode) trap via inline UDF guarded by a LOCAL skip branch, end-EXCLUSIVE (off+len>size or u32-overflow traps; ==size ok), matching wasmtime. The 3 dead popped operands are reused as walking pointers (only R12 extra), no temp allocation.

Gate (value-level, no silicon): bulk_memory_374_differential.py 16/16 vs wasmtime over discriminating vectors (forward, overlap dst>src backward, overlap dst<src forward, self-copy, len==0, dst/src+len==size boundaries NO trap, OOB dst/src TRAP, low-byte fill).

Frozen-safe: control_step 0x00210A55 13/13, flight_seam 0x07FDF307, div_const 338/338 byte-identical.

Unit tests for decoder/stack-check/selector. rivet GI-MEM-002 (+VER-001). Falcon silicon (falcon-v1.56.fused.wasm) gates the release before #374 closes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…sweep + changelog Pin sweep 0.11.48 -> 0.11.49 (workspace.package + 10 path-dep pins + MODULE.bazel + Cargo.lock synth-* packages). CHANGELOG v0.11.49 with falsification. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
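The byte loops described in the lowering commit can be modelled at value level in a few lines. This is only an illustrative sketch, not the emitted ARM code or the project's test script: the function names `fill_bytes` and `copy_bytes` are hypothetical, memory is assumed to be a plain `bytearray`, and bounds checking is assumed to have happened already.

```python
def fill_bytes(mem: bytearray, dst: int, val: int, n: int) -> None:
    """Model of the fill loop: store the low byte of val at n consecutive addresses."""
    byte = val & 0xFF  # only the low 8 bits of the i32 value are written
    for i in range(n):
        mem[dst + i] = byte


def copy_bytes(mem: bytearray, dst: int, src: int, n: int) -> None:
    """Model of the memmove-style copy loop (hypothetical helper).

    When dst > src the regions may overlap with dst ahead of src, so the
    loop walks from the last byte down to the first; otherwise it walks
    forward. Either way each source byte is read before it is overwritten.
    """
    if dst > src:
        for i in range(n - 1, -1, -1):  # backward
            mem[dst + i] = mem[src + i]
    else:
        for i in range(n):  # forward (also covers dst == src)
            mem[dst + i] = mem[src + i]


if __name__ == "__main__":
    m = bytearray(range(8))
    copy_bytes(m, 2, 0, 4)  # overlapping, dst > src
    print(list(m))
```

A naive forward loop on the `dst > src` overlap case would re-read bytes it had just written, which is the corruption the direction choice avoids.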
#374: bulk-memory lowering (`memory.copy` / `memory.fill`)

Closes the largest remaining falcon on-target gap: 19 sites (11 `memory.fill` + 8 `memory.copy`). Like the scalar floats (#369) and the former `i64.load/store` (#372), bulk-memory fell through the decoder `_ => None` and (since v0.11.46 / GI-FPU-001) loud-skipped the whole function. Unlike #372 the lowering did not exist at all.

What landed

- `WasmOp::MemoryCopy` / `MemoryFill` + decoder arms (memory 0 only; a non-zero memory index still loud-skips, preserving the GI-FPU-001 honesty contract)
- Stack effect: `pop 3, push 0` (`wasm_stack_check`)
- Optimizer decline -> direct selector fallback (#120/#188/#372 pattern)
- Lowering in `select_with_stack`:
  - `fill` → `STRB` loop writing the low byte of `val`
  - `copy` → memmove byte loop; `dst > src` copies backward so overlapping copies don't corrupt the source
- Bounds under `--safety-bounds software`: inline `UDF` guarded by a local skip branch, end-exclusive trap (`off+len > size` or `u32` overflow; `== size` ok), matching wasmtime. (Self-contained image doesn't relocate a body `Trap_Handler` branch, so `UDF`; on silicon UsageFault/HardFault routes to `Trap_Handler` via the vector table.)
- The 3 dead popped operands are reused as walking pointers: only `R12` extra, no temp allocation

Verification

- `scripts/repro/bulk_memory_374_differential.py`: 16/16 vs wasmtime over discriminating vectors (forward, overlap `dst>src` backward, overlap `dst<src` forward, self-copy, `len==0`, `dst+len==size` & `src+len==size` boundaries NO-trap, OOB `dst`/`src` TRAP, low-byte fill). It compares the full 64 KiB image and the trap outcome.
- Frozen-safe: `control_step` `0x00210A55` (13/13), `flight_seam` `0x07FDF307`, `div_const` 338/338 byte-identical.
- Unit tests for decoder/stack-check/selector (`*_374`).
- `cargo test --workspace --exclude synth-verify` green; fmt + clippy clean; rivet `GI-MEM-002` (+`VER-001`), non-xref errors 0.

Release gate

v0.11.49 is prepped (pin sweep + CHANGELOG). Holding the tag for gale's `falcon-v1.56.fused.wasm` G474RE round-trip before closing #374: the one path unicorn can't see is `UDF → UsageFault/HardFault → vector-table → Trap_Handler` (the in-bounds copy/fill loop is the high-fidelity, solidly-verified part).

🤖 Generated with Claude Code
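The end-exclusive bounds rule stated in this description (`off+len > size` or `u32` overflow traps; `== size` is fine) can be written out as a small value-level predicate. This is a sketch under assumptions, not the project's guard code: `range_traps` and `copy_traps` are hypothetical names, and the 32-bit add is modelled by masking to mirror what a 32-bit register would do.

```python
U32_MASK = 0xFFFFFFFF


def range_traps(off: int, length: int, size: int) -> bool:
    """Return True if the byte range [off, off + length) must trap.

    Assumes off, length and size are all u32 values. The end address is
    computed with 32-bit wraparound; a wrapped result (end < off) counts
    as out of bounds. An end exactly equal to size is still in bounds.
    """
    end = (off + length) & U32_MASK  # 32-bit add, as in a register
    overflowed = end < off           # the add wrapped past 2**32
    return overflowed or end > size


def copy_traps(dst: int, src: int, length: int, size: int) -> bool:
    """A copy traps if either its destination or its source range does."""
    return range_traps(dst, length, size) or range_traps(src, length, size)


if __name__ == "__main__":
    size = 0x10000  # 64 KiB, matching the image size used in verification
    print(range_traps(0xFFF0, 0x10, size))   # range ends exactly at size
    print(range_traps(0xFFF0, 0x11, size))   # range ends one byte past size
```

Note that with this rule a zero-length operation at `off == size` does not trap, consistent with the `len==0` and boundary vectors listed under Verification.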