fix(#372): i64.load/i64.store correctness + v0.11.47 #373
Merged
…+ decoder WIP
NOT a fix (partial WIP). Diagnosed why falcon loud-skips 39 i64.load/store sites
on v0.11.46. It's THREE layers, worse than the issue assumed (gale thought the
lowering was ready):
L1 DECODER GAP: convert_operator decodes the narrow i64 loads (I64Load8..32)
and I32Load/Store but has NO arm for full-width I64Load/I64Store -> _ => None
-> dropped -> loud-skipped since v0.11.46 (GI-FPU-001). (decoder arm added
here as WIP — INSUFFICIENT alone: see L2.)
L2 OPTIMIZED-PATH STUB: with the op decoded, the default optimized path drops
it -> stub (ld64 -> `bx lr`, st64 -> `mov r0,r1`). So the decoder arm ALONE
is net-negative (turns the honest loud-skip into a silent stub again). Needs
an optimizer decline -> direct-selector fallback (mirror #120/#188).
L3 ENCODER DROPS THE ADDRESS: arm_encoder.rs:5303 I64Ldr/I64Str use addr.base
+ addr.offset but IGNORE addr.index (the address register). Emits
[R11+offset]/[R11+offset+4], dropping the operand -> reads the WRONG location.
Proven numerically: ld64(16) returns mem[0] (0xaa..), not mem[16]
(i64_load_store_372_differential.py). Same class as #206 indexed-load drop.
Fix (next block): decoder arm + optimizer decline + I64Ldr/I64Str encoder index
(materialize ip = addr_reg + offset; ldr/str [R11, ip] {,+4}, like #206), gated
on the numeric differential. Frozen fixtures use no i64.load/store.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
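The decoder gap described as L1 above can be illustrated with a small model. This is a sketch under stated assumptions, not the project's code: only the `convert_operator` name and the operator names come from the commit message; the `Operator` and `IrOp` enums and the `_before`/`_after` function names are hypothetical and exist only to show how a catch-all `_ => None` arm drops the full-width operators.

```rust
// Hypothetical, simplified model of the decoder. The enum shapes are
// assumptions made for illustration; the real types are not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operator {
    I32Load,
    I32Store,
    I64Load8U,
    I64Load32U,
    I64Load,
    I64Store,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IrOp {
    Load { bytes: u8 },
    Store { bytes: u8 },
}

/// Model of the decoder before the change: narrow i64 loads and the i32
/// accesses have arms, full-width i64 accesses fall into the catch-all.
fn convert_operator_before(op: Operator) -> Option<IrOp> {
    match op {
        Operator::I32Load => Some(IrOp::Load { bytes: 4 }),
        Operator::I32Store => Some(IrOp::Store { bytes: 4 }),
        Operator::I64Load8U => Some(IrOp::Load { bytes: 1 }),
        Operator::I64Load32U => Some(IrOp::Load { bytes: 4 }),
        _ => None, // I64Load / I64Store end up here and are dropped
    }
}

/// Model of the decoder with the two arms added; everything else is
/// delegated to the old behaviour so existing operators are unaffected.
fn convert_operator_after(op: Operator) -> Option<IrOp> {
    match op {
        Operator::I64Load => Some(IrOp::Load { bytes: 8 }),
        Operator::I64Store => Some(IrOp::Store { bytes: 8 }),
        other => convert_operator_before(other),
    }
}
```

In this model `convert_operator_before(Operator::I64Load)` is `None`, while `convert_operator_after` returns an 8-byte load. As the commit message notes, decoding the operator is not sufficient on its own, because the later stages must also handle it correctly.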
…coder index)
falcon loud-skipped 39 i64.load/store sites on v0.11.46. The gap was 3 layers,
and the lowering itself was broken (not just unwired):
L1 DECODER: convert_operator decoded the narrow i64 loads (I64Load8..32) and
I32Load/Store but had NO arm for full-width I64Load/I64Store -> _ => None ->
dropped -> loud-skipped. Added the two arms.
L2 OPTIMIZER: the default optimized path has no IR opcode for them and dropped
them to a stub (ld64 -> `bx lr`). optimize_full now DECLINES i64.load/store
-> falls back to the direct selector (the #120/#188 pattern).
L3 ENCODER (the real bug): arm_encoder.rs I64Ldr/I64Str used addr.base+offset
and IGNORED addr.offset_reg -> emitted [R11+offset], dropping the address.
Proven: ld64(16) returned mem[0], not mem[16]. New i64_effective_base()
materializes `ADD.W ip, base, index` (byte-verified) then loads/stores via
[ip,#off]/[ip,#off+4]; non-indexed frame access (offset_reg=None) is
unchanged -> byte-identical. Same class as #206.
Verified: ld64(16)=mem[16] and st64 use the address on BOTH the optimized and
direct paths (i64_load_store_372_differential.py); frozen oracles byte-identical
(control_step 0x00210A55 13/13, flight_seam 0x07FDF307, div_const 338/338);
i64-FRAME fixtures byte-identical (u64_unpack); unit test
test_372_i64_ldr_indexed_materializes_address; 39 suites green; fmt+clippy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
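The "decline, then fall back to the direct selector" pattern from L2 can be sketched as follows. This is a minimal illustration under assumptions: the `optimize_full` name comes from the commit message, but its signature, the `Op` and `Path` enums, the `compile` function, and the choice to decline a whole function at once are all hypothetical.

```rust
// Hypothetical model of the optimizer decline. Only the name `optimize_full`
// appears in the PR; its real signature and granularity are not shown there.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    I32Add,
    I64Load,
    I64Store,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Path {
    Optimized,
    Direct,
}

/// Returns `None` to decline input the optimized path cannot represent,
/// instead of silently lowering it to a stub.
fn optimize_full(ops: &[Op]) -> Option<Vec<Op>> {
    let has_wide_access = ops
        .iter()
        .any(|op| matches!(op, Op::I64Load | Op::I64Store));
    if has_wide_access {
        return None; // decline: no IR opcode for full-width i64 accesses
    }
    Some(ops.to_vec())
}

/// Chooses the optimized path when it accepts the input and the direct
/// selector otherwise.
fn compile(ops: &[Op]) -> Path {
    match optimize_full(ops) {
        Some(_) => Path::Optimized,
        None => Path::Direct,
    }
}
```

The point of the design is that declining is explicit: a function containing `I64Load` or `I64Store` takes the direct path in this model, while a function without them still goes through the optimizer unchanged.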
Pin sweep 0.11.46 -> 0.11.47 (workspace + 10 path-deps + MODULE.bazel + Cargo.lock). CHANGELOG v0.11.47 with falsification. rivet GI-MEM-001 -> implemented + GI-MEM-VER-001.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes #372: full-width `i64.load`/`i64.store`. jess: 39 falcon sites. Ships as v0.11.47.

The bug was 3 layers (deeper than "wire the decoder")

- Decoder: `convert_operator` had no `I64Load`/`I64Store` arm (only narrow `I64Load8..32`) → dropped → loud-skipped since v0.11.46. Added the arms.
- Optimizer: the default optimized path had no IR opcode for them and dropped them to a stub (`ld64` → `bx lr`). `optimize_full` now declines them → direct-selector fallback (#120/#188 pattern).
- Encoder (the real bug): `I64Ldr`/`I64Str` ignored `addr.offset_reg`, emitting `[R11+offset]` → dropped the address (`ld64(16)` read `mem[0]`, not `mem[16]`; #206 class). New `i64_effective_base` materializes `ADD.W ip, base, index` then `[ip,#off]`/`[ip,#off+4]`. Non-indexed frame access unchanged → byte-identical.

Verification

- `ld64(16)` = `mem[16]`, `st64` writes `mem[addr]` on both optimized and direct paths (`i64_load_store_372_differential.py`).
- Frozen oracles byte-identical (control_step `0x00210A55` 13/13, flight_seam `0x07FDF307`, div_const 338/338); i64-frame fixtures byte-identical (`u64_unpack`).
- Unit test `test_372_i64_ldr_indexed_materializes_address`; 39 suites green; fmt + clippy clean.

Falsification: `i64.load`/`i64.store` read/write the addressed location instead of `mem[0]`; pure-integer + i64-frame modules byte-identical.

Closes #372.
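The encoder-layer symptom (`ld64(16)` reading `mem[0]`) can be reproduced with a small arithmetic model. This is a sketch under assumptions and does not encode any ARM instructions: the `Addr` struct, the `ea_ignoring_index` and `ea_with_index` functions, `ld64`, and `demo_memory` are hypothetical names, with `offset_reg` chosen to mirror the field named in the description. Register contents are modelled as plain byte offsets into a `Vec<u8>`.

```rust
/// Hypothetical address operand. `base` and `offset_reg` hold the *values*
/// of the registers (as byte offsets into the model memory), not register
/// numbers; `offset` is the immediate.
struct Addr {
    base: usize,
    offset_reg: Option<usize>,
    offset: usize,
}

/// Pre-fix behaviour in this model: base + immediate, index dropped.
fn ea_ignoring_index(a: &Addr) -> usize {
    a.base + a.offset
}

/// Post-fix behaviour in this model: fold the index into the base first
/// (the role `ADD.W ip, base, index` plays), then apply the immediate.
fn ea_with_index(a: &Addr) -> usize {
    a.base + a.offset_reg.unwrap_or(0) + a.offset
}

/// Reads one little-endian 32-bit word.
fn load_u32(mem: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([mem[at], mem[at + 1], mem[at + 2], mem[at + 3]])
}

/// 64-bit load as two 32-bit word loads at `ea` and `ea + 4`,
/// matching the `[ip,#off]` / `[ip,#off+4]` pair.
fn ld64(mem: &[u8], ea: usize) -> u64 {
    let lo = load_u32(mem, ea) as u64;
    let hi = load_u32(mem, ea + 4) as u64;
    (hi << 32) | lo
}

/// 32 bytes of memory: 0xAA at bytes 0..8, 0x11..=0x18 at bytes 16..24.
fn demo_memory() -> Vec<u8> {
    let mut mem = vec![0u8; 32];
    mem[0..8].fill(0xAA);
    for i in 0..8 {
        mem[16 + i] = 0x11 + i as u8;
    }
    mem
}
```

With `Addr { base: 0, offset_reg: Some(16), offset: 0 }`, the index-ignoring address yields `0xAAAAAAAAAAAAAAAA` (the contents of `mem[0]`), while the index-aware address yields `0x1817161514131211` (the contents of `mem[16]`). When `offset_reg` is `None`, both functions return the same address, which corresponds to the claim that non-indexed frame accesses are unchanged.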
🤖 Generated with Claude Code