fix(simulator): unbreak CI and make blocks actually fill (Add/Sub union, 2.0x pre-init buffer, observed-gas recalibration)#194
Draft
meyer9 wants to merge 3 commits into
Draft
fix(simulator): unbreak CI and make blocks actually fill (Add/Sub union, 2.0x pre-init buffer, observed-gas recalibration)#194meyer9 wants to merge 3 commits into
meyer9 wants to merge 3 commits into
Conversation
Before this change, both OpcodeStats.Add and OpcodeStats.Sub iterated only
the 'other' argument:
for opcode, count := range other {
result[opcode] = o[opcode] + count
}
This silently dropped any key present in the receiver 'o' but absent from
'other'. For Add(empty) the result was an empty map, losing 'o' entirely.
Concrete consequence in sendTxs: per-tx blockCounts is computed as
blockCounts = expected.Sub(actual).Round()
where 'expected' is base x Mul(numCalls+1 x scaleFactor) (carrying all base
opcodes + precompiles) and 'actual' starts empty for the first tx. With the
buggy Sub iterating actual=empty, blockCounts.Opcodes and
blockCounts.Precompiles were ALWAYS empty for every tx in every block.
The downstream effect: workloads with precompile counts in their config
(e.g. base-mainnet-simulation: bls12381MapG2, ecrecover, bls12381G1Add,
bls12381G1MultiExp) had per-tx execution that skipped all precompile work.
Each tx therefore consumed far less gas than the gas estimate predicted,
and blocks under-filled the gas budget. Example: base-mainnet-simulation @
GasLimit=250M (block_time=1s) reported gas/per_block ~51M (20% of limit)
in production runs because per-tx skipped the heaviest workload component.
Fix: both Add and Sub now iterate the union of keys. For Sub, a key present
only in 'other' produces a negative result entry (matches arithmetic
semantics).
Includes unit tests for the union semantics and a regression test for the
first-tx scenario in sendTxs.
The basic-benchmarks job in public-benchmarks.yaml has been failing on every push to main since cf47296 (PR #184, May 15) with: execution reverted: Not enough accounts to load/update Reproduces consistently for base-mainnet-simulation @ 25M gas (the first cell of the matrix, run on both reth and geth). Fails roughly 7-8 benchmark blocks into a 20-block run. Root mechanism: PR #184 changed targetCalls from ceil(numCallsPerBlock * scaleFactor) // pre-fix to numCallsPerBlock // post-fix to address an overshoot for scaleFactor > 1 (e.g. 200M gas case). For scaleFactor < 1 (small gas limits like 25M), this raised on-chain consumption within the existing 5% pre-init buffer. CI's base-mainnet-simulation @ 25M sits right at the boundary where rounding accumulation + per-field interaction overruns the 5% margin and trips the contract's current_address_index + load + update <= num_address_initialized require well before block 20. Bumping the multiplier to 2.0x removes the boundary entirely. Pre-init runs during setup (one-time cost, dominated by mineAndConfirm batching), so this adds maybe a few seconds of setup wall time per test cell — negligible vs the ~5 minutes per cell + 1h total job time. Out of scope for this PR but noted for follow-up: - OpcodeStats.Sub/Add iterate 'other' rather than the union of s and other, which causes per-tx blockCounts.Precompiles to always be empty in sendTxs. Result: per-tx actual gas is much lower than gas_per_call estimate, so the benchmark under-utilizes the gas budget. Not the cause of THIS CI failure, but should be fixed. - The contract require is a benchmark-fidelity check, not a protocol constraint; making the worker recalibrate numCallsPerBlock from observed on-chain consumption would be a more principled fix than relying on a static safety multiplier.
Adds optional payloadworker.BlockObserver interface. The sequencer calls OnBlockBuilt after every non-setup benchmark block with the observed gas used and user tx count. simulatorPayloadWorker implements it: after the first block, it recomputes numCallsPerBlock from actual per-tx gas (rather than the setup-time estimate from simulator.Run). Why: the setup gas estimate runs simulator.Run with the full BASE config (load_accounts=13, update_accounts=5, storage_loaded=49, ..., precompile calls). When per-tx blockCounts is scaleFactor*base, actual per-tx gas differs from the estimate by a factor scaleFactor (modulo rounding), so numCallsPerBlock=(gasLimit-buffer)/base_gas under-estimates per-block capacity by a factor of scaleFactor for scaleFactor<1. Concrete impact for base-mainnet-simulation @ 25M (scaleFactor=0.714): - estimate-based: 46 calls/block, 46x365k=16.8M of 25M (67% fill) - observed-based: 65 calls/block, 65x365k=23.7M of 25M (95% fill) For scaleFactor>1 with user-specified calls_per_block cap, recalibration also helps: gas budget breaks the send loop early at ~68 txs even though targetCalls=100, so 32 txs/block of wasted gas estimation. Recalibration drops targetCalls to the actually-achievable 68. Recalibration is one-shot (the recalibrated bool guards against subsequent invocations) and respects any user-specified CallsPerBlock upper bound. Includes unit tests for under-fill, over-target, user-cap, no-op-after-first-block, and degenerate-input cases.
Collaborator
🟡 Heimdall Review Status
|
This was referenced Jun 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Three coordinated simulator-worker fixes. Together they unblock the long-broken public-benchmarks CI and make
base-mainnet-simulationactually fill the configured gas limit.Commits
fix(simulator): OpcodeStats Add/Sub must use union of keys— Root-cause fix. The map ops iterated onlyother, silently dropping keys present only in the receiver. Effect: per-txblockCounts.Precompileswas always empty insendTxs, so worker txs skipped all precompile work.fix(simulator): bump pre-init buffer 1.05x -> 2.0x to fix CI— Safety margin. The 5% buffer was too tight at thebase-mainnet-simulation @ 25Mmatrix cell post-PR fix(simulator): remove double-application of scaleFactor in targetCalls #184; rounding accumulation plus the new (correct, post-Add/Sub) per-tx execution paths overrun it. 2.0x leaves room and accommodates the upward recalibration in commit 3.feat(simulator): recalibrate numCallsPerBlock from observed block gas— Adds an optionalpayloadworker.BlockObserverinterface so workers can refine per-tx assumptions after the first block. Improvesbase-mainnet-simulation @ 25Mfill from 67% → 95%; drops wasted work at 200M/250M where the user cap and gas budget disagree.Why all three together
* Single block math; recalibration in #3 is what closes the loop with the gas-budget breaker in
sendTxs.#3 alone would actually break CI by under-sizing pre-init relative to the recalibrated send rate; #2's buffer covers it. That's why they're combined.
Replaces
Files
Verification
go build ./...cleango vet ./...cleango test ./...clean; 11 new tests pass (6 for OpcodeStats union semantics, 5 forOnBlockBuiltrecalibration paths)Followups, out of scope
gas/per_block,transactions/per_block, andreth_base_builder_flashblock_gas_headroom_pct_*in the aggregatedSequencerKeyMetricsso the run-comparison view shows what's actually capping a run. Caught me out at length in the original investigation.Draft because end-to-end validation needs one mainnet CI cycle. Expected outcome on rerun:
mainnet-transfer-only @ 250M: 131M → ~245Mgas/per_block(DA cap was already removed in fix: disable post-Jovian DA footprint cap in benchmark #191)base-mainnet-simulation @ 250M: 51M → ~245Mgas/per_block(Precompiles in per-tx + recalibration)storage-create-full-block @ 250M: unchanged (~245M, no precompiles, no scaleFactor cap)