fix(simulator): unbreak CI and make blocks actually fill (Add/Sub union, 2.0x pre-init buffer, observed-gas recalibration) by meyer9 · Pull Request #194 · base/benchmark

meyer9 · 2026-06-01T17:40:39Z

Three coordinated simulator-worker fixes. Together they unblock the long-broken public-benchmarks CI and make base-mainnet-simulation actually fill the configured gas limit.

Commits

fix(simulator): OpcodeStats Add/Sub must use union of keys — Root-cause fix. The map ops iterated only other, silently dropping keys present only in the receiver. Effect: per-tx blockCounts.Precompiles was always empty in sendTxs, so worker txs skipped all precompile work.
fix(simulator): bump pre-init buffer 1.05x -> 2.0x to fix CI — Safety margin. The 5% buffer was too tight at the base-mainnet-simulation @ 25M matrix cell post-PR fix(simulator): remove double-application of scaleFactor in targetCalls #184; rounding accumulation plus the new (correct, post-Add/Sub) per-tx execution paths overrun it. 2.0x leaves room and accommodates the upward recalibration in commit 3.
feat(simulator): recalibrate numCallsPerBlock from observed block gas — Adds an optional payloadworker.BlockObserver interface so workers can refine per-tx assumptions after the first block. Improves base-mainnet-simulation @ 25M fill from 67% → 95%; drops wasted work at 200M/250M where the user cap and gas budget disagree.

Why all three together

Combination	25M fill	250M fill	CI passes?
Current (none)	~20%	~20%	❌ (since 2026-05-15)
#1 only (Add/Sub fix)	~67%	~98%*	❌ — 5% buffer still too tight
#1 + #2	~67%	~98%	✅
#1 + #2 + #3	~95%	~98%	✅

* Single block math; recalibration in #3 is what closes the loop with the gas-budget breaker in sendTxs.

#3 alone would actually break CI by under-sizing pre-init relative to the recalibrated send rate; #2's buffer covers it. That's why they're combined.

Replaces

Closes fix(simulator): bump pre-init buffer 1.05x -> 2.0x to fix CI #192 (buffer-bump only)
Closes fix(simulator): OpcodeStats Add/Sub must use union of keys #193 (OpcodeStats only)

Files

 runner/network/sequencer_benchmark.go              |  6 ++
 runner/payload/simulator/simulatorstats/types.go   | 14 ++--
 runner/payload/simulator/simulatorstats/types_test.go | 76 ++++++++++++++++++
 runner/payload/simulator/worker.go                 | 36 +++++++-
 runner/payload/simulator/worker_test.go            | 70 ++++++++++++++++
 runner/payload/worker/types.go                     |  8 +++
 6 files changed, 205 insertions(+), 5 deletions(-)

Verification

go build ./... clean
go vet ./... clean
go test ./... clean; 11 new tests pass (6 for OpcodeStats union semantics, 5 for OnBlockBuilt recalibration paths)

Followups, out of scope

Surface gas/per_block, transactions/per_block, and reth_base_builder_flashblock_gas_headroom_pct_* in the aggregated SequencerKeyMetrics so the run-comparison view shows what's actually capping a run. Caught me out at length in the original investigation.

Draft because end-to-end validation needs one mainnet CI cycle. Expected outcome on rerun:

mainnet-transfer-only @ 250M: 131M → ~245M gas/per_block (DA cap was already removed in fix: disable post-Jovian DA footprint cap in benchmark #191)
base-mainnet-simulation @ 250M: 51M → ~245M gas/per_block (Precompiles in per-tx + recalibration)
storage-create-full-block @ 250M: unchanged (~245M, no precompiles, no scaleFactor cap)

Before this change, both OpcodeStats.Add and OpcodeStats.Sub iterated only the 'other' argument: for opcode, count := range other { result[opcode] = o[opcode] + count } This silently dropped any key present in the receiver 'o' but absent from 'other'. For Add(empty) the result was an empty map, losing 'o' entirely. Concrete consequence in sendTxs: per-tx blockCounts is computed as blockCounts = expected.Sub(actual).Round() where 'expected' is base x Mul(numCalls+1 x scaleFactor) (carrying all base opcodes + precompiles) and 'actual' starts empty for the first tx. With the buggy Sub iterating actual=empty, blockCounts.Opcodes and blockCounts.Precompiles were ALWAYS empty for every tx in every block. The downstream effect: workloads with precompile counts in their config (e.g. base-mainnet-simulation: bls12381MapG2, ecrecover, bls12381G1Add, bls12381G1MultiExp) had per-tx execution that skipped all precompile work. Each tx therefore consumed far less gas than the gas estimate predicted, and blocks under-filled the gas budget. Example: base-mainnet-simulation @ GasLimit=250M (block_time=1s) reported gas/per_block ~51M (20% of limit) in production runs because per-tx skipped the heaviest workload component. Fix: both Add and Sub now iterate the union of keys. For Sub, a key present only in 'other' produces a negative result entry (matches arithmetic semantics). Includes unit tests for the union semantics and a regression test for the first-tx scenario in sendTxs.

The basic-benchmarks job in public-benchmarks.yaml has been failing on every push to main since cf47296 (PR #184, May 15) with: execution reverted: Not enough accounts to load/update Reproduces consistently for base-mainnet-simulation @ 25M gas (the first cell of the matrix, run on both reth and geth). Fails roughly 7-8 benchmark blocks into a 20-block run. Root mechanism: PR #184 changed targetCalls from ceil(numCallsPerBlock * scaleFactor) // pre-fix to numCallsPerBlock // post-fix to address an overshoot for scaleFactor > 1 (e.g. 200M gas case). For scaleFactor < 1 (small gas limits like 25M), this raised on-chain consumption within the existing 5% pre-init buffer. CI's base-mainnet-simulation @ 25M sits right at the boundary where rounding accumulation + per-field interaction overruns the 5% margin and trips the contract's current_address_index + load + update <= num_address_initialized require well before block 20. Bumping the multiplier to 2.0x removes the boundary entirely. Pre-init runs during setup (one-time cost, dominated by mineAndConfirm batching), so this adds maybe a few seconds of setup wall time per test cell — negligible vs the ~5 minutes per cell + 1h total job time. Out of scope for this PR but noted for follow-up: - OpcodeStats.Sub/Add iterate 'other' rather than the union of s and other, which causes per-tx blockCounts.Precompiles to always be empty in sendTxs. Result: per-tx actual gas is much lower than gas_per_call estimate, so the benchmark under-utilizes the gas budget. Not the cause of THIS CI failure, but should be fixed. - The contract require is a benchmark-fidelity check, not a protocol constraint; making the worker recalibrate numCallsPerBlock from observed on-chain consumption would be a more principled fix than relying on a static safety multiplier.

Adds optional payloadworker.BlockObserver interface. The sequencer calls OnBlockBuilt after every non-setup benchmark block with the observed gas used and user tx count. simulatorPayloadWorker implements it: after the first block, it recomputes numCallsPerBlock from actual per-tx gas (rather than the setup-time estimate from simulator.Run). Why: the setup gas estimate runs simulator.Run with the full BASE config (load_accounts=13, update_accounts=5, storage_loaded=49, ..., precompile calls). When per-tx blockCounts is scaleFactor*base, actual per-tx gas differs from the estimate by a factor scaleFactor (modulo rounding), so numCallsPerBlock=(gasLimit-buffer)/base_gas under-estimates per-block capacity by a factor of scaleFactor for scaleFactor<1. Concrete impact for base-mainnet-simulation @ 25M (scaleFactor=0.714): - estimate-based: 46 calls/block, 46x365k=16.8M of 25M (67% fill) - observed-based: 65 calls/block, 65x365k=23.7M of 25M (95% fill) For scaleFactor>1 with user-specified calls_per_block cap, recalibration also helps: gas budget breaks the send loop early at ~68 txs even though targetCalls=100, so 32 txs/block of wasted gas estimation. Recalibration drops targetCalls to the actually-achievable 68. Recalibration is one-shot (the recalibrated bool guards against subsequent invocations) and respects any user-specified CallsPerBlock upper bound. Includes unit tests for under-fill, over-target, user-cap, no-op-after-first-block, and degenerate-input cases.

cb-heimdall · 2026-06-01T17:40:44Z

🟡 Heimdall Review Status

Requirement Status More Info

Reviews

🟡 0/1

Denominator calculation

Show calculation

1 if user is bot 0

1 if user is external 0

2 if repo is sensitive 0

From .codeflow.yml 1

Additional review requirements

Show calculation

Max	0

0

From CODEOWNERS 0

Global minimum 0

Max 1

1

1 if commit is unverified 0

Sum 1

meyer9 added 3 commits June 1, 2026 10:38

This was referenced Jun 1, 2026

fix(simulator): bump pre-init buffer 1.05x -> 2.0x to fix CI #192

Closed

fix(simulator): OpcodeStats Add/Sub must use union of keys #193

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(simulator): unbreak CI and make blocks actually fill (Add/Sub union, 2.0x pre-init buffer, observed-gas recalibration)#194

fix(simulator): unbreak CI and make blocks actually fill (Add/Sub union, 2.0x pre-init buffer, observed-gas recalibration)#194
meyer9 wants to merge 3 commits into
mainfrom
meyer9/simulator-fixes

meyer9 commented Jun 1, 2026

Uh oh!

cb-heimdall commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

meyer9 commented Jun 1, 2026

Commits

Why all three together

Replaces

Files

Verification

Followups, out of scope

Uh oh!

cb-heimdall commented Jun 1, 2026

🟡 Heimdall Review Status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants