Skip to content

fix(simulator): unbreak CI and make blocks actually fill (Add/Sub union, 2.0x pre-init buffer, observed-gas recalibration)#194

Draft
meyer9 wants to merge 3 commits into
mainfrom
meyer9/simulator-fixes
Draft

fix(simulator): unbreak CI and make blocks actually fill (Add/Sub union, 2.0x pre-init buffer, observed-gas recalibration)#194
meyer9 wants to merge 3 commits into
mainfrom
meyer9/simulator-fixes

Conversation

@meyer9
Copy link
Copy Markdown
Collaborator

@meyer9 meyer9 commented Jun 1, 2026

Three coordinated simulator-worker fixes. Together they unblock the long-broken public-benchmarks CI and make base-mainnet-simulation actually fill the configured gas limit.

Commits

  1. fix(simulator): OpcodeStats Add/Sub must use union of keys — Root-cause fix. The map ops iterated only other, silently dropping keys present only in the receiver. Effect: per-tx blockCounts.Precompiles was always empty in sendTxs, so worker txs skipped all precompile work.
  2. fix(simulator): bump pre-init buffer 1.05x -> 2.0x to fix CI — Safety margin. The 5% buffer was too tight at the base-mainnet-simulation @ 25M matrix cell post-PR fix(simulator): remove double-application of scaleFactor in targetCalls #184; rounding accumulation plus the new (correct, post-Add/Sub) per-tx execution paths overrun it. 2.0x leaves room and accommodates the upward recalibration in commit 3.
  3. feat(simulator): recalibrate numCallsPerBlock from observed block gas — Adds an optional payloadworker.BlockObserver interface so workers can refine per-tx assumptions after the first block. Improves base-mainnet-simulation @ 25M fill from 67% → 95%; drops wasted work at 200M/250M where the user cap and gas budget disagree.

Why all three together

Combination 25M fill 250M fill CI passes?
Current (none) ~20% ~20% ❌ (since 2026-05-15)
#1 only (Add/Sub fix) ~67% ~98%* ❌ — 5% buffer still too tight
#1 + #2 ~67% ~98%
#1 + #2 + #3 ~95% ~98%

* Single block math; recalibration in #3 is what closes the loop with the gas-budget breaker in sendTxs.

#3 alone would actually break CI by under-sizing pre-init relative to the recalibrated send rate; #2's buffer covers it. That's why they're combined.

Replaces

Files

 runner/network/sequencer_benchmark.go              |  6 ++
 runner/payload/simulator/simulatorstats/types.go   | 14 ++--
 runner/payload/simulator/simulatorstats/types_test.go | 76 ++++++++++++++++++
 runner/payload/simulator/worker.go                 | 36 +++++++-
 runner/payload/simulator/worker_test.go            | 70 ++++++++++++++++
 runner/payload/worker/types.go                     |  8 +++
 6 files changed, 205 insertions(+), 5 deletions(-)

Verification

  • go build ./... clean
  • go vet ./... clean
  • go test ./... clean; 11 new tests pass (6 for OpcodeStats union semantics, 5 for OnBlockBuilt recalibration paths)

Followups, out of scope

  • Surface gas/per_block, transactions/per_block, and reth_base_builder_flashblock_gas_headroom_pct_* in the aggregated SequencerKeyMetrics so the run-comparison view shows what's actually capping a run. Caught me out at length in the original investigation.

Draft because end-to-end validation needs one mainnet CI cycle. Expected outcome on rerun:

  • mainnet-transfer-only @ 250M: 131M → ~245M gas/per_block (DA cap was already removed in fix: disable post-Jovian DA footprint cap in benchmark #191)
  • base-mainnet-simulation @ 250M: 51M → ~245M gas/per_block (Precompiles in per-tx + recalibration)
  • storage-create-full-block @ 250M: unchanged (~245M, no precompiles, no scaleFactor cap)

meyer9 added 3 commits June 1, 2026 10:38
Before this change, both OpcodeStats.Add and OpcodeStats.Sub iterated only
the 'other' argument:

  for opcode, count := range other {
      result[opcode] = o[opcode] + count
  }

This silently dropped any key present in the receiver 'o' but absent from
'other'. For Add(empty) the result was an empty map, losing 'o' entirely.

Concrete consequence in sendTxs: per-tx blockCounts is computed as
  blockCounts = expected.Sub(actual).Round()
where 'expected' is base x Mul(numCalls+1 x scaleFactor) (carrying all base
opcodes + precompiles) and 'actual' starts empty for the first tx. With the
buggy Sub iterating actual=empty, blockCounts.Opcodes and
blockCounts.Precompiles were ALWAYS empty for every tx in every block.

The downstream effect: workloads with precompile counts in their config
(e.g. base-mainnet-simulation: bls12381MapG2, ecrecover, bls12381G1Add,
bls12381G1MultiExp) had per-tx execution that skipped all precompile work.
Each tx therefore consumed far less gas than the gas estimate predicted,
and blocks under-filled the gas budget. Example: base-mainnet-simulation @
GasLimit=250M (block_time=1s) reported gas/per_block ~51M (20% of limit)
in production runs because per-tx skipped the heaviest workload component.

Fix: both Add and Sub now iterate the union of keys. For Sub, a key present
only in 'other' produces a negative result entry (matches arithmetic
semantics).

Includes unit tests for the union semantics and a regression test for the
first-tx scenario in sendTxs.
The basic-benchmarks job in public-benchmarks.yaml has been failing on every
push to main since cf47296 (PR #184, May 15) with:
  execution reverted: Not enough accounts to load/update

Reproduces consistently for base-mainnet-simulation @ 25M gas (the first cell
of the matrix, run on both reth and geth). Fails roughly 7-8 benchmark blocks
into a 20-block run.

Root mechanism: PR #184 changed targetCalls from
  ceil(numCallsPerBlock * scaleFactor)  // pre-fix
to
  numCallsPerBlock                       // post-fix
to address an overshoot for scaleFactor > 1 (e.g. 200M gas case). For
scaleFactor < 1 (small gas limits like 25M), this raised on-chain consumption
within the existing 5% pre-init buffer. CI's base-mainnet-simulation @ 25M
sits right at the boundary where rounding accumulation + per-field interaction
overruns the 5% margin and trips the contract's
  current_address_index + load + update <= num_address_initialized
require well before block 20.

Bumping the multiplier to 2.0x removes the boundary entirely. Pre-init runs
during setup (one-time cost, dominated by mineAndConfirm batching), so this
adds maybe a few seconds of setup wall time per test cell — negligible vs the
~5 minutes per cell + 1h total job time.

Out of scope for this PR but noted for follow-up:
- OpcodeStats.Sub/Add iterate 'other' rather than the union of s and other,
  which causes per-tx blockCounts.Precompiles to always be empty in sendTxs.
  Result: per-tx actual gas is much lower than gas_per_call estimate, so the
  benchmark under-utilizes the gas budget. Not the cause of THIS CI failure,
  but should be fixed.
- The contract require is a benchmark-fidelity check, not a protocol
  constraint; making the worker recalibrate numCallsPerBlock from observed
  on-chain consumption would be a more principled fix than relying on a
  static safety multiplier.
Adds optional payloadworker.BlockObserver interface. The sequencer calls
OnBlockBuilt after every non-setup benchmark block with the observed gas
used and user tx count. simulatorPayloadWorker implements it: after the
first block, it recomputes numCallsPerBlock from actual per-tx gas
(rather than the setup-time estimate from simulator.Run).

Why: the setup gas estimate runs simulator.Run with the full BASE config
(load_accounts=13, update_accounts=5, storage_loaded=49, ..., precompile
calls). When per-tx blockCounts is scaleFactor*base, actual per-tx gas
differs from the estimate by a factor scaleFactor (modulo rounding), so
numCallsPerBlock=(gasLimit-buffer)/base_gas under-estimates per-block
capacity by a factor of scaleFactor for scaleFactor<1.

Concrete impact for base-mainnet-simulation @ 25M (scaleFactor=0.714):
- estimate-based: 46 calls/block, 46x365k=16.8M of 25M (67% fill)
- observed-based: 65 calls/block, 65x365k=23.7M of 25M (95% fill)

For scaleFactor>1 with user-specified calls_per_block cap, recalibration
also helps: gas budget breaks the send loop early at ~68 txs even though
targetCalls=100, so 32 txs/block of wasted gas estimation. Recalibration
drops targetCalls to the actually-achievable 68.

Recalibration is one-shot (the recalibrated bool guards against
subsequent invocations) and respects any user-specified CallsPerBlock
upper bound. Includes unit tests for under-fill, over-target, user-cap,
no-op-after-first-block, and degenerate-input cases.
@cb-heimdall
Copy link
Copy Markdown
Collaborator

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
Show calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Show calculation
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 0
Sum 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants