Conversation
…es holes Finalize cleanup deleted only a single batch index, so when the finalize target jumped past batches finalized by other submitters, the persisted sealed batch indices snapshot was left with holes. On restart, LoadAllSealedBatchesAndHeader assumed contiguous indices and dereferenced a nil parent batch (SIGSEGV at batch_storage.go:146), crash-looping the service. - load path: sort indices, verify contiguity and nil-check the parent batch, returning an error so the existing self-heal path rebuilds from the rollup contract instead of panicking - finalize cleanup: replace single-index Delete with range-based DeleteUntil so surviving indices always form a contiguous window (also reclaims previously leaked header keys) - storage: add atomic WriteBatch to SealedBatchKV and persist batch data + header + indices in one write; indices update errors are no longer swallowed Co-authored-by: Cursor <cursoragent@cursor.com>
|
Warning Review limit reached
More reviews will be available in 41 minutes and 11 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR makes sealed-batch storage writes atomic, validates contiguous batch loading, adds range-based cleanup, and serializes batch-cache initialization and recovery. Finalize cleanup now deletes batch ranges, and regression tests cover storage failures and rollback paths. ChangesAtomic Batch Storage Consistency and Startup Resilience
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
common/batch/batch_storage_test.go (4)
128-131: ⚡ Quick winUse
isKVNotFoundfor consistent error validation.Lines 128-131 verify that deleted batches return errors, but don't check the error type. Line 127 correctly uses
isKVNotFoundto verify the indices key was deleted. Applying the same pattern here ensures the correct "not found" error is returned rather than a different error type.✨ Consistent error type checking
_, err = s.LoadSealedBatch(100) - require.Error(t, err) + require.True(t, isKVNotFound(err), "batch should be deleted") _, err = s.LoadSealedBatchHeader(100) - require.Error(t, err) + require.True(t, isKVNotFound(err), "header should be deleted")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@common/batch/batch_storage_test.go` around lines 128 - 131, Replace the generic require.Error checks with assertions that the error is the expected KV-not-found using isKVNotFound; specifically, after calling s.LoadSealedBatch(100) and s.LoadSealedBatchHeader(100) assert isKVNotFound(err) (e.g., require.Truef(t, isKVNotFound(err), "expected KV not found, got: %v", err)) so the test verifies the correct not-found error type rather than any error.
70-104: ⚡ Quick winConsider adding edge case tests for boundary and over-deletion scenarios.
The current test verifies core DeleteSealedBatchesUpTo behavior but doesn't cover important boundary cases:
- Deleting exactly at the highest index (e.g.,
DeleteSealedBatchesUpTo(105)when indices are[104, 105]) to verify all batches are removed- Deleting well beyond the highest index (e.g.,
DeleteSealedBatchesUpTo(200)when max is 105) to ensure it handles over-deletion gracefullyThese cases would strengthen confidence that the deletion logic handles edge conditions correctly.
📋 Example edge case tests
// After line 103, add: // Delete exactly at the boundary. s2 := NewBatchStorage(openTestKV(t)) storeTestChain(t, s2, []uint64{100, 101}) require.NoError(t, s2.DeleteSealedBatchesUpTo(101)) indices, err := s2.loadBatchIndices() require.True(t, isKVNotFound(err), "all indices should be deleted") // Delete beyond the window. s3 := NewBatchStorage(openTestKV(t)) storeTestChain(t, s3, []uint64{100, 101}) require.NoError(t, s3.DeleteSealedBatchesUpTo(200)) indices, err = s3.loadBatchIndices() require.True(t, isKVNotFound(err), "all indices should be deleted")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@common/batch/batch_storage_test.go` around lines 70 - 104, Add two edge-case checks to TestDeleteSealedBatchesUpTo: after the existing assertions, instantiate new BatchStorage instances (e.g., s2 and s3), store a small chain with indices like [100,101], then call DeleteSealedBatchesUpTo(101) and DeleteSealedBatchesUpTo(200) respectively and assert the indices are gone by calling loadBatchIndices and verifying it returns the KV-not-found condition (use isKVNotFound(err) or equivalent). Ensure you reference the same methods DeleteSealedBatchesUpTo and loadBatchIndices so the tests target the exact deletion behavior at the boundary and when over-deleting.
152-168: ⚡ Quick winConsider testing error propagation for range deletion methods.
The test validates that
StoreSealedBatch,StoreSealedBatchAndHeader, andDeleteSealedBatchpropagate indices read errors. Since the PR objectives mention "index update errors are propagated" andDeleteSealedBatchesUpToandDeleteAllSealedBatchesalso update indices, consider testing those methods as well to ensure complete coverage of the error propagation guarantee.🧪 Additional error propagation checks
// After line 167, add: err = s.DeleteSealedBatchesUpTo(100) require.ErrorIs(t, err, errIndicesUnavailable) err = s.DeleteAllSealedBatches() require.ErrorIs(t, err, errIndicesUnavailable)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@common/batch/batch_storage_test.go` around lines 152 - 168, The test TestStoreSealedBatchPropagatesIndicesError currently checks StoreSealedBatch, StoreSealedBatchAndHeader, and DeleteSealedBatch but misses range-deletion index errors; add additional assertions using the same storage fixture (s := NewBatchStorage(&failingIndicesKV{SealedBatchKV: openTestKV(t)})) to call DeleteSealedBatchesUpTo(100) and DeleteAllSealedBatches() and assert require.ErrorIs(err, errIndicesUnavailable) for each, placing them after the existing DeleteSealedBatch assertion so the test verifies index-update errors are propagated for those range-deletion methods as well.
80-85: ⚡ Quick winUse
isKVNotFoundfor consistent error validation.Lines 81-84 verify that deleted batches return errors, but don't check that the error type is "not found". Line 127 and the similar pattern in TestDeleteAllSealedBatches (line 127) use
isKVNotFoundto verify the specific error type. This ensures the right kind of error is returned and could catch bugs where a different error (e.g., a corruption error) is returned instead.✨ Consistent error type checking
for idx := uint64(100); idx <= 103; idx++ { _, err := s.LoadSealedBatch(idx) - require.Error(t, err, "batch %d should be deleted", idx) + require.True(t, isKVNotFound(err), "batch %d should be deleted", idx) _, err = s.LoadSealedBatchHeader(idx) - require.Error(t, err, "header %d should be deleted", idx) + require.True(t, isKVNotFound(err), "header %d should be deleted", idx) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@common/batch/batch_storage_test.go` around lines 80 - 85, Replace generic require.Error checks with a specific "not found" assertion using isKVNotFound: when calling s.LoadSealedBatch and s.LoadSealedBatchHeader in the loop (functions LoadSealedBatch and LoadSealedBatchHeader), assert that isKVNotFound(err) is true (e.g., via require.True or require.Truef) so the test verifies the error type rather than any error; keep the same loop and messages but use isKVNotFound(err) for validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@common/batch/batch_cache.go`:
- Around line 755-760: SealBatch currently swallows errors from
bc.batchStorage.StoreSealedBatchAndHeader and logs them instead of returning
failure; change SealBatch so that after calling
batchStorage.StoreSealedBatchAndHeader(batchIndex, sealedBatch,
&batchHeaderCopy) it returns the encountered error (or a wrapped error with
context) instead of only logging, ensuring the SealBatch function signature and
all callers propagate/handle the error; reference
bc.batchStorage.StoreSealedBatchAndHeader and the SealBatch function when making
the change so persistence failures are propagated to callers.
---
Nitpick comments:
In `@common/batch/batch_storage_test.go`:
- Around line 128-131: Replace the generic require.Error checks with assertions
that the error is the expected KV-not-found using isKVNotFound; specifically,
after calling s.LoadSealedBatch(100) and s.LoadSealedBatchHeader(100) assert
isKVNotFound(err) (e.g., require.Truef(t, isKVNotFound(err), "expected KV not
found, got: %v", err)) so the test verifies the correct not-found error type
rather than any error.
- Around line 70-104: Add two edge-case checks to TestDeleteSealedBatchesUpTo:
after the existing assertions, instantiate new BatchStorage instances (e.g., s2
and s3), store a small chain with indices like [100,101], then call
DeleteSealedBatchesUpTo(101) and DeleteSealedBatchesUpTo(200) respectively and
assert the indices are gone by calling loadBatchIndices and verifying it returns
the KV-not-found condition (use isKVNotFound(err) or equivalent). Ensure you
reference the same methods DeleteSealedBatchesUpTo and loadBatchIndices so the
tests target the exact deletion behavior at the boundary and when over-deleting.
- Around line 152-168: The test TestStoreSealedBatchPropagatesIndicesError
currently checks StoreSealedBatch, StoreSealedBatchAndHeader, and
DeleteSealedBatch but misses range-deletion index errors; add additional
assertions using the same storage fixture (s :=
NewBatchStorage(&failingIndicesKV{SealedBatchKV: openTestKV(t)})) to call
DeleteSealedBatchesUpTo(100) and DeleteAllSealedBatches() and assert
require.ErrorIs(err, errIndicesUnavailable) for each, placing them after the
existing DeleteSealedBatch assertion so the test verifies index-update errors
are propagated for those range-deletion methods as well.
- Around line 80-85: Replace generic require.Error checks with a specific "not
found" assertion using isKVNotFound: when calling s.LoadSealedBatch and
s.LoadSealedBatchHeader in the loop (functions LoadSealedBatch and
LoadSealedBatchHeader), assert that isKVNotFound(err) is true (e.g., via
require.True or require.Truef) so the test verifies the error type rather than
any error; keep the same loop and messages but use isKVNotFound(err) for
validation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 717da052-ebfd-4005-845e-6788be45eabe
📒 Files selected for processing (7)
common/batch/batch_cache.gocommon/batch/batch_storage.gocommon/batch/batch_storage_test.gocommon/batch/helpers_test.gocommon/batch/interfaces.gotx-submitter/db/db.gotx-submitter/services/rollup.go
Propagate sealed batch storage failures and make cleanup DB-first so restart recovery cannot leave cache and storage diverged. Co-authored-by: Cursor <cursoragent@cursor.com>
…delete safety Address the P0 follow-ups from the issue #975 / PR #991 review: - 5.1 self-heal force-wipe: add SealedBatchKV.IteratePrefixKeys and BatchStorage.ForceDeleteAllSealedBatches (prefix scan over the shared "sealed_batch_" prefix), and fall back to it in self-heal when the normal DeleteAllSealedBatches fails. A corrupt sealed_batch_indices no longer stalls startup forever; the node can wipe and rebuild from rollup. - 5.3 cleanup observability: add tx_submitter_batch_cleanup_failures_total counter, incremented when DeleteUntil fails after a confirmed finalize tx. - 5.6 delete safety: DeleteSealedBatch now rejects interior single-index deletes (only the window boundary is allowed) so it can no longer punch holes; single-point APIs documented as not for finalize cleanup. Tests: TestForceDeleteAllSealedBatchesWipesCorruptIndices, TestDeleteSealedBatchRejectsInteriorIndex. The existing hole regression test now injects the gap at the KV layer since DeleteSealedBatch refuses it. build / vet / gofmt / race clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tx-submitter/db/db.go`:
- Line 121: The iterator creation in db.go is passing a raw prefix to
`d.db.NewIterator`, which should use a LevelDB range wrapper instead. Update the
`NewIterator` call in the relevant recovery path to pass
`leveldbutil.BytesPrefix(prefix)` so it only scans the intended key range, and
make sure the `leveldb/util` import is present and referenced correctly in the
`db` package.
In `@tx-submitter/metrics/metrics.go`:
- Line 118: The new batch cleanup failure collector is registered in
Metrics.initMetrics but never removed in UnregisterMetrics, so re-initializing
can leave the replacement Metrics instance pointing at an unexported counter.
Update UnregisterMetrics to unregister m.batchCleanupFailures alongside the
other collectors, and ensure the Metrics lifecycle around initMetrics/Register
and UnregisterMetrics stays symmetric so a fresh Metrics instance can register
cleanly after re-init.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f65b5de4-fc97-4d9e-a340-b4c8c15db6bd
📒 Files selected for processing (8)
common/batch/batch_cache.gocommon/batch/batch_storage.gocommon/batch/batch_storage_test.gocommon/batch/helpers_test.gocommon/batch/interfaces.gotx-submitter/db/db.gotx-submitter/metrics/metrics.gotx-submitter/services/rollup.go
🚧 Files skipped from review as they are similar to previous changes (6)
- common/batch/helpers_test.go
- common/batch/interfaces.go
- tx-submitter/services/rollup.go
- common/batch/batch_storage_test.go
- common/batch/batch_cache.go
- common/batch/batch_storage.go
Address CodeRabbit minor finding: the new batchCleanupFailures counter was registered but never unregistered, so an in-process re-init would hit AlreadyRegisteredError (silently ignored) and the replacement Metrics instance would increment a counter that is not exported. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes #975
Summary
tx-submittercrash-looped on restart with a nil pointer dereference (SIGSEGV addr=0x8) atcommon/batch/batch_storage.go:146. Root cause: finalize cleanup deleted only a single batch index (Delete(batchIndex-1)), so when the finalize target jumped past batches finalized by other submitters, the persistedsealed_batch_indicessnapshot was left with middle holes.LoadAllSealedBatchesAndHeaderassumed contiguous indices (batches[idx-1]) and dereferenced a nil parent batch. The hole stays invisible at runtime (queries are memory-first) and only explodes on the next restart.Changes
loadBatchIndicesnow sorts ascending;LoadAllSealedBatchesAndHeaderverifies index contiguity and nil-checks the parent batch, returning an error so the existingDeleteBatchStorageAndInitFromRollupself-heal path rebuilds from the rollup contract instead of panicking.BatchCache.DeleteUntil/BatchStorage.DeleteSealedBatchesUpToreplaces the single-index delete inhandleConfirmedTx, so surviving indices always form a contiguous window. Also reclaims header keys that the old path leaked.SealedBatchKVgains an atomicWriteBatch; batch data + header + indices snapshot are now persisted in one atomic write (StoreSealedBatchAndHeader), and indices update errors are no longer swallowed.Compatibility
Nodes that already have a holed indices snapshot will hit the contiguity check on first restart after upgrade and automatically self-heal by rebuilding sealed batches from the rollup contract; no manual intervention needed.
Test plan
common/batch/batch_storage_test.go:DeleteSealedBatchesUpToremoves data + header keys and keeps the surviving window loadablecd common && go test ./batch/(unit tests) passingcd tx-submitter && go build ./... && go test ./db/... ./utils/... ./services/...passingMade with Cursor
Summary by CodeRabbit
BatchCache.DeleteUntil(maxIndex).BatchStorage.DeleteSealedBatchesUpTo(maxIndex)with contiguous-window rewriting.