Skip to content

Add invariants check#7902

Open
cjen1-msft wants to merge 4 commits into
microsoft:mainfrom
cjen1-msft:ledger-invariants-update
Open

Add invariants check#7902
cjen1-msft wants to merge 4 commits into
microsoft:mainfrom
cjen1-msft:ledger-invariants-update

Conversation

@cjen1-msft
Copy link
Copy Markdown
Contributor

This updates the pre-existing invariants check to more closely match CCF's behavior.
For ledger files this is three properties:

  • A node's ledger history will be contiguous from its startup seqno onwards
  • If two committed chunks start at the same point in the ledger they are identical
  • Across the network, there is a single history of committed chunks

For snapshots it is just the chunk boundary property.

@cjen1-msft cjen1-msft added the run-long-test Run Long Test job label May 19, 2026
@cjen1-msft cjen1-msft marked this pull request as ready for review May 19, 2026 19:10
@cjen1-msft cjen1-msft requested a review from a team as a code owner May 19, 2026 19:10
Copilot AI review requested due to automatic review settings May 19, 2026 19:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the test framework’s shutdown-time ledger/snapshot verification to enforce stronger, CCF-like invariants (rather than simple file equality), and threads a new check_file_invariants toggle through network lifecycle helpers.

Changes:

  • Replace “ledger files identical” verification with explicit ledger-history and snapshot-boundary invariant checks.
  • Introduce ccf.ledger.get_range_from_file() to derive an end seqno for chunks whose filename range is open-ended.
  • Update multiple tests to enable/disable invariant checking depending on scenario (recovery, compatibility, intentional divergence).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/recovery.py Updates recovery tests to use check_file_invariants and adjusts behavior in mid-recovery/retirement cases.
tests/partitions_test.py Switches shutdown validation to invariants-based checking.
tests/lts_compatibility.py Replaces accept_ledger_diff with check_file_invariants based on snapshot usage.
tests/infra/remote.py Refactors ledger path helpers (current vs read-only) to support invariant scanning.
tests/infra/network.py Implements snapshot + ledger invariant checks and wires them into stop_all_nodes() and the network() context manager.
tests/e2e_operations.py Disables invariant checks for scenarios that intentionally manipulate ledger files.
python/src/ccf/ledger.py Adds file-based range derivation for ledger chunks with open-ended filename ranges.

Comment thread tests/infra/network.py
Comment thread tests/recovery.py
Comment thread tests/infra/network.py
Comment thread tests/infra/remote.py
Comment on lines +768 to +773
def ledger_paths(self):
return [
path
for path in [self.current_ledger_path(), *self.read_only_ledger_paths()]
if os.path.exists(path)
]
Comment thread tests/infra/network.py
Comment thread python/src/ccf/ledger.py

def get_range_from_file(ledger_file_path: str) -> tuple[int, int | None]:
start, end = range_from_filename(ledger_file_path)
if end is not None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be sensible to still proceed reading the offset and assert naming matches the contents? Or maybe we do it elsewhere already?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an efficiency tradeoff here.
I want to avoid reading the contents of the file if possible, and have it such that for most files we just use the name.
The only case is where we have:

ledger_3_5
ledger_6 //ends at 10
ledger_11

In this case we'll examine the contents of only ledger_6 and ledger_11 which I think is a good performance improvements.

Comment thread tests/infra/network.py
skip_verification=False,
verbose_verification=False,
accept_ledger_diff=False,
check_file_invariants=True,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Shall we invert to skip_ledger_invariants=False? For consistency with other args, as in "skip_some_checks=False" by default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a personal preference towards positive statements rather than negative ones. But that is only a personal preference rather than anything concrete.

Comment thread tests/infra/network.py

# List ledger chunks, plus checksum, in order of starting seqno
def node_ledger_files(node, allow_uncommitted):
def pred(f, allow_uncommitted, allow_recovery):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to put allow_recovery as arg here? It's a const arg of the top level func, I'd think it's accessible from anyway?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for allow_uncommitted?.. Or I'm missing smth obvious here maybe

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep there are a couple layers of shadowing here.
For a node's ledger files the node cares about only .commtted files in the read-only directory, and everything in the current files.
So the invariant only pulls the .committed files from the read-only directories and everything (modulo the global options) in the current dir.

The shadowing helps to achieve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-long-test Run Long Test job

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants