Fix replicas-check counters being reset per ledger range #4822

Open
SongOf wants to merge 1 commit into apache:master from SongOf:fix_replicas_metric

SongOf commented Jun 22, 2026

Contributor

Descriptions of the changes in this PR:

Motivation

AuditorReplicasCheckTask.replicasCheck() scans all ledger metadata in LedgerRange
chunks. Before this change, it reset the three
numLedgersFoundHaving{NoReplica,LessThanAQ,LessThanWQ}OfAnEntry counters to 0 at the top
of every range iteration, whereas runTask() reads them and publishes the gauges only
once, after the whole run completes.

As a result, a finding in an earlier range was wiped by any later range, so the
NUM_LEDGERS_HAVING_NO_REPLICA_OF_AN_ENTRY /
NUM_LEDGERS_HAVING_LESS_THAN_AQ_REPLICAS_OF_AN_ENTRY /
NUM_LEDGERS_HAVING_LESS_THAN_WQ_REPLICAS_OF_AN_ENTRY gauges reflected only the last
ledger range. Whenever the scanned ledgers were spread over more than one range, the
gauges undercounted ledgers with missing or under-replicated entries, so monitoring and
alerting built on them were misleading.
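
The failure mode can be reproduced outside BookKeeper with a minimal, self-contained Java sketch. The class and method names (`PerRangeResetSketch`, `lastRangeOnly`, `acrossAllRanges`) are hypothetical and only model the counter pattern described above; this is not the AuditorReplicasCheckTask code. Each inner list stands for one ledger range, and each boolean marks whether a ledger has an entry below write quorum.

```java
import java.util.List;

// Hypothetical model of the counter pattern; not BookKeeper code.
final class PerRangeResetSketch {

    // Models the unpatched behaviour: the counter is zeroed at the top of
    // every range iteration, so only the last range's findings survive.
    static int lastRangeOnly(List<List<Boolean>> ranges) {
        int underReplicated = 0;
        for (List<Boolean> range : ranges) {
            underReplicated = 0; // per-range reset discards earlier findings
            for (boolean ledgerIsUnderReplicated : range) {
                if (ledgerIsUnderReplicated) {
                    underReplicated++;
                }
            }
        }
        return underReplicated; // read once, after the whole run
    }

    // Models the patched behaviour: one reset before the loop, so the
    // counter accumulates across all ranges.
    static int acrossAllRanges(List<List<Boolean>> ranges) {
        int underReplicated = 0; // single reset before the range loop
        for (List<Boolean> range : ranges) {
            for (boolean ledgerIsUnderReplicated : range) {
                if (ledgerIsUnderReplicated) {
                    underReplicated++;
                }
            }
        }
        return underReplicated;
    }

    public static void main(String[] args) {
        // First range holds one unhealthy ledger, second range one healthy ledger.
        List<List<Boolean>> ranges = List.of(List.of(true), List.of(false));
        System.out.println("reset per range: " + lastRangeOnly(ranges));   // prints 0
        System.out.println("reset once:      " + acrossAllRanges(ranges)); // prints 1
    }
}
```

With the unhealthy ledger in the first range, the per-range reset reports 0 while the single reset reports 1.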

Changes

  • Move the three counter resets to once before the range loop so they accumulate across
    all ledger ranges. The ledgersWithMissingEntries / ledgersWithUnavailableBookies maps
    remain per-range scratch state (still cleared each iteration and reported at the end of
    that iteration).
  • Add AuditorReplicasCheckTaskCrossRangeTest, a focused Mockito unit test (no cluster,
    fully deterministic): two ledger ranges, an unhealthy ledger below write-quorum in the
    first range and a healthy ledger in the second. It fails on the unpatched code
    (gauge 0 instead of 1) and passes after the fix.
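
The split between cumulative counters and per-range scratch state can be sketched as follows. This is an illustrative model under stated assumptions, not the patched BookKeeper code: `PatchedLoopSketch`, `ledgersBelowWriteQuorum`, and `perRangeReports` are hypothetical names, and each map stands for one ledger range keyed by ledger id, with `true` meaning some entry is below write quorum.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the patched loop; not the actual BookKeeper code.
final class PatchedLoopSketch {

    // Cumulative counter: reset once per run and read after the run completes.
    int numLedgersLessThanWQ;

    // Size of the scratch set as "reported" at the end of each range iteration.
    final List<Integer> perRangeReports = new ArrayList<>();

    void run(List<Map<Long, Boolean>> ranges) {
        Set<Long> ledgersBelowWriteQuorum = new HashSet<>(); // per-range scratch state
        numLedgersLessThanWQ = 0; // single reset, before the range loop
        perRangeReports.clear();

        for (Map<Long, Boolean> range : ranges) {
            ledgersBelowWriteQuorum.clear(); // scratch state is still cleared per range
            for (Map.Entry<Long, Boolean> ledger : range.entrySet()) {
                if (ledger.getValue()) {
                    ledgersBelowWriteQuorum.add(ledger.getKey());
                    numLedgersLessThanWQ++; // accumulates across ranges
                }
            }
            // Report the scratch state for this range before moving on.
            perRangeReports.add(ledgersBelowWriteQuorum.size());
        }
    }

    public static void main(String[] args) {
        PatchedLoopSketch sketch = new PatchedLoopSketch();
        // Same shape as the test scenario: unhealthy ledger first, healthy ledger second.
        sketch.run(List.of(Map.of(1L, true), Map.of(2L, false)));
        System.out.println("cumulative counter: " + sketch.numLedgersLessThanWQ); // prints 1
        System.out.println("per-range reports:  " + sketch.perRangeReports);      // prints [1, 0]
    }
}
```

The scratch set can safely be cleared each iteration because it is consumed inside that iteration, whereas the counter is only consumed after the loop.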

This is an internal metric-accounting fix; it does not change any public API, schema, wire
protocol, REST endpoint, CLI option, or configuration.



SongOf force-pushed the fix_replicas_metric branch from 9f21b5c to f73b696 on June 22, 2026 15:23
zymap requested review from hangc0276 and merlimat on June 23, 2026 05:13
zymap added this to the 4.19.0 milestone on Jun 23, 2026