PlaceholderTable/SparseTable: add transient SQLite error resilience via GVFSTable base class #2031

Open: tyrielv wants to merge 1 commit into microsoft:master from tyrielv:tyrielv/fix-placeholder-io-tolerance
@tyrielv tyrielv commented Jun 18, 2026

Problem

Two classes of transient SQLite errors hit PlaceholderTable in production:

  1. SQLITE_IOERR (10) - telemetry shows repeated disk I/O errors in GetFilePlaceholdersCount() during heartbeat, caused by ReFS snapshots, antivirus, or momentary disk busyness.

  2. SQLITE_LOCKED (6) - Bug 59353072: table lock contention on the Placeholder table during concurrent operations.

Both PlaceholderTable and SparseTable shared an identical pattern (connection pool, writer lock, try/catch wrapping) with no retry or transient error handling.

Fix

Extract shared logic into GVFSTable base class with four execution primitives:

| Method | Behavior |
| --- | --- |
| ExecuteWrite | Under writer lock, retry up to 5x on transient errors |
| ExecuteRead | Retry up to 5x on transient errors |
| ExecuteNonCriticalRead | Return fallback value on transient error (no throw) |
| ExecuteReadThenWrite | Mixed read+write on same connection, retry on transient errors |

Transient errors retried (linear backoff: 50ms, 100ms, ... 250ms):

  • SQLITE_BUSY (5) - connection-level lock contention
  • SQLITE_LOCKED (6) - table-level lock contention
  • SQLITE_IOERR (10) - disk I/O errors
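The retry policy above can be sketched as follows. This is an illustrative Python sketch, not the C# code in this PR: the `SqliteError` class and the function names are hypothetical, and it assumes "up to 5 retries" means one initial attempt followed by at most five retries with delays of 50, 100, 150, 200 and 250 ms.

```python
import time

# Primary SQLite result codes named in this PR.
SQLITE_BUSY = 5
SQLITE_LOCKED = 6
SQLITE_IOERR = 10
TRANSIENT_CODES = {SQLITE_BUSY, SQLITE_LOCKED, SQLITE_IOERR}


class SqliteError(Exception):
    """Hypothetical stand-in for an exception carrying a SQLite result code."""

    def __init__(self, code):
        super().__init__(f"SQLite error {code}")
        self.code = code


def is_transient_error(code):
    """Return True for the result codes this PR treats as transient."""
    return code in TRANSIENT_CODES


def backoff_delays_ms(max_retries=5, step_ms=50):
    """Linear backoff schedule: step, 2*step, ..., max_retries*step."""
    return [step_ms * n for n in range(1, max_retries + 1)]


def execute_with_retry(operation, max_retries=5, step_ms=50, sleep=time.sleep):
    """Run operation, retrying transient failures with linear backoff.

    Non-transient errors, and the final transient failure, propagate.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except SqliteError as error:
            if not is_transient_error(error.code) or attempt == max_retries:
                raise
            # Delay grows linearly with the number of failures so far.
            sleep(step_ms * (attempt + 1) / 1000.0)
```

Passing `sleep` as a parameter is only a convenience for testing the schedule without waiting; whether the real implementation holds the writer lock across retries is not shown here.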

Non-critical count methods (GetCount, GetFilePlaceholdersCount, GetFolderPlaceholdersCount) return -1 on transient failure - only consumed by heartbeat telemetry.
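A minimal Python sketch of the fallback behavior, for illustration only (the PR itself is C#). The `SqliteError` class, `execute_non_critical_read` and `describe_count` are hypothetical names, and the sketch assumes the fallback is returned on the first transient failure without retrying, which the description does not state either way.

```python
TRANSIENT_CODES = {5, 6, 10}  # SQLITE_BUSY, SQLITE_LOCKED, SQLITE_IOERR


class SqliteError(Exception):
    """Hypothetical stand-in for an exception carrying a SQLite result code."""

    def __init__(self, code):
        super().__init__(f"SQLite error {code}")
        self.code = code


def execute_non_critical_read(operation, fallback):
    """Return operation(), or fallback if it fails with a transient error.

    Non-transient errors still propagate to the caller.
    """
    try:
        return operation()
    except SqliteError as error:
        if error.code in TRANSIENT_CODES:
            return fallback
        raise


def describe_count(count):
    """Telemetry-side handling: a negative count means 'not available'."""
    return "unknown" if count < 0 else str(count)
```

The sentinel only works because the consumer is telemetry: any caller that used the count for logic would need to check for -1 explicitly.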

Also fixes a pre-existing copy-paste bug in which the GetFilePlaceholdersCount/GetFolderPlaceholdersCount exception messages incorrectly said GetCount.

Files Changed

  • New: GVFSTable.cs - base class with retry/lock/error infrastructure
  • Refactored: PlaceholderTable.cs - inherits GVFSTable, all operations use base methods
  • Refactored: SparseTable.cs - same treatment
  • Extended: SqliteErrorCodes.cs - added BUSY, LOCKED, IOERR constants + IsTransientError()
  • Updated: Unit tests - adjusted exception message assertions to match simplified format

Validation

  • All 872 unit tests pass (0 failures, 11 pre-existing skips)
  • Self-reviewed: caught and fixed a retry-duplicate bug in GetAllEntries
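The description does not say how the GetAllEntries bug arose. One common way a retry wrapper produces duplicates is when the retried operation appends to a collection created outside it, so rows read before a mid-read failure are kept and then read again. The Python sketch below illustrates that general hazard under this assumption; every name in it is hypothetical and it is not the PR's code.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a transient SQLite failure."""


def retry(operation, max_retries=5):
    """Re-run operation on TransientError (backoff omitted for brevity)."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise


def make_flaky_reader(rows, fail_at):
    """Build a reader that fails once, just before yielding rows[fail_at]."""
    state = {"failed": False}

    def read_rows():
        for index, row in enumerate(rows):
            if index == fail_at and not state["failed"]:
                state["failed"] = True
                raise TransientError()
            yield row

    return read_rows


def get_all_entries_buggy(read_rows):
    entries = []  # created once, so it survives across retry attempts

    def operation():
        for row in read_rows():
            entries.append(row)
        return entries

    return retry(operation)


def get_all_entries_fixed(read_rows):
    def operation():
        # Build a fresh list per attempt so partial results are discarded.
        return list(read_rows())

    return retry(operation)
```

With rows `["a", "b", "c"]` and a failure before the third row, the buggy version returns `["a", "b", "a", "b", "c"]` while the fixed version returns `["a", "b", "c"]`.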

@tyrielv tyrielv force-pushed the tyrielv/fix-placeholder-io-tolerance branch 2 times, most recently from b75962e to 8e67a19 Compare June 19, 2026 00:11
…ia GVFSTable base class

Extract shared retry and error-handling logic into GVFSTable base class,
used by both PlaceholderTable and SparseTable. This provides:

- ExecuteWrite: serialized writes with retry on BUSY/LOCKED/IOERR
- ExecuteRead: reads with retry on transient errors
- ExecuteNonCriticalRead: returns fallback on transient error (heartbeat)
- ExecuteReadThenWrite: mixed operations with retry

Transient errors handled (up to 5 retries with linear backoff):
- SQLITE_BUSY (5): connection-level lock contention
- SQLITE_LOCKED (6): table-level lock contention (fixes #59353072)
- SQLITE_IOERR (10): disk I/O errors from AV/ReFS/disk busyness

Non-critical count methods (GetCount, GetFilePlaceholdersCount,
GetFolderPlaceholdersCount) return -1 on transient failure rather than
throwing, since they are only consumed by heartbeat telemetry.

Also fixes pre-existing copy-paste bug in exception messages where
GetFilePlaceholdersCount/GetFolderPlaceholdersCount reported as GetCount.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
@tyrielv tyrielv force-pushed the tyrielv/fix-placeholder-io-tolerance branch from 8e67a19 to ede1b84 Compare June 19, 2026 00:14
@tyrielv tyrielv changed the title from "PlaceholderTable: tolerate transient SQLite disk I/O errors in count methods" to "PlaceholderTable/SparseTable: add transient SQLite error resilience via GVFSTable base class" Jun 19, 2026
@tyrielv tyrielv marked this pull request as ready for review June 19, 2026 17:58