Carve deleted rows out of a SQLite database without trusting it, without writing to it, and without re-surfacing a single live row.
Measured against independent third-party ground truth — the SQLite Forensic Corpus (Nemetz, Schmitt & Freiling, DFRWS-EU 2018, CC0), whose authors shipped a per-row deleted answer key — and reported as a reproducible per-database confusion matrix (
docs/recovery-comparison.md). The headline: precision is the highest of any tool measured (it never re-surfaces a live row as "deleted", and emits only a small low-confidence phantom class), and freeblock-aware reconstruction leads in-page recall — 0.833 on the cleanest category (0C: 70 of 84 recoverable rows), ahead offqlite's 0.798, at roughly five times fewer false rows. Every number here is the harness-measured value against that independent corpus.
Every browser history, every chat app, every mobile artifact is a SQLite file — and the forensically interesting rows are usually the deleted ones. The standard sqlite3/rusqlite path cannot see them: it reads the live b-tree and stops. sqlite-forensic reads the raw file format itself — freelist pages, in-page free blocks, dropped-table pages, and an uncheckpointed WAL overlay — and recovers what the live query cannot, as severity-graded, confidence-scored observations.
This is a Rust library workspace with a CLI (sqlite4n6). The fastest path — point it at a database and read the deleted rows straight out of free space. It opens the evidence read-only and never writes the file or its sidecars:
$ sqlite4n6 carve History.db # deleted rows, table view
$ sqlite4n6 carve History.db --format jsonl # one JSON object per record
$ sqlite4n6 carve History.db --min-confidence medium # drop low-confidence carves
$ sqlite4n6 carve History.db --no-fragments # full rows only (fragments shown by default)
$ sqlite4n6 audit History.db # graded anomaly findingsWhen a -wal sidecar is present, carve auto-detects it and carves the full per-commit WAL timeline — every materializable state, each labelled with its log-sequence coordinate: the on-disk base image, each commit snapshot of the WAL, and the uncheckpointed WAL-frame residue. A row deleted late in a transaction history is still a live cell in an earlier commit's page image, so the snapshot column tells you the exact committed state a deleted row was last alive in. This is the real N-snapshot temporal model — not a two-point on-disk-vs-latest approximation.
$ sqlite4n6 carve chat.db # auto-detects chat.db-wal
page offset rowid recovery_source conf snapshot values
2 1581 130 commit-snapshot 0.90 commit:(3131615003,3836839008,0) 130 | bob | secret body 130
2 1261 ? commit-snapshot 0.40 commit:(3131615003,3836839008,1) NULL | NULL | ...
$ sqlite4n6 carve chat.db --wal /path/to/chat.db-wal # point at an explicit sidecar
$ sqlite4n6 carve chat.db --no-wal # on-disk image only, no snapshot columnThe snapshot column carries the salt-qualified LSN — commit:(salt1,salt2,commit_frame_index) for a committed snapshot, wal-frame:(salt1,salt2,frame_index) for raw frame residue, on-disk for the base image. A record identical across views is collapsed to its earliest committed coordinate. --no-wal carves the on-disk image alone (single view, no snapshot column). The evidence file and its sidecars are never written.
Two recovery surfaces sit on top of this. A deleted row whose payload outgrew the page (> usable − 35 bytes) spilled onto an overflow-page chain; carve reassembles such a row to a full record when every chain page survives as a freelist leaf (content-preserving) — a deliberately bounded capability, graded below the in-page tier, because a chain page reallocated as the freelist trunk destroys the record. And carve surfaces a Tier-2 fragment section by default (--no-fragments to suppress): when a row's full identity is destroyed but a distinctive cell survives contiguously (a TEXT ≥ 4 bytes or a REAL), that fragment is salvaged and kept strictly separate from the high-precision full-row tier — partial evidence a single surviving cell can still anchor, never mixed into the full-row set.
Or drive the library directly — point the analyzer at the file bytes and get graded findings plus carved deleted records:
use sqlite_core::Database;
use sqlite_forensic::{audit, carve_all_deleted_records};
let db = Database::open(std::fs::read("History")?)?; // read-only, owns the bytes
// 1. Graded header / freelist / WAL anomalies
for anomaly in audit(&db) {
println!("[{:?}] {} — {}", anomaly.severity, anomaly.code, anomaly.kind.note());
}
// 2. Deleted rows carved from free space — column count inferred per record
for rec in carve_all_deleted_records(&db) {
println!("recovered rowid {} from page {} (allocated: {})",
rec.rowid, rec.page, rec.allocated);
}The reader (sqlite-core) answers "what does this file actually contain?"; the analyzer (sqlite-forensic) grades the forensically notable parts and recovers the deleted ones.
| sqlite-forensic | rusqlite / sqlite3 |
|
|---|---|---|
| Read live rows | ✅ | ✅ |
| Read-only on the evidence file | ✅ | ✅ (with care) |
| Recover deleted rows from freelist pages | ✅ | — |
| Recover deleted rows from in-page free blocks | ✅ | — |
| Recover dropped-table rows (column count inferred) | ✅ | — |
| Reassemble deleted rows whose payload spilled to overflow-page chains | ✅ partial | — |
| Salvage partial rows as a separate Tier-2 fragment tier (a distinctive cell survives) | ✅ default | — |
| Read uncheckpointed WAL overlay as a separate view | ✅ | applied silently |
| Carve every WAL commit snapshot, LSN-labelled (per-commit timeline) | ✅ | — |
| Graded, confidence-scored anomaly findings | ✅ | — |
| Refuses to ever re-surface a live row as "deleted" | ✅ | n/a |
forbid(unsafe), panic-free on hostile input |
✅ | C / FFI |
This is one workspace (sqlite-forensic): two library crates following the fleet reader/analyzer split, plus the sqlite4n6 CLI that consumes them:
| Crate | Role | Entry points |
|---|---|---|
sqlite-core |
The raw, read-only, panic-free file-format reader: header parse, b-tree walk, freelist + overflow chains, and a read-only WAL overlay that maps onto the canonical forensicnomicon::history temporal cohort (each commit a salt-qualified [H] state). No findings. |
Database::open, Database::open_with_wal, freelist_pages, read_table, carve_free_regions, live_rowids, wal_timeline, WalTimeline::to_temporal_cohort |
sqlite-forensic |
The anomaly auditor + deleted-record carver: grades observations into forensicnomicon::report::Findings and recovers deleted rows. Depends on sqlite-core. |
audit, audit_findings, carve_all_deleted_records, carve_deleted_records |
sqlite-forensic accepts an in-memory Database (built from &[u8]) — it is medium-agnostic and has no dependency on any image format or container layer. Findings flow into the shared forensicnomicon::report model, so a SQLite database's anomalies aggregate uniformly with the partition / container / filesystem layers in a triage report.
audit() emits stable, scheme-prefixed codes (a published contract — never re-spelled). Each is an observation ("consistent with …"), graded for severity; the examiner draws the conclusion.
| Code | Severity | What it observes |
|---|---|---|
SQLITE-DELETED-RECORD-RECOVERED |
Medium | A record-shaped cell recovered from unallocated space — consistent with a deleted row not yet overwritten. Carries page / offset / rowid provenance. |
SQLITE-FREELIST-NONEMPTY |
Low | The database holds free pages — consistent with prior deletions (DELETE without VACUUM); those pages may retain recoverable rows. |
SQLITE-WAL-UNCHECKPOINTED |
Medium | A -wal sidecar carries committed page versions the main file does not reflect — the main file alone under-reports the true state. |
SQLITE-PAGECOUNT-MISMATCH |
High | The in-header page count disagrees with the count implied by file length — consistent with truncation, carving, or out-of-band modification. |
SQLITE-RESERVED-SPACE-NONZERO |
Low | The header reserves bytes per page — non-standard; consistent with a page-level extension such as encryption (SQLCipher/SEE) or a checksum VFS. |
The AnomalyKind enum is #[non_exhaustive]: new codes can be added without a breaking change, so downstream match arms must carry a _ arm.
A carver that over-reports is worse than useless on an evidence database — it manufactures rows that were never deleted. The design goal of this carver is therefore precision over recall, enforced structurally rather than by inspection:
- Read-only, panic-free,
forbid(unsafe)—Database::openowns aVec<u8>and never writes back to the artifact; the whole workspace deniesunsafeat compile time and reads every length/offset through bounds-checked helpers, so a malformed, attacker-controlled database cannot reach a raw-pointer path or panic. - Measured against independent third-party ground truth. Recall and precision are computed per database against the SQLite Forensic Corpus (Nemetz, Schmitt & Freiling, DFRWS-EU 2018, CC0), whose authors shipped a per-row deleted-record answer key — so the truth set is theirs, not ours. The harness (
forensic/tests/nemetz_metrics.rs) emits a reproducible confusion matrix; the full table is indocs/recovery-comparison.md. - High precision, structurally — never a live-row re-read. Our carver carves only the complement of the live cell extents on a page, then drops any carved record whose rowid is currently live. Across the Nemetz recall corpus it produces 0 live-re-reads (verified against the answer key's live rows), with only a small, low-confidence phantom class (all-empty/NULL records the inferred carver matches on a run of zero bytes). The two over-reporting failure modes the reference oracles exhibit on no-deletion databases — re-reading live cells, and re-surfacing a stale byte-copy of a live row — our carver does not.
- Strong in-page recall via freeblock reconstruction — reported honestly. On the cleanest category (
0C: records deleted in place,secure_delete=0, no overwrite, so every deleted row's bytes survive) the carver recovers 70 of the 84 cross-tool-scored rows (recall 0.833), ahead offqlite's 0.798. SQLite overwrites a freed cell's first four bytes (payload-length + rowid varints,header_len, leading serial) with the freeblock pointer;reconstruct_freeblock_recordsrebuilds each record from its surviving serial-type tail plus a schema template derived from a live cell on the same page, with the destroyed rowid surfaced as unknown. It does so at higher precision thanfqliteand 0 live-re-reads. - Overflow-page chains: partial recovery, honestly bounded. A deleted row whose payload spilled onto a freed overflow chain is reassembled to a full row only when every chain page survives as a freelist leaf; a chain page reallocated as the freelist trunk destroys the record, which is then refused from the full tier and surfaces only as a Tier-2 fragment. On the Nemetz
0Ecategory this reassembles the one byte-perfectly-recoverable spilled chain (verifiedassert_eq!against the answer key, substrate recall 1.000) for an end-to-end0Erecall of 0.333 — a deliberately bounded capability, graded below the in-page tier, never claimed as full overflow recovery. - Secondary checks stay labelled as such. The undark/fqlite differential (
docs/validation.md) is inter-tool concordance (the oracles disagree with each other — agreement, not correctness), and the DC3sqlite_dissectcorpus is a no-false-positive regression set (itsexpected_rowsare live content, not a deleted set), never a recall oracle.
Carved records remain confidence-graded observations ("consistent with a deleted row"), never a verdict. The honest summary: a strict precision discipline confirmed against independent ground truth, and a documented in-page recall gap — not a claim of perfect recall or proof of correctness.
Honest gaps (tracked, not hidden): there is no CI workflow and no line-coverage gate in this repo yet, and the carver is not yet fuzzed — all three are planned to bring it level with the Paranoid-Gatekeeper bar the rest of the fleet enforces. The safety lints (unsafe_code = forbid, unwrap_used/expect_used = deny) and the cargo-deny supply-chain gate are enforced today.
docs/validation.md— the Doer-Checker differential: how the carver was reconciled against undark and fqlite, page-level divergence diagnosis, build recipes.docs/recovery-comparison.md— the measured per-database recall/precision confusion matrix against independent Nemetz ground truth, with the undark/fqlite concordance and DC3 no-FP regression set as secondary checks.docs/corpus-catalog.md— every test fixture with its verbatim generator command and MD5.tests/data/README.md— the committed synthetic fixtures, co-located.
sqlite-forensic is the SQLite file-format parser in the RapidTriage DFIR toolkit:
| Crate | Artifact family |
|---|---|
| sqlite-forensic | SQLite databases (b-tree, freelist, WAL, deleted-record carving) |
| browser-forensic | Chrome / Firefox / Safari |
| winevt-forensic | Windows Event Logs (EVTX) |
| srum-forensic | Windows SRUM / ESE |
| memory-forensic | Process memory, page tables |
| forensicnomicon | Artifact catalog, format constants, report model |
Privacy Policy · Terms of Service · © 2026 Security Ronin Ltd