Export hotblocks storage size metrics NET-910#78
Conversation
Run a background task to roughly estimate how much disk space each dataset is using. Expose it with the metrics. These numbers may be used to quickly understand which datasets started growing, but not as precise disk usage profiler. Also move all the DB reads out of the scraping path to avoid locking the DB on calls to /metrics.
|
Have you run the benchmarks? I'm not familiar with this code, but by the quick look - PR introduces a basically fetch_all loop across all datasets, which is triggered every 60 seconds. Given that it reads from disk, de-serializes the data - it would be great to validate how it affects the overall performance. Additionally - have you considered keeping counters per dataset, with recording how much is added/removed at the places where it is done instead? If yes - why that idea was discarded? |
|
|
||
| #[instrument(name = "dataset_stats", skip_all)] | ||
| async fn dataset_stats_loop(db: DBRef, dataset_id: DatasetId, sender: tokio::sync::watch::Sender<DatasetStats>) { | ||
| const REFRESH: Duration = Duration::from_secs(60); |
There was a problem hiding this comment.
better move this to config parameters to better control the overhead, that this and similar loop causes
| } | ||
|
|
||
| /// Approximate on-disk size of each column family as `(cf_name, bytes)`. Counts | ||
| /// SST files only, excluding WAL and memtables. Constant-time property reads. |
There was a problem hiding this comment.
It's not constant-time property reads
https://github.com/facebook/rocksdb/blob/08809f5e6cd9cc4bc3958dd4d59457ae78c76660/include/rocksdb/db.h#L579
// "rocksdb.total-sst-files-size" - returns total size (bytes) of all SST
// files.
// WARNING: may slow down online queries if there are too many files.
There was a problem hiding this comment.
Thanks for noticing! Moving to the background task as well
| loop { | ||
| let db = db.clone(); | ||
| let span = tracing::Span::current(); | ||
| let result = tokio::task::spawn_blocking(move || { |
There was a problem hiding this comment.
One more blocking task per dataset, per minute doesn't look right to me.
There was a problem hiding this comment.
I've measured the time of running, and it's <40 ms per dataset at worst, without blocking anything.
The time of running the global metadata estimator is <10 ms and holds the global locks for writes, but doesn't affect reads.
There was a problem hiding this comment.
Sounds good, would you mind sharing some details related to the overall load, size of the db, etc?
|
There exist |
Even if we knew how much data is added/removed by every operation, we would need to read some initial value at the startup. But I don't think that storing N bytes of data in the KV storage corresponds to storing N bytes on disk. So I don't see a way to combine the values read from disk at startup and the incremental changes. My plan was to build an image and test it in a live environment. But let me do it in advance for safety. |
- Compute rocksdb_column_family_size_bytes on a background loop (storage_metrics_loop), published via a watch channel, instead of hitting RocksDB on every Prometheus scrape - Share one configurable --storage-stats-interval-secs between this loop and the existing per-dataset stats loop - Log measurement timing (elapsed_us) for both loops - Fix doc comments that understated column_family_sizes' cost: it holds RocksDB's DB mutex and scales with live SST files, not O(1) Part of NET-442
Run a background task to roughly estimate how much disk space each dataset is using. Expose it with the metrics. These numbers may be used to quickly understand which datasets started growing, but not as a precise disk usage profiler.
Also move all the DB reads out of the scraping path to avoid locking the DB on calls to /metrics.