From f277076eb517a46105204193d382c73ece251f2b Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Wed, 1 Jul 2026 13:10:56 +1200 Subject: [PATCH 1/8] Draft of article based on discussions about TCP Info data and caveats about analyzing it --- .../articles/tcpinfo-snapshot-analysis.md | 211 ++++++++++++++++++ 1 file changed, 211 insertions(+) create mode 100644 src/content/articles/tcpinfo-snapshot-analysis.md diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md new file mode 100644 index 0000000..c2aac75 --- /dev/null +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -0,0 +1,211 @@ +--- +title: "Analyzing TCP INFO Snapshots: Data Characteristics and Research Patterns" +description: A practical guide to M-Lab's TCP INFO snapshot data in BigQuery — how snapshots are collected and thinned, why most rows are noise, how to filter to real tests, and how to use RTT variance data to study latency-sensitive applications like VoIP. +tags: [research, data-access] +difficulty: intermediate +--- + +M-Lab's TCP INFO sidecar records a time series of kernel TCP socket statistics for every connection on the platform. The BigQuery table is large, heterogeneous, and counterintuitive until you understand how collection and storage work. This article explains the mechanics, the quirks, and the correct patterns for research use. + +See [TCP INFO — M-Lab Core Service](../core-service-tcp-info) for a general reference on what fields the table contains. + +## How Snapshot Collection Works + +The `tcp-info` sidecar runs on every M-Lab server, polling the Linux kernel's `INET_DIAG` netlink interface to read the `tcp_info` struct for every active TCP connection on the host. This is a passive sidecar — it generates no traffic and does not interfere with measurements. + +**Collection cadence.** The sidecar targets a 10 ms poll interval, but each poll must complete a full kernel TCP table dump before the next tick. 
The dump walks the kernel's established-connections hash table, which is sized at boot proportional to available RAM. On small 4-core / 4 GB nodes the dump takes ~0.8 ms, so the 10 ms target is easily met. On large 32-core / 67 GB nodes — including several sites such as `lga04`, `lga05`, `yyz04`, `mnl02`, and others — the dump takes ~13 ms, which causes the loop to slip to an effective ~25 ms interval. + +**Thinning.** The ETL parser applies a 10× decimation at ingest: only 1 snapshot in 10 is written to BigQuery. This reduces storage costs and query scan size, but it means BigQuery snapshots are already spaced roughly **100–260 ms apart** depending on the site, not 10 ms. + +| hardware class | host dump time | effective poll | BigQuery gap | +|---|---|---|---| +| 4-core / 4 GB | ~0.8 ms | ~10 ms | ~100 ms | +| 32-core / 67 GB (fast) | ~3.4 ms | ~10 ms | ~100 ms | +| 32-core / 67 GB (slow batch, e.g. LGA) | ~13 ms | ~25 ms | ~260 ms | +| 40–56 core | ~11 ms | ~25 ms | ~250+ ms | + +For a 10-second NDT download test, a typical site stores about **94 snapshots** (one per ~110 ms). Sites in the slow-hardware batch store about **39 snapshots** per test (~259 ms apart). If you need the full 10 ms resolution it only exists in the raw `.zst` archives on GCS — not in BigQuery. + +
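The relationship between poll interval, thinning, and stored snapshot count can be sketched as simple arithmetic. This is a minimal illustration, not part of any M-Lab tooling: the function names are hypothetical, and the inputs are the approximate figures quoted above. The estimate lands close to, rather than exactly on, the per-test counts given in the text.

```python
def bigquery_gap_ms(effective_poll_ms, decimation=10):
    """Spacing between stored snapshots: poll interval times the thinning factor."""
    return effective_poll_ms * decimation


def expected_snapshots(test_seconds, gap_ms):
    """Rough number of stored snapshots for a test of the given duration."""
    return round(test_seconds * 1000 / gap_ms)


# Nominal poll intervals from the table above.
print(bigquery_gap_ms(10))  # 100 (ms) at sites that meet the 10 ms target
print(bigquery_gap_ms(25))  # 250 (ms) at sites that slip to ~25 ms

# Observed gaps quoted above, applied to a 10-second test.
print(expected_snapshots(10, 110))  # 91
print(expected_snapshots(10, 259))  # 39
```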
+ ℹ️ +
The per-site sparseness is not a malfunction. The fleet-wide ~10× gap is the ETL thinning; the additional ~2.6× at LGA-class sites is their slower INET_DIAG dump driven by hardware characteristics. Both are expected and stable.
+
+ +## Why Most Rows Are Noise + +The sidecar monitors the host's **entire TCP table**, not just connections from active tests. About **50% of rows** in `measurement-lab.ndt.tcpinfo` are probes, port scanners, health checks, bots, and aborted handshakes hitting the public NDT endpoint. These appear as 1–2 snapshot rows with tiny byte counts (~250 bytes, ~1 segment). + +The remaining ~50% of rows are the real NDT connections — but those connections each produce 39–94 snapshots, so they account for essentially all snapshot volume even though they are a minority of distinct connection rows. + +**Never query `tcpinfo` in isolation.** Row count does not equal test count, and aggregate statistics over the raw table are dominated by noise. Always join against a test result table. + +## The Correct Pattern: Join by UUID + +Every completed NDT test has a UUID (`id`) that appears in both `ndt.ndt7` (or `ndt.ndt5`) and `ndt.tcpinfo`. Joining on `id` and `date` keeps only connections tied to a real test result and discards all scanner/handshake noise. + +```sql +-- Join TCPinfo with NDT7 test results +SELECT + ndt7.id, + ndt7.date, + ndt7.a.MeanThroughputMbps, + ndt7.a.MinRTT, + ndt7.client.Geo.CountryCode AS client_country, + ndt7.server.Site AS server_site, + tcp.a.FinalSnapshot.TCPInfo.BytesAcked, + tcp.a.FinalSnapshot.TCPInfo.BytesRetrans, + tcp.a.FinalSnapshot.TCPInfo.TotalRetrans, + tcp.a.FinalSnapshot.TCPInfo.MinRTT AS tcpinfo_min_rtt_us, + tcp.a.FinalSnapshot.TCPInfo.RTT AS tcpinfo_rtt_us, + tcp.a.FinalSnapshot.TCPInfo.RTTVar AS tcpinfo_rttvar_us +FROM `measurement-lab.ndt.ndt7` AS ndt7 +JOIN `measurement-lab.ndt.tcpinfo` AS tcp + ON ndt7.id = tcp.id + AND ndt7.date = tcp.date +WHERE + DATE(ndt7.a.TestTime) = "2026-06-01" + AND ndt7.a.MeanThroughputMbps IS NOT NULL +ORDER BY ndt7.date DESC +LIMIT 10 +``` + +
+ ⚠️ +
Filter on the partition column ndt7.date rather than DATE(ndt7.a.TestTime): a filter on date is what lets BigQuery prune partitions. Matching ndt7.date to tcp.date in the JOIN is especially important, because it prevents a full cross-partition scan on the tcpinfo table.
+
+ +## Snapshot Count Distribution + +To understand the quality of TCPinfo data for a set of tests, it helps to look at the distribution of snapshot counts. Connections with only 1–2 snapshots are noise; completed NDT downloads typically have 10–100+ snapshots. + +```sql +-- Snapshot count distribution for all tcpinfo rows (includes noise) +WITH snapshot_counts AS ( + SELECT + id, + ARRAY_LENGTH(raw.Snapshots) AS num_snapshots + FROM `measurement-lab.ndt.tcpinfo` + WHERE + date = '2026-05-12' + AND client.Geo.CountryCode = 'US' + LIMIT 10000 +) +SELECT + num_snapshots, + COUNT(*) AS num_connections, + ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct +FROM snapshot_counts +GROUP BY num_snapshots +ORDER BY num_snapshots +``` + +```sql +-- Snapshot count distribution for completed NDT7 tests only (noise removed) +WITH snapshot_counts AS ( + SELECT + tcp.id, + ARRAY_LENGTH(tcp.raw.Snapshots) AS num_snapshots + FROM `measurement-lab.ndt.ndt7` AS ndt7 + JOIN `measurement-lab.ndt.tcpinfo` AS tcp + ON ndt7.id = tcp.id + AND ndt7.date = tcp.date + WHERE + ndt7.date = '2026-06-01' + AND ndt7.a.MeanThroughputMbps IS NOT NULL + AND tcp.client.Geo.CountryCode = 'US' + LIMIT 10000 +) +SELECT + num_snapshots, + COUNT(*) AS num_connections, + ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct +FROM snapshot_counts +GROUP BY num_snapshots +ORDER BY num_snapshots +``` + +Comparing the two outputs makes the noise problem concrete: the first query will show a large fraction of 1–2 snapshot rows; the second (UUID-joined) query will show a clean distribution concentrated at 40–100 snapshots. 
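To put a single number on that comparison, the two result sets can be reduced to the share of connections with at most two snapshots. The sketch below assumes each result has been exported as `(num_snapshots, num_connections)` pairs; the helper name and the sample rows are made up for illustration and are not real query output.

```python
def short_connection_share(distribution, max_snapshots=2):
    """Fraction of connections with at most `max_snapshots` snapshots.

    `distribution` is a list of (num_snapshots, num_connections) pairs,
    matching the first two columns returned by the queries above.
    """
    total = sum(count for _, count in distribution)
    if total == 0:
        return 0.0
    short = sum(count for n, count in distribution if n <= max_snapshots)
    return short / total


# Illustrative rows only, chosen to mirror the pattern described in the text.
raw_rows = [(1, 400), (2, 100), (40, 150), (94, 350)]
joined_rows = [(39, 300), (94, 700)]

print(short_connection_share(raw_rows))     # 0.5
print(short_connection_share(joined_rows))  # 0.0
```

A large gap between the two shares is the signal that the UUID join is removing scanner and handshake noise rather than real tests.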
+ +## RTT, RTTVar, and Latency-Sensitive Applications + +Each TCPinfo snapshot exposes three RTT fields from the kernel (values are in **microseconds**): + +| Field | Description | +|---|---| +| `TCPInfo.MinRTT` | Minimum RTT observed since connection start — a proxy for propagation delay | +| `TCPInfo.RTT` | Kernel's smoothed RTT estimate (SRTT) at snapshot time | +| `TCPInfo.RTTVar` | Kernel's RTT variance estimate — a proxy for jitter | + +`MinRTT` is the most reliable latency measure for research: it is less sensitive to transient congestion and queue buildup than SRTT. `RTTVar` captures variation in RTT across the connection and is useful as a jitter proxy for applications like VoIP and real-time video. + +### Use Case: VoIP and Voice Quality Estimation + +There is active research interest in using M-Lab TCPinfo data to assess whether voice-over-IP on the public internet meets quality thresholds comparable to traditional landline service. Key questions include: + +- What fraction of connections have 99th-percentile RTT below 100 ms? +- What is the distribution of RTT variance (jitter) across connections? +- Does quality differ between rural and urban areas, or by ISP? + +TCPinfo's `RTTVar` field (kernel RTTVAR, in microseconds) provides a per-connection jitter estimate. Because multiple snapshots are available per connection, you can also compute rolling statistics across the snapshot array. + +
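As one possible form of such per-connection statistics, the sketch below summarizes a list of per-snapshot RTT values for a single connection. It assumes the values have already been extracted from the snapshot array (for example, `TCPInfo.RTT` from each snapshot, in microseconds); the function names are hypothetical. The jitter figure here is a plain mean absolute difference between consecutive samples, which is not the same quantity as the kernel's `RTTVar`, and at 100+ ms sample spacing it is only a coarse indicator.

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rtt_summary_ms(rtt_us):
    """Summarize per-snapshot RTT samples (microseconds) for one connection."""
    if len(rtt_us) < 2:
        raise ValueError("need at least two snapshots")
    rtt_ms = [value / 1000 for value in rtt_us]
    # Absolute change between consecutive snapshots, as a simple jitter proxy.
    deltas = [abs(b - a) for a, b in zip(rtt_ms, rtt_ms[1:])]
    return {
        "median_rtt_ms": statistics.median(rtt_ms),
        "p99_rtt_ms": nearest_rank_percentile(rtt_ms, 99),
        "mean_abs_delta_ms": statistics.fmean(deltas),
    }


# Illustrative samples only: five snapshots with one transient spike.
samples_us = [20000, 22000, 21000, 30000, 20000]
print(rtt_summary_ms(samples_us))
# {'median_rtt_ms': 21.0, 'p99_rtt_ms': 30.0, 'mean_abs_delta_ms': 5.5}
```

With only 39 to 94 snapshots per test, the 99th percentile of a single connection is effectively its maximum, so percentile questions are better answered across many connections than within one.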
+ ℹ️ +
Sampling density caveat. At most sites, BigQuery snapshots are ~110 ms apart; at LGA-class sites, ~260 ms apart. This is sufficient for characterizing latency distributions across many tests, but may be too coarse for sub-100 ms jitter analysis within a single connection. For sub-100 ms resolution, the full snapshot data is available in the raw .zst archives on GCS.
+
+ +```sql +-- RTT and jitter summary for completed NDT7 downloads, by country +SELECT + ndt7.client.Geo.CountryCode AS country, + ndt7.server.Site AS site, + COUNT(*) AS test_count, + ROUND(AVG(tcp.a.FinalSnapshot.TCPInfo.MinRTT) / 1000, 2) AS avg_min_rtt_ms, + ROUND( + APPROX_QUANTILES(tcp.a.FinalSnapshot.TCPInfo.RTT, 100)[OFFSET(95)] / 1000, + 2 + ) AS p95_rtt_ms, + ROUND(AVG(tcp.a.FinalSnapshot.TCPInfo.RTTVar) / 1000, 2) AS avg_rttvar_ms +FROM `measurement-lab.ndt.ndt7` AS ndt7 +JOIN `measurement-lab.ndt.tcpinfo` AS tcp + ON ndt7.id = tcp.id + AND ndt7.date = tcp.date +WHERE + DATE(ndt7.a.TestTime) = '2026-06-01' + AND ndt7.a.MeanThroughputMbps IS NOT NULL +GROUP BY country, site +HAVING test_count > 100 +ORDER BY avg_min_rtt_ms +LIMIT 50 +``` + +## Site-Level Sparseness and What It Means for Analysis + +When comparing results across M-Lab sites, be aware that snapshot density varies. Analyses that depend on within-connection time resolution (e.g., detecting short congestion events, estimating per-connection jitter from multiple snapshots) will have lower power at LGA-class sites than at small-hardware sites. + +For most per-test aggregate analyses (mean RTT, final-snapshot statistics, throughput), the difference is immaterial — you still have 39–94 snapshots per test, which is plenty for stable estimates. + +If you need to identify which sites are in the slow-hardware batch, you can compute the median within-connection snapshot gap from the `raw.Snapshots` array for a given date. + +## Raw Data on GCS + +For analyses requiring the full 10 ms resolution, the complete unthinnned snapshot archives are available in Google Cloud Storage: + +``` +gs://archive-measurement-lab/ndt/tcpinfo/YYYY/MM/DD/ +``` + +Files are stored in `.zst`-compressed JSONL format. Pavlos Sermpezis has a [Colab notebook](https://colab.research.google.com/) for snapshot-level analysis — ask on the M-Lab Discuss list or Slack for the current link. 
+ + + + + + + +## Further Reading + +- [TCP INFO — M-Lab Core Service](../core-service-tcp-info) — field reference and architecture overview +- [NDT (Network Diagnostic Tool)](../test-ndt) — the primary test whose connections TCPinfo instruments +- [Getting Started with M-Lab Data in BigQuery](../getting-started-bigquery) — access setup and query basics +- [Analyzing M-Lab Data: A Researcher's Guide](../research-guide) — broader M-Lab research patterns +- [Beyond Speed: Understanding Internet Quality Metrics](../internet-quality-beyond-speed) — context on latency and jitter as quality dimensions From 0d1bdbb4b10480c13ec75f1c448d9174b2d9cfdf Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Fri, 3 Jul 2026 10:01:59 +1200 Subject: [PATCH 2/8] Update src/content/articles/tcpinfo-snapshot-analysis.md Co-authored-by: Roberto D'Auria --- src/content/articles/tcpinfo-snapshot-analysis.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md index c2aac75..ddda65e 100644 --- a/src/content/articles/tcpinfo-snapshot-analysis.md +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -170,7 +170,7 @@ JOIN `measurement-lab.ndt.tcpinfo` AS tcp ON ndt7.id = tcp.id AND ndt7.date = tcp.date WHERE - DATE(ndt7.a.TestTime) = '2026-06-01' + ndt7.date = '2026-06-01' AND ndt7.a.MeanThroughputMbps IS NOT NULL GROUP BY country, site HAVING test_count > 100 From f4a288e8fac939b8313b99f3a8fb3a4252024026 Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Fri, 3 Jul 2026 10:02:13 +1200 Subject: [PATCH 3/8] Update src/content/articles/tcpinfo-snapshot-analysis.md Co-authored-by: Roberto D'Auria --- src/content/articles/tcpinfo-snapshot-analysis.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md index ddda65e..3aefd42 100644 --- 
a/src/content/articles/tcpinfo-snapshot-analysis.md +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -188,7 +188,7 @@ If you need to identify which sites are in the slow-hardware batch, you can comp ## Raw Data on GCS -For analyses requiring the full 10 ms resolution, the complete unthinnned snapshot archives are available in Google Cloud Storage: +For analyses requiring the full 10 ms resolution, the complete unthinned snapshot archives are available in Google Cloud Storage: ``` gs://archive-measurement-lab/ndt/tcpinfo/YYYY/MM/DD/ From 5871db1c112a8a122c534dcc4f376df38295bba3 Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Fri, 3 Jul 2026 10:02:24 +1200 Subject: [PATCH 4/8] Update src/content/articles/tcpinfo-snapshot-analysis.md Co-authored-by: Roberto D'Auria --- src/content/articles/tcpinfo-snapshot-analysis.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md index 3aefd42..1f02a25 100644 --- a/src/content/articles/tcpinfo-snapshot-analysis.md +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -63,7 +63,7 @@ JOIN `measurement-lab.ndt.tcpinfo` AS tcp ON ndt7.id = tcp.id AND ndt7.date = tcp.date WHERE - DATE(ndt7.a.TestTime) = "2026-06-01" + ndt7.date = "2026-06-01" AND ndt7.a.MeanThroughputMbps IS NOT NULL ORDER BY ndt7.date DESC LIMIT 10 From f58ff432c54b30d3eb1f611ae33eea3ceaef8873 Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Fri, 3 Jul 2026 10:33:13 +1200 Subject: [PATCH 5/8] Decorate BigQuery examples with markers for dry-run CI validation --- src/content/articles/tcpinfo-snapshot-analysis.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md index 1f02a25..bc7f74e 100644 --- a/src/content/articles/tcpinfo-snapshot-analysis.md +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -43,6 +43,7 @@ The 
remaining ~50% of rows are the real NDT connections — but those connection Every completed NDT test has a UUID (`id`) that appears in both `ndt.ndt7` (or `ndt.ndt5`) and `ndt.tcpinfo`. Joining on `id` and `date` keeps only connections tied to a real test result and discards all scanner/handshake noise. + ```sql -- Join TCPinfo with NDT7 test results SELECT @@ -78,6 +79,7 @@ LIMIT 10 To understand the quality of TCPinfo data for a set of tests, it helps to look at the distribution of snapshot counts. Connections with only 1–2 snapshots are noise; completed NDT downloads typically have 10–100+ snapshots. + ```sql -- Snapshot count distribution for all tcpinfo rows (includes noise) WITH snapshot_counts AS ( @@ -99,6 +101,7 @@ GROUP BY num_snapshots ORDER BY num_snapshots ``` + ```sql -- Snapshot count distribution for completed NDT7 tests only (noise removed) WITH snapshot_counts AS ( @@ -153,6 +156,7 @@ TCPinfo's `RTTVar` field (kernel RTTVAR, in microseconds) provides a per-connect
Sampling density caveat. At most sites, BigQuery snapshots are ~110 ms apart; at LGA-class sites, ~260 ms apart. This is sufficient for characterizing latency distributions across many tests, but may be too coarse for sub-100 ms jitter analysis within a single connection. For sub-100 ms resolution, the full snapshot data is available in the raw .zst archives on GCS.
+ ```sql -- RTT and jitter summary for completed NDT7 downloads, by country SELECT From b953e11e3f208034bbd72c5f5bc74857a683c448 Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Fri, 3 Jul 2026 12:41:18 +1200 Subject: [PATCH 6/8] Removing a few TODOs that aren't needed and notebook reference --- src/content/articles/tcpinfo-snapshot-analysis.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md index bc7f74e..bed50a1 100644 --- a/src/content/articles/tcpinfo-snapshot-analysis.md +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -198,13 +198,12 @@ For analyses requiring the full 10 ms resolution, the complete unthinned snapsho gs://archive-measurement-lab/ndt/tcpinfo/YYYY/MM/DD/ ``` -Files are stored in `.zst`-compressed JSONL format. Pavlos Sermpezis has a [Colab notebook](https://colab.research.google.com/) for snapshot-level analysis — ask on the M-Lab Discuss list or Slack for the current link. +Files are stored in `.zst`-compressed JSONL format. + - - ## Further Reading From 7c4a0cc4f70feb1ffec851344ef348645fbc2123 Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Fri, 3 Jul 2026 12:44:43 +1200 Subject: [PATCH 7/8] Make snapshot-distribution comparison queries deterministic and comparable Use the same date (2026-06-01) in both queries and drop the un-ordered inner LIMIT 10000, which sampled rows non-deterministically and made the computed percentages unstable. LIMIT does not reduce BigQuery scan cost, so removing it costs nothing; the date + country filters bound the work. 
--- src/content/articles/tcpinfo-snapshot-analysis.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md index bed50a1..b5bde75 100644 --- a/src/content/articles/tcpinfo-snapshot-analysis.md +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -88,9 +88,8 @@ WITH snapshot_counts AS ( ARRAY_LENGTH(raw.Snapshots) AS num_snapshots FROM `measurement-lab.ndt.tcpinfo` WHERE - date = '2026-05-12' + date = '2026-06-01' AND client.Geo.CountryCode = 'US' - LIMIT 10000 ) SELECT num_snapshots, @@ -116,7 +115,6 @@ WITH snapshot_counts AS ( ndt7.date = '2026-06-01' AND ndt7.a.MeanThroughputMbps IS NOT NULL AND tcp.client.Geo.CountryCode = 'US' - LIMIT 10000 ) SELECT num_snapshots, From fd76b9d5547801679fa6a437a34493c084af3307 Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Fri, 3 Jul 2026 12:50:07 +1200 Subject: [PATCH 8/8] Describe the tarball layer of raw TCPinfo archives on GCS Daily directories hold .tgz tarballs containing per-connection .jsonl.zst files, not bare .zst JSONL. --- src/content/articles/tcpinfo-snapshot-analysis.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/articles/tcpinfo-snapshot-analysis.md b/src/content/articles/tcpinfo-snapshot-analysis.md index b5bde75..5a37707 100644 --- a/src/content/articles/tcpinfo-snapshot-analysis.md +++ b/src/content/articles/tcpinfo-snapshot-analysis.md @@ -196,7 +196,7 @@ For analyses requiring the full 10 ms resolution, the complete unthinned snapsho gs://archive-measurement-lab/ndt/tcpinfo/YYYY/MM/DD/ ``` -Files are stored in `.zst`-compressed JSONL format. +Each day's directory contains `.tgz` tarballs (one per server, per time window). Inside each tarball are per-connection files in `.jsonl.zst` format — one Zstandard-compressed JSONL file per TCP connection, holding that connection's full snapshot time series. 
To work with the data, download the tarball, extract it, then decompress individual connection files with `zstd -d` (or read them directly with a library that supports streaming Zstandard).
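Once a connection file has been decompressed with `zstd -d`, it is plain JSONL and can be read with the standard library alone. The sketch below makes no assumption about field names inside each record; it only parses one JSON object per line. The helper names and the file path passed on the command line are hypothetical.

```python
import json
import sys


def iter_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a decompressed file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records(path):
    """Number of JSON records in the file (one per non-empty line)."""
    return sum(1 for _ in iter_jsonl(path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_connection.py <decompressed-connection-file>.jsonl
    print(count_records(sys.argv[1]))
```

Comparing the record count of a raw file with the snapshot count of the same connection in BigQuery is one way to see the effect of the 10× thinning directly.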