Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
211 changes: 211 additions & 0 deletions src/content/articles/tcpinfo-snapshot-analysis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,211 @@
---
title: "Analyzing TCP INFO Snapshots: Data Characteristics and Research Patterns"
description: A practical guide to M-Lab's TCP INFO snapshot data in BigQuery — how snapshots are collected and thinned, why most rows are noise, how to filter to real tests, and how to use RTT variance data to study latency-sensitive applications like VoIP.
tags: [research, data-access]
difficulty: intermediate
---

M-Lab's TCP INFO sidecar records a time series of kernel TCP socket statistics for every connection on the platform. The BigQuery table is large, heterogeneous, and counterintuitive until you understand how collection and storage work. This article explains the mechanics, the quirks, and the correct patterns for research use.

See [TCP INFO — M-Lab Core Service](../core-service-tcp-info) for a general reference on what fields the table contains.

## How Snapshot Collection Works

The `tcp-info` sidecar runs on every M-Lab server, polling the Linux kernel's `INET_DIAG` netlink interface to read the `tcp_info` struct for every active TCP connection on the host. This is a passive sidecar — it generates no traffic and does not interfere with measurements.

**Collection cadence.** The sidecar targets a 10 ms poll interval, but each poll must complete a full kernel TCP table dump before the next tick. The dump walks the kernel's established-connections hash table, which is sized at boot proportional to available RAM. On small 4-core / 4 GB nodes the dump takes ~0.8 ms, so the 10 ms target is easily met. On large 32-core / 67 GB nodes — including several sites such as `lga04`, `lga05`, `yyz04`, `mnl02`, and others — the dump takes ~13 ms, which causes the loop to slip to an effective ~25 ms interval.

**Thinning.** The ETL parser applies a 10× decimation at ingest: only 1 snapshot in 10 is written to BigQuery. This reduces storage costs and query scan size, but it means BigQuery snapshots are already spaced roughly **100–260 ms apart** depending on the site, not 10 ms.

| hardware class | host dump time | effective poll | BigQuery gap |
|---|---|---|---|
| 4-core / 4 GB | ~0.8 ms | ~10 ms | ~100 ms |
| 32-core / 67 GB (fast) | ~3.4 ms | ~10 ms | ~100 ms |
| 32-core / 67 GB (slow batch, e.g. LGA) | ~13 ms | ~25 ms | ~260 ms |
| 40–56 core | ~11 ms | ~25 ms | ~250+ ms |

For a 10-second NDT download test, a typical site stores about **94 snapshots** (one per ~110 ms). Sites in the slow-hardware batch store about **39 snapshots** per test (~259 ms apart). If you need the full 10 ms resolution it only exists in the raw `.zst` archives on GCS — not in BigQuery.

<div class="callout callout--note">
<span class="callout-icon">ℹ️</span>
<div class="callout-body">The per-site sparseness is not a malfunction. The fleet-wide ~10× gap is the ETL thinning; the additional ~2.6× at LGA-class sites is their slower INET_DIAG dump driven by hardware characteristics. Both are expected and stable.</div>
</div>

## Why Most Rows Are Noise

The sidecar monitors the host's **entire TCP table**, not just connections from active tests. About **50% of rows** in `measurement-lab.ndt.tcpinfo` are probes, port scanners, health checks, bots, and aborted handshakes hitting the public NDT endpoint. These appear as 1–2 snapshot rows with tiny byte counts (~250 bytes, ~1 segment).

The remaining ~50% of rows are the real NDT connections — but those connections each produce 39–94 snapshots, so they account for essentially all snapshot volume even though they are a minority of distinct connection rows.

**Never query `tcpinfo` in isolation.** Row count does not equal test count, and aggregate statistics over the raw table are dominated by noise. Always join against a test result table.

## The Correct Pattern: Join by UUID

Every completed NDT test has a UUID (`id`) that appears in both `ndt.ndt7` (or `ndt.ndt5`) and `ndt.tcpinfo`. Joining on `id` and `date` keeps only connections tied to a real test result and discards all scanner/handshake noise.

```sql
-- Join TCPinfo with NDT7 test results
SELECT
ndt7.id,
ndt7.date,
ndt7.a.MeanThroughputMbps,
ndt7.a.MinRTT,
ndt7.client.Geo.CountryCode AS client_country,
ndt7.server.Site AS server_site,
tcp.a.FinalSnapshot.TCPInfo.BytesAcked,
tcp.a.FinalSnapshot.TCPInfo.BytesRetrans,
tcp.a.FinalSnapshot.TCPInfo.TotalRetrans,
tcp.a.FinalSnapshot.TCPInfo.MinRTT AS tcpinfo_min_rtt_us,
tcp.a.FinalSnapshot.TCPInfo.RTT AS tcpinfo_rtt_us,
tcp.a.FinalSnapshot.TCPInfo.RTTVar AS tcpinfo_rttvar_us
FROM `measurement-lab.ndt.ndt7` AS ndt7
JOIN `measurement-lab.ndt.tcpinfo` AS tcp
ON ndt7.id = tcp.id
AND ndt7.date = tcp.date
WHERE
DATE(ndt7.a.TestTime) = "2026-06-01"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DATE(ndt7.a.TestTime) = "2026-06-01"
ndt7.date = "2026-06-01"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually filtering over the partition column (which is date), which means the query cannot run:

Cannot query over table 'mlab-oti.ndt.ndt7' without a filter over column(s) 'date' that can be used for partition elimination

AND ndt7.a.MeanThroughputMbps IS NOT NULL
ORDER BY ndt7.date DESC
LIMIT 10
```

<div class="callout callout--warn">
<span class="callout-icon">⚠️</span>
<div class="callout-body">Always filter by <code>DATE(ndt7.a.TestTime)</code> or <code>ndt7.date</code> to use partition pruning. Filtering by both <code>ndt7.date</code> and <code>tcp.date</code> in the JOIN is especially important — it prevents a full cross-partition scan on the tcpinfo table.</div>
</div>

## Snapshot Count Distribution

To understand the quality of TCPinfo data for a set of tests, it helps to look at the distribution of snapshot counts. Connections with only 1–2 snapshots are noise; completed NDT downloads typically have 10–100+ snapshots.

```sql
-- Snapshot count distribution for all tcpinfo rows (includes noise)
WITH snapshot_counts AS (
SELECT
id,
ARRAY_LENGTH(raw.Snapshots) AS num_snapshots
FROM `measurement-lab.ndt.tcpinfo`
WHERE
date = '2026-05-12'
AND client.Geo.CountryCode = 'US'
LIMIT 10000
)
SELECT
num_snapshots,
COUNT(*) AS num_connections,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct
FROM snapshot_counts
GROUP BY num_snapshots
ORDER BY num_snapshots
```

```sql
-- Snapshot count distribution for completed NDT7 tests only (noise removed)
WITH snapshot_counts AS (
SELECT
tcp.id,
ARRAY_LENGTH(tcp.raw.Snapshots) AS num_snapshots
FROM `measurement-lab.ndt.ndt7` AS ndt7
JOIN `measurement-lab.ndt.tcpinfo` AS tcp
ON ndt7.id = tcp.id
AND ndt7.date = tcp.date
WHERE
ndt7.date = '2026-06-01'
AND ndt7.a.MeanThroughputMbps IS NOT NULL
AND tcp.client.Geo.CountryCode = 'US'
LIMIT 10000
)
SELECT
num_snapshots,
COUNT(*) AS num_connections,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct
FROM snapshot_counts
GROUP BY num_snapshots
ORDER BY num_snapshots
```

Comparing the two outputs makes the noise problem concrete: the first query will show a large fraction of 1–2 snapshot rows; the second (UUID-joined) query will show a clean distribution concentrated at 40–100 snapshots.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're inviting a comparison here, I think it would be helpful if the two queries used the same date.

They also LIMIT 10000 in the inner query with no ORDER BY, which I believe makes the output non-deterministic. They then use this sample to compute a percentage, which would be non-deterministic as well.


## RTT, RTTVar, and Latency-Sensitive Applications

Each TCPinfo snapshot exposes three RTT fields from the kernel (values are in **microseconds**):

| Field | Description |
|---|---|
| `TCPInfo.MinRTT` | Minimum RTT observed since connection start — a proxy for propagation delay |
| `TCPInfo.RTT` | Kernel's smoothed RTT estimate (SRTT) at snapshot time |
| `TCPInfo.RTTVar` | Kernel's RTT variance estimate — a proxy for jitter |

`MinRTT` is the most reliable latency measure for research: it is less sensitive to transient congestion and queue buildup than SRTT. `RTTVar` captures variation in RTT across the connection and is useful as a jitter proxy for applications like VoIP and real-time video.

### Use Case: VoIP and Voice Quality Estimation

There is active research interest in using M-Lab TCPinfo data to assess whether voice-over-IP on the public internet meets quality thresholds comparable to traditional landline service. Key questions include:

- What fraction of connections have 99th-percentile RTT below 100 ms?
- What is the distribution of RTT variance (jitter) across connections?
- Does quality differ between rural and urban areas, or by ISP?

TCPinfo's `RTTVar` field (kernel RTTVAR, in microseconds) provides a per-connection jitter estimate. Because multiple snapshots are available per connection, you can also compute rolling statistics across the snapshot array.

<div class="callout callout--note">
<span class="callout-icon">ℹ️</span>
<div class="callout-body"><strong>Sampling density caveat.</strong> At most sites, BigQuery snapshots are ~110 ms apart; at LGA-class sites, ~260 ms apart. This is sufficient for characterizing latency distributions across many tests, but may be too coarse for sub-100 ms jitter analysis within a single connection. For sub-100 ms resolution, the full snapshot data is available in the raw <code>.zst</code> archives on GCS.</div>
</div>

```sql
-- RTT and jitter summary for completed NDT7 downloads, by country
SELECT
ndt7.client.Geo.CountryCode AS country,
ndt7.server.Site AS site,
COUNT(*) AS test_count,
ROUND(AVG(tcp.a.FinalSnapshot.TCPInfo.MinRTT) / 1000, 2) AS avg_min_rtt_ms,
ROUND(
APPROX_QUANTILES(tcp.a.FinalSnapshot.TCPInfo.RTT, 100)[OFFSET(95)] / 1000,
2
) AS p95_rtt_ms,
ROUND(AVG(tcp.a.FinalSnapshot.TCPInfo.RTTVar) / 1000, 2) AS avg_rttvar_ms
FROM `measurement-lab.ndt.ndt7` AS ndt7
JOIN `measurement-lab.ndt.tcpinfo` AS tcp
ON ndt7.id = tcp.id
AND ndt7.date = tcp.date
WHERE
DATE(ndt7.a.TestTime) = '2026-06-01'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually filtering over the partition column (which is date), which means the query cannot run:

Cannot query over table 'mlab-oti.ndt.ndt7' without a filter over column(s) 'date' that can be used for partition elimination

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DATE(ndt7.a.TestTime) = '2026-06-01'
ndt7.date = '2026-06-01'

AND ndt7.a.MeanThroughputMbps IS NOT NULL
GROUP BY country, site
HAVING test_count > 100
ORDER BY avg_min_rtt_ms
LIMIT 50
```

## Site-Level Sparseness and What It Means for Analysis

When comparing results across M-Lab sites, be aware that snapshot density varies. Analyses that depend on within-connection time resolution (e.g., detecting short congestion events, estimating per-connection jitter from multiple snapshots) will have lower power at LGA-class sites than at small-hardware sites.

For most per-test aggregate analyses (mean RTT, final-snapshot statistics, throughput), the difference is immaterial — you still have 39–94 snapshots per test, which is plenty for stable estimates.

If you need to identify which sites are in the slow-hardware batch, you can compute the median within-connection snapshot gap from the `raw.Snapshots` array for a given date.

## Raw Data on GCS

For analyses requiring the full 10 ms resolution, the complete unthinnned snapshot archives are available in Google Cloud Storage:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For analyses requiring the full 10 ms resolution, the complete unthinnned snapshot archives are available in Google Cloud Storage:
For analyses requiring the full 10 ms resolution, the complete unthinned snapshot archives are available in Google Cloud Storage:


```
gs://archive-measurement-lab/ndt/tcpinfo/YYYY/MM/DD/
```

Files are stored in `.zst`-compressed JSONL format. Pavlos Sermpezis has a [Colab notebook](https://colab.research.google.com/) for snapshot-level analysis — ask on the M-Lab Discuss list or Slack for the current link.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files are stored in .zst-compressed JSONL format

This is correct but omits the tarball layer: users will find .tgz archives containing per-connection .jsonl.zst files.


<!-- TODO: Add direct link to Pavlos' TCPinfo Colab notebook once it has a stable public URL. -->

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODOs in code comments aren't very visible — I'd rather wait until we have a public link to add here (if posting this isn't urgent), or create an issue/a CU task to document what is missing before merging this PR, perhaps assigning the person this is blocked on.

Also, AFAIK M-Lab's Slack isn't exactly "public" the same way the Discuss list is, it's on invitation.

<!-- TODO: Add section on unnesting the raw.Snapshots array in BigQuery for within-connection time series analysis. -->

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same: either add the section as part of this PR, or create an issue instead of a TODO in a comment.

(this applies to every other TODO in this file)

<!-- FIXME: Verify that the RTT/RTTVar fields cited above match the current ndt.tcpinfo schema exactly — column paths may differ between the ndt.tcpinfo view and raw tables. -->

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the verification to happen before the KB article is posted. Could you please confirm that the TCPInfo schema matches?

<!-- TODO: Add worked example of computing per-connection jitter from the Snapshots array (UNNEST + window functions). -->
<!-- TODO: Link to Scott Jordan meeting notes and FCC VoIP proceeding context once those are public. -->

## Further Reading

- [TCP INFO — M-Lab Core Service](../core-service-tcp-info) — field reference and architecture overview
- [NDT (Network Diagnostic Tool)](../test-ndt) — the primary test whose connections TCPinfo instruments
- [Getting Started with M-Lab Data in BigQuery](../getting-started-bigquery) — access setup and query basics
- [Analyzing M-Lab Data: A Researcher's Guide](../research-guide) — broader M-Lab research patterns
- [Beyond Speed: Understanding Internet Quality Metrics](../internet-quality-beyond-speed) — context on latency and jitter as quality dimensions