docs: add benchmarking blog posts and performance reference page #254
PaulRMellor
left a comment
I read through the first post. Reads well overall, and the tone is nice and approachable. I left a few suggestions, particularly in places where the AI-assisted wording feels a bit noticeable.
6d70f9c to e814763
showuon
left a comment
Thanks for the blog post. It's good to see some real numbers from the kroxy benchmark. I left some comments on the "Does my proxy look big in this cluster?" post.
The transition wasn't a clean cliff edge — the proxy alternated between sustaining and saturating in a narrow band just above the ceiling. That pattern is characteristic of running right at a limit: it's not that it suddenly falls over, it's that small fluctuations (GC pauses, scheduling jitter) are enough to tip it either way. Stay below 14k and you're fine. Creep above it and you'll notice. The numbers are not absolute — they are just what we measured on our cluster; your mileage **will vary**.
### The ceiling scales with CPU budget
I'd be interested to know whether increasing memory helps here, in addition to CPU. I believe it would, since you mention GC overhead somewhere.
7c62876 to 6c40ee4
robobario
left a comment
LGTM (some unapplied suggestions from Luke/Paul look worth adding)
Covers methodology, test environment, passthrough proxy results, encryption latency and throughput ceiling, the per-connection scaling insight, and sizing guidance. Includes a TODO placeholder for the connection sweep results before publication. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Covers why we chose OMB over Kafka's own tools, the benchmark harness we built (Helm chart, orchestration scripts, JBang result processors), workload design rationale, CPU flamegraphs with embedded interactive iframes, the per-connection ceiling discovery, bugs found in our own tooling, and the cluster recovery incident. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds /performance/ as a dedicated quick-reference page with headline benchmark numbers, comparison tables, and sizing guidance, linked from both blog posts. Updates the existing Performance section in overview.markdown with the key headline numbers and a link to the full reference page. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
…aming
- Shift publication dates to May 21 and May 28
- Replace speculative per-connection ceiling explanation with empirical finding: encryption throughput ceiling scales linearly with CPU budget (validated at 1000m, 2000m, 4000m)
- Add sizing formula: CPU (mc) = 20 × produce_MB_per_s, with worked example
- Add RF=3 masking caveat: initial 1-topic sweeps conflated Kafka replication ceiling with proxy CPU ceiling; coefficient derived from RF=1 multi-topic workloads
- Post 2: add full investigation narrative — workload isolation approach, coefficient derivation, 4-core confirmation, and 2-core prediction/validation
- Drop stale "future work" items that are now complete
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
The proxy is selectively L7: default infrastructure filters do genuine Kafka protocol work (address rewriting, API version negotiation, metadata caching) while high-volume produce/consume traffic bypasses full deserialisation via the decode predicate. The 1.4% proxy CPU share validates this design, not just reflects it. Also drop the Fyre cluster upgrade section — OCP-internal incident with no relevance to readers. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Warm up test environment intro: realistic deployment framing
- Add conversational lead-in to sizing guidance in both documents
- Improve caveats opener in Post 1
- Add caveats section to performance page (RF=3 masking, message size, horizontal scaling)
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- New opening: laptop/codebase/confidence → harness/cluster/nuance
- Why not Kafka tools: add coordinated omission bullet with voice
- What we built: reframe around two experimental questions (rate sweep, connection sweep) before tooling details; add two-dimensions framing
- Banishing click-ops: replace dry Helm section with Red Hat/operator motivation and all-your-CRs joke
- JSON always comes in megabytes: replace docs dump with signal/noise framing; sharpen Comparator vs Summariser distinction
- Following the ceiling: rewrite as investigation arc (spare CPU → what were we hitting? → RF=3 masking → connection sweep → coefficient)
- Rename Post 2 title to "How hard can it be??? Maxing out a Kroxylicious instance"
- Revert slug rename (benchmarking-the-proxy-under-the-hood stays)
- Update performance.markdown cross-links to match
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Replaces dry methodology notes with a fuller narrative arc:
- Opens with the representative vs repeatable tension in benchmarking
- Explains the single-partition choice and why it makes the author wince
- Justifies RF=3: proxy adds one real hop, but RF=1 would double the hop count — not a fair production comparison
- Multi-topic runs reconnect to representative: baseline tax at normal load
- Rate sweep methodology explained as technique, not run-specific numbers
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Format all narrator asides as *(italic brackets)* to distinguish narrator voice from main text
- Fix coordinated omission bullet missing bold formatting
- Fix "tracking...tracking" redundancy in OMB paragraph
- "it made me wince" → "*(I had to squirm to type it)*" — more honest, author reached for single-partition deliberately
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Reframe the takeaway: the proxy boils the latency-sensitive path to near-TCP-stack overhead while operating at Layer 7 — that's the win
- Add paragraph explaining why overhead holds across 10/100 topics: the proxy doesn't contend between topics (unlike a broker which juggles disk I/O, partition leaders, and replication); the connection sweep validates linear throughput scaling
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Full investigation arc: spare CPU shock → NIC elimination → 4-producer test → anti-affinity attempt (3 nodes, 3 brokers, nowhere to go) → new cluster → baseline shock → RTT math reveals co-location → second penny drops on OMB scheduling → RF=1 unlocks proxy CPU ceiling → coefficient → prediction. Corrects several issues in the prior draft: Netty theory discarded (proxy metrics showed minimal back pressure); co-location framed at pod/node level not VM level; 37k flagged as the only figure from the original cluster; all coefficient and sweep numbers confirmed as coming from the new distributed cluster. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Applies accurate numbers from the distributed 8-node cluster (5 workers, 3 masters) across all three files, replacing figures from the original co-located cluster:
- Cluster description: 6-node → 8-node (5 workers, 3 masters)
- RF=3 throughput ceiling: 37.2k→14,600 msg/s (encryption), 50-52k→19,400 msg/s (baseline), 26%→25% reduction
- Coefficient: 12.5 mc/MB/s → 9.7 measured / 10 mc/MB/s operator formula
- Formula: expose general form (10 × total proxy MB/s) with fan-out explanation; 20 × produce MB/s remains the 1:1 shorthand
- 1-core RF=1: ~40k ceiling replaced with safe at 80k (91ms p99), saturating at ~126k
- 4-core validation: 447ms→247ms at 160k; catastrophic→elevated at 321k (1,706ms); saturation above 321k
- 2-core: comfortable at 80k (850ms), sustaining at 160k (720ms) — saturation not yet measured, consistent with model
- Netty aside corrected: thread count scales with availableProcessors() (CPU limit), not fixed at 4
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Rewrites flamegraph intro with personal motivation: hot path minimalism, Amdahl's law framing, and honest admission that the full sweep story didn't come together
- Adds forward reference to bugs section to stitch the structure together
- Moves OSS transparency point into "Run it yourself" where it naturally belongs, with a TODO placeholder for the raw data link
- Drops duplicate "we share our workings" phrase from flamegraph prose
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Fix punctuation on OMB methodology comparability sentence
- Fix repeated "We leaned towards repeatable" in workload design section
- Fix tense: "will make" -> "makes" for workload design aside
- Fix typo: "died in the wool" -> "dyed in the wool"
- Add closing paragraph to flamegraph section: proxy wins are real but we aren't going to make AES faster
- Replace stale 36k msg/s flamegraph references with FIXME pending new profiler runs
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Change "All good" to "Every good benchmarking story starts" (Bob's suggestion)
- Add TL;DR paragraph with key numbers and sizing formula; flagged with FIXME comment pending final benchmark run
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Fix lone space-hyphen-space to em dash in OMB description
- Add runtime warning (~14 hours) before benchmark commands with link to the full blog post reproduction script as a gist
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Explains why MWU testing was added (PhD teammate asked "is the difference real?"), how check-significance.sh works (per-window p99, ~30 samples, p < 0.05), and the honest caveat that per-window samples aren't fully uncorrelated. Distinguishes clearly between what MWU covers (latency delta realness) and what the coefficient derivation doesn't (n=4, no significance test, untested across message sizes). Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
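For readers unfamiliar with the test, here is a minimal Python sketch of the kind of Mann-Whitney U comparison this commit describes. It is not check-significance.sh: the function name, the normal approximation (without a tie correction to the variance) and any sample values are assumptions made for illustration only.

```python
import math


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value using the normal approximation.

    Assumption: no tie correction is applied to the variance, which is a
    reasonable simplification for roughly 30 samples per side.
    """
    n1, n2 = len(a), len(b)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])

    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; ties share the average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    u = min(u_a, n1 * n2 - u_a)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # Two-sided p-value from the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(baseline_p99s, candidate_p99s, alpha=0.05):
    """True when the two sets of per-window p99 values differ at level alpha."""
    return mann_whitney_p(baseline_p99s, candidate_p99s) < alpha
```

Feeding it two lists of per-window p99 latencies (one per scenario) gives a yes/no answer at p < 0.05. As the commit notes, per-window samples are not fully uncorrelated, so the result is best read as supporting evidence, not proof.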
- Move p99 explanation before first passthrough table where percentiles are first encountered; remove duplicate from encryption section
- Expand Layer 7 point with one sentence of context for non-technical readers: most Kafka proxies operate at L4, Kroxylicious parses every message yet still adds only 0.2 ms
- Add distribution board analogy for independent connection handling vs broker shared resource contention
- Simplify replication factor caveat to one sentence, linking to companion post for detail
- Fix "Most proxies" → "Most proxies operate on Kafka" for accuracy
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Post 2 is dated 2026-05-28 — Jekyll skips future posts by default, causing post_url resolution to fail at build time. Replace linked references with plain "companion post" text; links will be restored via a follow-up PR when Post 2 goes live. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Convert TL;DR from prose to bulleted list (S1)
- Soften "dominated by Kafka consumer fetch timeouts" to "likely dominated by" — this is an inference, not a measured fact (S5)
- Inline definition of rate sweep at first use in sizing guidance (S9)
- Broaden "With record encryption" to "With filters (record encryption is the representative example here)" (S10)
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Rename files and update front matter dates; update post_url reference in Post 2 to match Post 1's new filename. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Fix "opevators" → "operators"
- Add .DS_Store and .op/ to .gitignore
- Remove accidentally committed macOS metadata and 1Password plugin files
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
These files are generated by running Jekyll locally and should not be committed. Add glob patterns to .gitignore to prevent recurrence. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Add OpenShift 4.21, Strimzi 0.51.0 (Kafka 3.9), Vault 2.0.0 to test environment table
- Replace multi-topic latency tables with final-run E2E data across all three scenarios (baseline, proxy-no-filters, encryption)
- Add significance narrative for 10-topic results: proxy publish latency below noise, encryption E2E p99 paradoxically 9 ms lower than baseline
- Add 100-topic tail finding: 99.9th percentile of per-window p99 is 750 ms for direct Kafka vs ~506 ms via proxy (-32%, p<0.001), interpreted as proxy serialisation smoothing bursty consumer delivery
- Update CPU sizing coefficient from 10 mc/MB/s to 35 mc/MB/s (conservative, from single-partition measurement); update worked examples throughout
- Remove FIXME comment; update TL;DR to reflect final numbers
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Reviewers flagged that cross-references to the companion post were confusing without context on when it would appear. Added "coming soon" consistently across all six mentions in Post 1. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
P1: rephrase "polite engineering hat" as a translation frame
P2: drop "got off the fence" — the action speaks for itself
P3: "understand" → "parse" for technical precision
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
S1: add explanation of publish vs E2E latency near the first tables,
clarifying the intended-send-time baseline, acks=all replication,
and consumer-side fetch batching
S2: add memory caveat to Caveats section — workloads are CPU-bound
before memory-bound; notes consistent container settings and
conditions where assumption should be revisited
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Update k from 35 to 25 mc/MB/s based on measured 10-topic data
- Reformulate as CPU(mc) = k × (P + N × C) to make fan-out explicit
- Note k=4-8 mc/MB/s for 100+ topic deployments (3× lower, more realistic)
- Frame record encryption as among the most CPU-intensive filters possible
- Replace single worked example with 1:1 and fan-out pair
- Drop theoretical ×1.3 headroom — k is derived from real measurements
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
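The reformulated sizing formula can be sketched in Python as below. The commit message does not define the symbols, so this assumes P is produce throughput in MB/s, N is the number of consumers reading the data (the fan-out) and C is the consume throughput per consumer in MB/s. The function names and the 50 MB/s inputs are hypothetical; only k = 25 mc/MB/s and the divide-by-1,000 conversion to cores come from the commits in this PR.

```python
def proxy_cpu_millicores(produce_mb_s, consumers, consume_mb_s, k=25.0):
    """CPU(mc) = k * (P + N * C).

    Assumed meanings: P = produce MB/s, N = number of consumers (fan-out),
    C = consume MB/s per consumer. k defaults to 25 mc/MB/s, the value
    quoted in the commit message above.
    """
    return k * (produce_mb_s + consumers * consume_mb_s)


def millicores_to_cores(millicores):
    """1,000 mc = 1 core, so divide by 1,000 outside Kubernetes."""
    return millicores / 1000.0


# Hypothetical 1:1 workload: 50 MB/s produced, one consumer reading all of it.
one_to_one = proxy_cpu_millicores(50, 1, 50)

# Hypothetical fan-out workload: the same produce rate read by three consumers.
fan_out = proxy_cpu_millicores(50, 3, 50)

print(one_to_one, millicores_to_cores(one_to_one))
print(fan_out, millicores_to_cores(fan_out))
```

Keeping k as a parameter makes it easy to substitute the lower 4-8 mc/MB/s range noted for 100+ topic deployments.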
The connection sweep found no saturation at up to 320 MB/s across all core counts; the sweep was designed to measure the coefficient, not the ceiling. The sizing formula (k=25) gives operators what they need without presenting ceiling numbers we can't cleanly derive. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
75d74ce to 095be15
The engineering companion post will be reviewed independently. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Rename file and front matter date to match the scheduled publication slot (02:30 UTC / 14:30 NZST). Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
showuon
left a comment
Had another look, LGTM. Just a minor comment.
### Throughput ceiling
A rate-sweep is exactly what it sounds like: pick a starting rate, let OMB run long enough to get a stable measurement, then step up by a fixed increment and repeat until the system can't keep up. We defined "can't keep up" as the sustained throughput dropping by more than 5% below the target rate — at that point, something has saturated.
Learned something new. Thanks.
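The rate-sweep stopping rule quoted in this review comment can be sketched as a short loop. This is an illustration in Python, not the PR's orchestration scripts: `measure` stands in for "run OMB at this target rate and report sustained throughput", and the function name, parameters and the simulated system are all hypothetical.

```python
def rate_sweep(measure, start_rate, step, max_rate, tolerance=0.05):
    """Step the target rate up until sustained throughput falls short.

    measure(target_rate) is a hypothetical callable returning the sustained
    throughput (msg/s) observed at that target. A step counts as saturated
    when sustained throughput is more than `tolerance` below the target.
    """
    results = []
    rate = start_rate
    while rate <= max_rate:
        sustained = measure(rate)
        saturated = sustained < rate * (1.0 - tolerance)
        results.append((rate, sustained, saturated))
        if saturated:
            break  # something has saturated; stop the sweep here
        rate += step
    return results


def simulated_system(target_rate, ceiling=14_000):
    """Toy stand-in for a real run: throughput is capped at a made-up ceiling."""
    return min(target_rate, ceiling)


for rate, sustained, saturated in rate_sweep(simulated_system, 10_000, 1_000, 20_000):
    print(rate, sustained, "saturated" if saturated else "ok")
```

A real sweep would also need each step to run long enough for a stable measurement, as the quoted paragraph says; the sketch only captures the 5% cut-off.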
The original "0.2–3 ms" didn't hold at 10-topic rates where publish avg is ~10 ms. Replace with accurate framing: E2E within noise, publish up to ~10 ms at comfortable rates. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Fix cluster size: 8-node (5 workers) → 11-node (8 workers, 3 masters)
- Add memory row: 16 GiB per node (verified via oc)
- Update Kroxylicious version: 0.20.0 → 0.21.0 (verified from operator image)
- Remove CPU limit from Kroxylicious table row — varied across tests
- Add sentence explaining nodes are separate to ensure real network transit
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Environment table:
- Correct cluster size: 8-node (5 workers) → 11-node (8 workers, 3 masters)
- Add memory row: 16 GiB per node
- Update Kroxylicious version: 0.20.0 → 0.21.0
- Remove CPU limit from Kroxylicious row — it varied across tests
- Add note that nodes are separate to ensure real network transit
CSS:
- Add table styling to .card-text so Markdown tables render with visible borders and padding (Bootstrap doesn't auto-style them)
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Remove encryption column from passthrough tables — encryption gets its own section; passthrough section now shows baseline vs proxy only
- Promote table headings to ### subheadings with framing sentences
- Use global rate in headings (50,000 msg/s / 50 MB/s with 1 KB context) so it's clear both tables use identical total load
- Fix incorrect claim that 100-topic means less work per record — the proxy does identical per-record work regardless of topic count
- Scope "topics don't contend for shared resources" to the proxy — it is not true of Kafka brokers in general
- Add table styling to .card-text for visible borders and padding
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds a third table to the passthrough section showing 1-topic, single-partition traffic at 10,100 msg/s — the most concentrated topology for Kafka — where the proxy is still indistinguishable from baseline. Updates closing paragraph to reference all three topic-count configurations. Verifies the encryption sub-saturation table numbers against aggregatedPublishLatency99pct from the rate-sweep results. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds definition of mc (millicores; 1,000 mc = 1 core) to the sizing formula and a note for non-Kubernetes users to divide the result by 1,000 to get cores. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds a sub-saturation caveat making clear all numbers assume operation below both the proxy's throughput ceiling and Kafka's replication limits. Folds the replication factor bullet into this broader caveat and drops the now-redundant separate entry. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Summary
- /performance/ reference page summarising key numbers and linking to both posts
- overview.markdown updated with headline performance figures and a link to the reference page
Status
Draft — the posts are first drafts. Known open items:
Test plan
- ./run.sh and verify site renders at http://127.0.0.1:4000/
- /performance/ page renders with correct tables
- /performance/ work
🤖 Generated with Claude Code