From c6174bed2dd47af4379d32675d60176a1e68aab8 Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Thu, 11 Jun 2026 17:05:40 +0300 Subject: [PATCH 1/7] docs: spec + plan for F-min (tutorials + observability split) F-min pulled from the docs-landing-and-comparison follow-ons. Two tutorials under a new docs/tutorials/ directory (Your first outbox app, Add a Kafka relay) plus a three-way split of docs/usage/observability.md into Reference (kept at URL, trimmed to event catalog + PromQL playbook), How-to (setup-prometheus-opentelemetry.md, adapter wiring), and Explanation (concepts/instrumentation-seams.md, the recorder-vs- middleware layering rationale). Spec covers structural calls (tutorials slot under Getting started, observability.md URL stays for SEO, tutorial code is executed end- to-end with literal captured output). Plan walks an executor through eight tasks; structural pieces are deterministic, tutorials require real code execution. Co-Authored-By: Claude Opus 4.7 (1M context) --- ...utorials-and-observability-split-design.md | 448 ++++++++++++++++++ ...-tutorials-and-observability-split-plan.md | 409 ++++++++++++++++ 2 files changed, 857 insertions(+) create mode 100644 planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md create mode 100644 planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md diff --git a/planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md b/planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md new file mode 100644 index 0000000..b449a38 --- /dev/null +++ b/planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md @@ -0,0 +1,448 @@ +--- +status: draft +date: 2026-06-11 +slug: docs-tutorials-and-observability-split +supersedes: null +superseded_by: null +pr: null +outcome: null +--- + +# Design: Add two tutorials and split observability.md + +## Summary + +Pull-the-piece-that-matters F-min from the 
+[`docs-landing-and-comparison`](../archived/2026-06-10-docs-landing-and-comparison-design.md) +follow-ons. Two changes that close the biggest Diátaxis gap on the +existing docs without restructuring everything: + +1. **Two new tutorials** — the only category of doc the site currently + has zero of: + - *Tutorial: Your first outbox app* — 10-minute walk-through from + `pip install` to a row landing through a handler. + - *Tutorial: Add a Kafka relay* — extends the first tutorial with a + foreign-broker relay; doubles as the worked end-to-end example + the **C** follow-on was supposed to land. +2. **Split `usage/observability.md`** (today's longest page, 327 + lines, mixing three Diátaxis quadrants) into three single-purpose + pages: a Reference (kept at the same URL), a How-to (Prometheus + + OTel setup), and an Explanation (the recorder-vs-middleware + layering rationale). + +Total: 4 new pages, 1 trimmed page. Nav grows from 18 pages to 22. +No existing page URL changes; the only deletions are intra-page +content moves. + +## Motivation + +- **Zero tutorials today.** Every page in `getting-started` is + reference-shaped (`installation.md`) or reference-with-narrative + (`basic.md`). A newcomer who lands on `docs/index.md` and clicks + "Install and write the first publisher / subscriber" arrives at + the Basic-usage page, which is a four-step section list more than + a story. No "let's build something together for ten minutes," no + one place where the path from zero to "a row landed through a + handler" is one continuous narrative. A tutorial in the Diátaxis + sense — warm voice, single concrete journey, end-state recap — is + the single highest-impact missing piece. 
+ +- **The architecture has a relay payoff line that's not yet a worked + example.** The docs talk about the foreign-broker relay ( + [`usage/relay.md`](../../docs/usage/relay.md)) but never show the + full app: `docker compose up postgres kafka`, publish in a + transaction, watch the row land in Kafka, kill Kafka, watch the + retry. A tutorial that does this end-to-end is more persuasive + than the relay reference's three-line snippet. + +- **`observability.md` does too much.** At 327 lines it covers: + the recorder seam API (Reference), the Prometheus adapter + setup (How-to), the OTel adapter setup (How-to), the native + middleware (Reference + How-to), the layering of the two seams + (Explanation), the test-broker behavior (Reference), plus a + PromQL playbook (Reference). A reader who lands there with one + intent has to scan past two others to find their answer. + Splitting along Diátaxis lines into three pages — each with one + shape — makes each page self-contained. + +- **F-min, not F-full.** The full Diátaxis rewrite (`docs-landing- + and-comparison` non-goal F) is deferred for the reasons argued in + that spec: existing voice is consistent, current nav already + gestures at Diátaxis, and splitting decomposes into more pages + than the audience needs. F-min lands the two highest-impact + pieces (Tutorials gap + the worst mixed page) without committing + to the full restructure. + +## Non-goals + +Deliberately *not* covered here; each is a candidate follow-on: + +- **A third tutorial** ("Test handlers with `TestOutboxBroker`", + "Schedule a delayed delivery", "Wire DLQ"). Two is enough for + this pass; the testing one in particular would be high-value but + separate scope. + +- **Splitting any other mixed page** (`subscriber.md`, `dlq.md`, + `relay.md`, `fastapi.md`). All of them mix purposes; only + `observability.md` mixes three quadrants in 327 lines. The + others stay as-is. 
+ +- **Renaming the existing nav sections** to Diátaxis-canonical names + (Tutorials / How-to / Reference / Explanation). The current + Overview / Getting started / Concepts / Guides / Reference / + Operations naming reads more naturally to most users and already + maps to the four quadrants conceptually. Don't churn the labels. + +- **Adding the architecture deep-dives from `architecture/` to the + public docs** as Explanation pages. That's a separate question + about audience (operators / contributors / consumers); leave the + current internal-only placement. + +- **Voice review of existing reference pages.** Today's terse, + precise voice on Reference pages is correct for Reference. The + new tutorials get a different (warm, step-by-step) voice; the + existing pages don't change. + +- **A whole "Tutorials" sub-grouping in the nav.** Two tutorials + slot under "Getting started" as flat entries with `Tutorial:` + prefixes. If we add a third, revisit the grouping then. + +## Design + +### 1. Tutorial: Your first outbox app + +New file: `docs/tutorials/first-outbox-app.md`. (New `tutorials/` +directory.) + +Goal — a reader with Python and Postgres familiarity follows the +page top-to-bottom in roughly ten minutes and ends with a running +process where a single `broker.publish` results in the handler +running, with no surprises along the way. 
+ +Section outline: + +``` +# Tutorial: Your first outbox app + +What you'll build (2 sentences: a tiny app where publishing + inside a DB transaction triggers a handler) + +Before you start (Python 3.13+, Postgres, ~10 minutes) + +Step 1: Install (uv add 'faststream-outbox[asyncpg,validate]') + +Step 2: Start Postgres (one-liner Docker) + +Step 3: Declare the + outbox table (MetaData + make_outbox_table) + +Step 4: Create the schema (metadata.create_all in a one-shot script — + the easiest path for a tutorial; link out to + operations/alembic.md for the real Alembic + recipe) + +Step 5: Define a handler (@broker.subscriber("orders")) + +Step 6: Publish a row (inside session.begin()) + +Step 7: Run it (faststream run app:app, see the handler fire) + +What you just built (recap) + +What's next (links to: Subscriber reference, Publisher + reference, FastAPI integration guide, + Tutorial: Add a Kafka relay) +``` + +Voice: imperative + warm. Use "we" sparingly. Each step starts with +a one-sentence "what you're about to do" and ends with "you should +see X." No design rationale on the page; that's Concepts. Links to +Explanation for "why this works." + +Code: one file (`app.py`) grows step by step. Each step shows the +*diff* from the previous step (or full file if the diff would be +larger than the file). Final file is ~30 lines. + +### 2. Tutorial: Add a Kafka relay + +New file: `docs/tutorials/add-kafka-relay.md`. + +Goal — extends Tutorial #1 with a Kafka publisher decorator. Shows +the at-least-once contract end to end by deliberately killing Kafka +mid-flight and watching the retry land. 
+ +Section outline: + +``` +# Tutorial: Add a Kafka relay + +What you'll add (turn the Tutorial-1 handler into a relay + that forwards to Kafka) + +Before you start (you finished Tutorial 1; we'll extend that + app) + +Step 1: Add Kafka (docker compose snippet) + +Step 2: Install + faststream[kafka] (uv add 'faststream[kafka]') + +Step 3: Add the Kafka + broker (KafkaBroker + publisher) + +Step 4: Stack the + decorator (@publisher_kafka @broker_outbox.subscriber) + +Step 5: Run it and + watch a row reach + Kafka (consume from the topic via the CLI) + +Step 6: Kill Kafka and + watch the retry (docker compose stop kafka, publish a row, + see the outbox subscriber retry; bring Kafka + back, see the row deliver) + +What you just built (recap — at-least-once relay) + +What's next (links to: Relay reference, Subscriber retry + strategies, Comparison page § + "vs FastStream foreign-broker direct") +``` + +Voice: same warmth as Tutorial 1; assumes familiarity from Tutorial +1. + +Code: extends the same `app.py` plus the `docker-compose.yml` from +Tutorial 1. + +### 3. Split `usage/observability.md` + +Today's `observability.md` (327 lines) becomes three pages; the kept +page retains its filename and nav entry, so the existing URL +stays for SEO continuity. + +**3a. `usage/observability.md` (kept, trimmed to Reference)** + +Stays where it is. Becomes Reference-shaped: what events fire, what +tags each carries, what the middleware accepts, what the recorder +seam's signature is, the PromQL playbook queries. ~150 lines after +the trim. + +Concretely keeps from current page: +- Section "The recorder seam" — the `Callable[[str, Mapping], None]` + signature, the event catalog (`fetched` / `dispatched` / + `acked` / `nacked_*` / `lease_lost` / `published` / `dlq_written`), + the "must not block" note. +- The PromQL playbook table. +- Section "Test broker note". 
+ +Removed from current page (moved): +- "Prometheus adapter" full setup example → How-to (3b) +- "OpenTelemetry adapter" full setup example → How-to (3b) +- "Native middleware (spans + bus parity)" full setup example → How-to (3b) +- "Layering: middleware seam vs. recorder seam" + the layered + example app + the section table → Explanation (3c) + +**3b. `usage/setup-prometheus-opentelemetry.md` (new How-to)** + +Goal — practical setup. Reader has decided they want metrics; +this page wires them in. + +Sections: +- Install (`pip install 'faststream-outbox[prometheus,opentelemetry]'`) +- The recorder-only setup (bare seam, no middleware) +- Prometheus adapter setup (full app with `AsgiFastStream` + + `/metrics`) +- OpenTelemetry adapter setup +- Both seams together (the "recommended setup" example with native + middleware + recorder) + +Voice: imperative, "to do X, do Y." Assumes competence — no Diátaxis +explanation of *why* there are two seams (that's 3c). + +Direct port of current `observability.md`'s adapter sections, with +~10 lines of glue to remove cross-section references that no longer +apply. + +**3c. `concepts/instrumentation-seams.md` (new Explanation)** + +Goal — answer "why are there two instrumentation seams?" for the +curious reader. + +Sections: +- The fundamental tension: events outside the bus +- What the middleware seam observes naturally +- What the middleware seam *can't* observe (the four cases: + `fetched` ticks, `lease_lost` after `consume_scope` exits, + `nacked_terminal(reason="max_deliveries")` before consume opens, + empty-fetch idle counters) +- What the recorder seam observes naturally +- The layering table (rendered from current `observability.md`'s + "Layering: middleware seam vs. recorder seam" table) +- Operator implication: pair both seams for full coverage + +Voice: discursive, explanatory. No code snippets except the table. + +### 4. 
Nav adjustments + +```yaml +nav: + - Overview: index.md + - Getting started: + - Installation: introduction/installation.md + - Basic usage: usage/basic.md + - 'Tutorial: Your first outbox app': tutorials/first-outbox-app.md + - 'Tutorial: Add a Kafka relay': tutorials/add-kafka-relay.md + - Concepts: + - How it works: introduction/how-it-works.md + - Comparison: concepts/comparison.md + - Instrumentation seams: concepts/instrumentation-seams.md # NEW + - Guides: + - FastAPI integration: usage/fastapi.md + - Relay to Kafka / RabbitMQ / NATS: usage/relay.md + - Timers: usage/timers.md + - Testing: usage/testing.md + - Schema validation: usage/schema-validation.md + - Setup Prometheus and OpenTelemetry: usage/setup-prometheus-opentelemetry.md # NEW + - Reference: + - Subscriber: usage/subscriber.md + - Publisher: usage/publisher.md + - Router: usage/router.md + - Dead-letter queue: usage/dlq.md + - Observability: usage/observability.md + - Operations: + - Production checklist: operations/checklist.md + - Troubleshooting: operations/troubleshooting.md + - Alembic migrations: operations/alembic.md +``` + +Four new entries (two tutorials + one how-to + one explanation), +zero file renames, zero URL changes for existing pages. Material's +sidebar handles 22 entries comfortably. + +### 5. Cross-link updates + +Mostly contained to the split: + +- `usage/observability.md` (the kept Reference) — top of page, add + a one-line pointer to the new how-to and the new explanation: + > Setting it up: [Setup Prometheus and OpenTelemetry + > ](./setup-prometheus-opentelemetry.md). Why two seams: [Concepts + > § Instrumentation seams](../concepts/instrumentation-seams.md). +- `concepts/instrumentation-seams.md` — links back to the reference + for the event catalog. +- `usage/setup-prometheus-opentelemetry.md` — links to the reference + for the event catalog and to the explanation for the "why." 
+- `docs/index.md` decision-tree table — no change needed (no + decision-tree row maps to the new pages; the tutorials are + reachable via "Install and write the first publisher / + subscriber", which they extend). + +The two tutorials cross-link to each other (Tutorial 1's "What's +next" points at Tutorial 2) and to relevant Reference / Concepts / +Operations pages from each "What's next" footer. + +### 6. The `tutorials/` directory + +New top-level `docs/tutorials/`. Lives alongside `docs/concepts/`, +`docs/operations/`, `docs/usage/`, `docs/introduction/`. Consistent +with the existing flat directory layout. + +## Operations + +None — in-repo. The mkdocs deploy workflow re-runs on push to +`main` whenever `docs/**` or `mkdocs.yml` changes; this PR triggers +both. The new URLs become available immediately at: + +- `https://faststream-outbox.modern-python.org/tutorials/first-outbox-app/` +- `https://faststream-outbox.modern-python.org/tutorials/add-kafka-relay/` +- `https://faststream-outbox.modern-python.org/usage/setup-prometheus-opentelemetry/` +- `https://faststream-outbox.modern-python.org/concepts/instrumentation-seams/` + +`https://faststream-outbox.modern-python.org/usage/observability/` +stays at its current URL (trimmed content); inbound deep links +keep resolving. 
+ +## Out of scope (repeat list) + +Already named under Non-goals; repeated for grep: + +- Third tutorial (testing, scheduling, DLQ) +- Splitting `subscriber.md`, `dlq.md`, `relay.md`, `fastapi.md` +- Renaming nav sections to Diátaxis-canonical labels +- Adding `architecture/` deep-dives to public docs +- Voice review of existing Reference pages +- A dedicated Tutorials nav sub-grouping +- Migration-recipe regression tests (separate follow-on) +- `just plans` index generator (separate follow-on) + +## Testing + +Content-only; correctness is observable on the live site: + +- `just docs-build` (`mkdocs build --strict`) passes clean — every + internal cross-link from the four new pages and the trimmed + `observability.md` resolves. +- `just lint` passes (eof-fixer, ruff format, ruff check, ty check). +- Tutorial code must be **executed end-to-end** during plan + execution against a clean machine. Tutorial 1: from scratch, every + step's expected output observed. Tutorial 2: same, including the + "kill Kafka, see retry" step. The plan author runs these and the + *literal terminal output* lands inside the tutorial under "you + should see X" — no hand-edited expected output. Tutorials that + haven't been run produce frustration when readers try them and + miss a step. +- Reviewer manual sidebar scan: `just docs-serve` and confirm the + four new entries appear in the four new sidebar positions. +- Reviewer reads both tutorials end-to-end against a fresh checkout + and a clean Postgres / Kafka — the most valuable review move for + a tutorial. + +## Risk + +- **Tutorial voice drifts from the existing reference voice and + introduces inconsistency across the site.** Mitigated by the + explicit voice guidance in §1 and §2 (warm, step-by-step in + tutorials; everything else unchanged). 
The "What you just built" + recap pattern, the "Before you start" preamble, and the + "What's next" footer are intentional voice markers — they signal + "you are reading a tutorial" without requiring a Diátaxis- + literate reader. Tutorial voice is allowed to feel different from + Reference voice because they're serving different reader needs. + +- **Tutorial code goes stale faster than reference code.** Tutorial + code embeds version-specific install commands, Docker image tags, + and Postgres compose snippets — all of which drift faster than the + library's public API. Mitigated by keeping the tutorials minimal + (no premature abstractions, no library features outside the + tutorial's narrow path) so most updates are mechanical pin bumps. + Follow-up: tutorials could be the next thing tested by the + migration-recipe-style regression tests scaffolded out by the + operator-pages spec — a CI step that runs the tutorial code + against a real Postgres on every release. Out of scope here. + +- **Splitting `observability.md` breaks inbound deep links to its + current anchors.** Mitigated by keeping the page at its current + URL (only the content is trimmed) and by the URL-stable nature of + the move: section anchors that survive the trim (`#the-recorder-seam`, + `#test-broker-note`) keep resolving. Anchors that move to other + pages (`#layering-middleware-seam-vs-recorder-seam`, + `#prometheus-adapter`) break with no redirect; the new + cross-link callout at the top of the trimmed Reference page points readers onward. + +- **The "What's next" footers create a maintenance graph.** Adding + a new tutorial later means revisiting the footer of every + existing tutorial to add the cross-link. At two tutorials the + cost is trivial. Re-evaluate if we ever add a fourth. + +- **Tutorial #2's "kill Kafka, see retry" step is the + most-likely-to-flake step in either tutorial.** Local environments + differ; Kafka's failure modes are platform-sensitive (especially + on Apple Silicon). 
Mitigated by recommending Confluent's + `cp-kafka` image specifically (known to work on M1+ from prior + use) and by treating the step as *demonstrative*: the tutorial + doesn't fail if Kafka comes back instantly with no observable + retry, because the at-least-once property is still preserved. + Reviewer flags if the step's reproduction is fragile and we drop + it from the tutorial in favor of a one-paragraph callout + explaining the contract. diff --git a/planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md b/planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md new file mode 100644 index 0000000..e63cac4 --- /dev/null +++ b/planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md @@ -0,0 +1,409 @@ +--- +status: draft +date: 2026-06-11 +slug: docs-tutorials-and-observability-split +spec: docs-tutorials-and-observability-split +pr: null +--- + +# docs-tutorials-and-observability-split — implementation plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use +> superpowers:subagent-driven-development (recommended) or +> superpowers:executing-plans to implement this plan task-by-task. Steps +> use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add two tutorials under `docs/tutorials/` and split +`docs/usage/observability.md` into a trimmed Reference + a new +How-to (`usage/setup-prometheus-opentelemetry.md`) + a new Explanation +(`concepts/instrumentation-seams.md`), with one nav reshape commit +that surfaces the four new entries. + +**Spec:** [`planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md`](./2026-06-11-docs-tutorials-and-observability-split-design.md) + +**Branch:** `docs/tutorials-and-observability-split` + +**Commit strategy:** Per-task commits. Tasks 2 (Tutorial 1) and 3 +(Tutorial 2) include the literal terminal output capture step — the +plan author runs each tutorial end-to-end against a clean local +environment before committing. 
+ +--- + +### Task 1: Branch + commit spec + plan + README Active entry + +**Files:** +- Create: `planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md` (already drafted) +- Create: `planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md` (this file) +- Modify: `planning/README.md` + +- [ ] **Step 1: Confirm branch + uncommitted artifacts** + + Run: `git branch --show-current && ls planning/active/` + Expected: branch `docs/tutorials-and-observability-split`; two + drafted files under `planning/active/`. + +- [ ] **Step 2: Update `planning/README.md` Active section** + + Replace the `_None._` line: + + ```markdown + ## Active + + - **[docs-tutorials-and-observability-split](active/2026-06-11-docs-tutorials-and-observability-split-design.md)** + — Two new tutorials under `docs/tutorials/` plus a three-way + split of `docs/usage/observability.md` into Reference + How-to + + Explanation. F-min from the docs-landing-and-comparison + follow-ons. + ``` + +- [ ] **Step 3: Commit** + + ```bash + git add planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md \ + planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md \ + planning/README.md + git commit -m "docs: spec + plan for F-min (tutorials + observability split) + + Co-Authored-By: Claude Opus 4.7 (1M context) " + ``` + +--- + +### Task 2: Tutorial 1 — Your first outbox app + +**Files:** +- Create: `docs/tutorials/first-outbox-app.md` + +Write the tutorial per [spec §1 +](./2026-06-11-docs-tutorials-and-observability-split-design.md#1-tutorial-your-first-outbox-app) +**after running every step end-to-end against a clean local +environment.** Capture the literal terminal output at each step. + +**Setup for execution:** + +- [ ] **Step 1: Clean Postgres environment** + + Run: `docker compose down -v 2>&1; docker compose up -d postgres` + Expected: container fresh; no leftover `outbox` table from prior + sessions. 
+ +- [ ] **Step 2: Fresh working directory under `/tmp`** + + Create `/tmp/outbox-tutorial-1/` and `cd` into it. This is the + tutorial reader's perspective: a directory with nothing in it. + +- [ ] **Step 3: Walk each tutorial step** + + Follow the section outline in spec §1 (Install → Start Postgres → + Declare → Schema → Handler → Publish → Run). For each step: + + - Execute the literal command the tutorial will tell the reader to + run. + - Capture the literal output. **Do not edit it.** + - If a step's command fails or produces output the spec didn't + anticipate, STOP and update the spec before re-running (the + tutorial must reflect reality). + +- [ ] **Step 4: Write `docs/tutorials/first-outbox-app.md`** + + Use the section outline in spec §1. Each step contains: + + - A one-sentence "what you're about to do" preamble. + - The literal command or code block. + - The literal captured output under "you should see:" (or + equivalent phrasing). Block-quote or `output` code block. + + Voice: warm, step-by-step, sparing "we" (per spec §1). Use a `_What's next_` footer + linking to the Subscriber reference, Publisher reference, FastAPI + integration guide, and Tutorial 2. + +- [ ] **Step 5: Smoke-build** + + Run: `just docs-build` + Expected: clean. The page is orphaned-not-in-nav (Task 7 wires it + in); the warning is acceptable. + +- [ ] **Step 6: Commit** + + ```bash + git add docs/tutorials/first-outbox-app.md + git commit -m "docs: tutorial — your first outbox app + + Co-Authored-By: Claude Opus 4.7 (1M context) " + ``` + +--- + +### Task 3: Tutorial 2 — Add a Kafka relay + +**Files:** +- Create: `docs/tutorials/add-kafka-relay.md` + +Write the tutorial per [spec §2 +](./2026-06-11-docs-tutorials-and-observability-split-design.md#2-tutorial-add-a-kafka-relay) +extending Tutorial 1's app. Same end-to-end execution requirement. + +The "kill Kafka, see retry" step (spec §2 Step 6) is the +fragility risk flagged in the spec. Use Confluent's `cp-kafka` +image. 
If the step's repro is unstable on your environment, STOP +and reshape it as a callout that explains the at-least-once +contract without requiring the visible retry. + +- [ ] **Step 1: Extend the tutorial-1 environment** + + In `/tmp/outbox-tutorial-1/`, add Kafka to `docker-compose.yml`. + Confluent's `cp-kafka` image is the recommended choice for + cross-platform compatibility (especially Apple Silicon). + +- [ ] **Step 2: Walk each tutorial step** + + Same discipline as Task 2 — execute, capture, paste. The kill- + Kafka step: + + ``` + docker compose stop kafka + + + docker compose start kafka + + ``` + + If the retry log frequency / shape differs from what the spec + predicted, update the spec **before** writing the tutorial page. + +- [ ] **Step 3: Write `docs/tutorials/add-kafka-relay.md`** + + Use the section outline in spec §2. Cross-link to Tutorial 1's + "What's next" in a "Before you start" preamble. Footer links to + Relay reference, Subscriber § Retry strategies, and Comparison § + "vs FastStream foreign-broker direct". + +- [ ] **Step 4: Smoke-build** + + Run: `just docs-build` + Expected: clean. + +- [ ] **Step 5: Commit** + + ```bash + git add docs/tutorials/add-kafka-relay.md + git commit -m "docs: tutorial — add a Kafka relay + + Co-Authored-By: Claude Opus 4.7 (1M context) " + ``` + +--- + +### Task 4: New Explanation — `concepts/instrumentation-seams.md` + +**Files:** +- Create: `docs/concepts/instrumentation-seams.md` + +Write the Explanation per [spec §3c +](./2026-06-11-docs-tutorials-and-observability-split-design.md#3c-conceptsinstrumentation-seamsmd-new-explanation). +Extract the relevant content from the current `usage/observability.md` +§ "Layering: middleware seam vs. recorder seam" — same layering +table, same four "events the other seam physically cannot observe" +points, expanded narrative. + +Voice: discursive, explanatory. Aimed at "why are there two seams" +readers, not "how do I wire this" readers. 
+ +- [ ] **Step 1: Read current `usage/observability.md` § Layering** + + Read the existing § "Layering: middleware seam vs. recorder seam" + for the table + the four bullet points to extract. + +- [ ] **Step 2: Write `docs/concepts/instrumentation-seams.md`** + + Per spec §3c outline (tension → middleware-only → recorder-only → + layering table → operator implication). + +- [ ] **Step 3: Smoke-build** + + Run: `just docs-build` + Expected: clean. + +- [ ] **Step 4: Commit** + + ```bash + git add docs/concepts/instrumentation-seams.md + git commit -m "docs: concept page — instrumentation seams (recorder vs middleware) + + Co-Authored-By: Claude Opus 4.7 (1M context) " + ``` + +--- + +### Task 5: New How-to — `usage/setup-prometheus-opentelemetry.md` + +**Files:** +- Create: `docs/usage/setup-prometheus-opentelemetry.md` + +Write the How-to per [spec §3b +](./2026-06-11-docs-tutorials-and-observability-split-design.md#3b-usagesetup-prometheus-opentelemetrymd-new-how-to). +Direct port of current `observability.md`'s adapter setup sections +(Prometheus adapter, OpenTelemetry adapter, Native middleware, +"both seams together") with cross-section references retargeted. + +- [ ] **Step 1: Lift the relevant sections** + + From current `usage/observability.md`: + - § "Prometheus adapter" (full code block + Consume vs publish + label set + PromQL queries that show wiring, not playbook) + - § "OpenTelemetry adapter" (full code block) + - § "Native middleware (spans + bus parity)" (full code block) + + Lift verbatim into the new page; clean up the cross-section + references that no longer resolve. + +- [ ] **Step 2: Add a short intro** + + One paragraph: "You've decided to wire metrics. This page is the + recipe. For the why, see [Concepts § Instrumentation seams]( + ../concepts/instrumentation-seams.md); for the event catalog and + PromQL playbook, see [Reference § Observability]( + ./observability.md)." 
+ +- [ ] **Step 3: Smoke-build** + + Run: `just docs-build` + Expected: clean. + +- [ ] **Step 4: Commit** + + ```bash + git add docs/usage/setup-prometheus-opentelemetry.md + git commit -m "docs: how-to — setup Prometheus and OpenTelemetry + + Co-Authored-By: Claude Opus 4.7 (1M context) " + ``` + +--- + +### Task 6: Trim `usage/observability.md` to Reference shape + +**Files:** +- Modify: `docs/usage/observability.md` + +Strip everything that moved to Tasks 4 (Layering) and 5 (adapter +setups). Keep per [spec §3a +](./2026-06-11-docs-tutorials-and-observability-split-design.md#3a-usageobservabilitymd-kept-trimmed-to-reference): + +- § "The recorder seam" — the callable signature, the event catalog, + the "must not block" note. +- The PromQL playbook table (the *operator queries*, distinct from + the setup-wiring PromQL in Task 5). +- § "Test broker note". + +Add a top-of-page see-also pair pointing at the new How-to and +Explanation: + +```markdown +*Setting it up: [Setup Prometheus and OpenTelemetry]( +./setup-prometheus-opentelemetry.md). Why two seams: [Concepts § +Instrumentation seams](../concepts/instrumentation-seams.md).* +``` + +- [ ] **Step 1: Delete the moved sections** + + Per the spec §3a "Removed from current page (moved)" list: + - "Prometheus adapter" (entire section) → moved to Task 5 + - "OpenTelemetry adapter" (entire section) → moved to Task 5 + - "Native middleware (spans + bus parity)" → moved to Task 5 + - "Layering: middleware seam vs. recorder seam" + table → moved to Task 4 + +- [ ] **Step 2: Add the top-of-page see-also pair** + +- [ ] **Step 3: Smoke-build** + + Run: `just docs-build` + Expected: clean. Inbound deep links to the page's surviving anchors + (`#the-recorder-seam`, `#test-broker-note`) still resolve; + anchors that moved (`#prometheus-adapter`, etc.) are now dead + inside this page but reachable through the see-also at the top. 
+ +- [ ] **Step 4: Commit** + + ```bash + git add docs/usage/observability.md + git commit -m "docs: trim observability.md to Reference shape + + Co-Authored-By: Claude Opus 4.7 (1M context) " + ``` + +--- + +### Task 7: Nav reshape + +**Files:** +- Modify: `mkdocs.yml` + +Add the four new entries to the nav per [spec §4 +](./2026-06-11-docs-tutorials-and-observability-split-design.md#4-nav-adjustments). +Two under Getting started, one under Concepts, one under Guides. + +- [ ] **Step 1: Edit `mkdocs.yml`** + + Replace the existing `nav:` block per the spec §4 sample. + +- [ ] **Step 2: Smoke-build** + + Run: `just docs-build` + Expected: clean. Sidebar shows the four new entries. + +- [ ] **Step 3: Commit** + + ```bash + git add mkdocs.yml + git commit -m "docs: nav reshape — surface tutorials, setup how-to, instrumentation explanation + + Co-Authored-By: Claude Opus 4.7 (1M context) " + ``` + +--- + +### Task 8: Verify + +**Files:** none modified; no commit produced. + +- [ ] **Step 1: Full strict build** + + Run: `just docs-build` + Expected: clean. All cross-links from the four new pages and the + trimmed `observability.md` resolve. + +- [ ] **Step 2: Lint pass** + + Run: `just lint` + Expected: eof-fixer, ruff format, ruff check, ty check all pass. + +- [ ] **Step 3: Manual sidebar scan** + + Run: `just docs-serve` + Open the served site. Confirm: + + - Getting started shows: Installation, Basic usage, Tutorial 1, + Tutorial 2. + - Concepts shows: How it works, Comparison, Instrumentation seams. + - Guides shows: FastAPI integration, Relay, Timers, Testing, + Schema validation, Setup Prometheus and OpenTelemetry. + - Reference's Observability page is now ~150 lines (trimmed + cleanly). + +- [ ] **Step 4: Re-run Tutorial 1 against a fresh checkout** + + In a temp directory, follow Tutorial 1 step-by-step using only + what the page tells you. Every command's output should match + what the page promised. If anything diverges, STOP and update + the tutorial. 
+ +- [ ] **Step 5: Open the PR** + + Stop. Hand off to `superpowers:requesting-code-review` / + `superpowers:finishing-a-development-branch`. + + On merge, both halves of the pair move to `planning/archived/` + with `status: shipped`, `pr:`, and `outcome:` filled — same + archive pattern PRs #52 and #54 dogfooded. From 4e3392102c0ead2107cadd4986bad682840fba5d Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Thu, 11 Jun 2026 17:06:09 +0300 Subject: [PATCH 2/7] docs: add docs-tutorials-and-observability-split to planning index Co-Authored-By: Claude Opus 4.7 (1M context) --- planning/README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/planning/README.md b/planning/README.md index f54d6cb..a998cd2 100644 --- a/planning/README.md +++ b/planning/README.md @@ -11,7 +11,11 @@ points. ## Active -_None._ +- **[docs-tutorials-and-observability-split](active/2026-06-11-docs-tutorials-and-observability-split-design.md)** + — Two new tutorials under `docs/tutorials/` plus a three-way split + of `docs/usage/observability.md` into Reference + How-to + + Explanation. F-min from the docs-landing-and-comparison + follow-ons. 
## Archived (shipped) From b5aa623894cff92e39a83e13c2cc069704e3595f Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Thu, 11 Jun 2026 17:07:07 +0300 Subject: [PATCH 3/7] =?UTF-8?q?docs:=20concept=20page=20=E2=80=94=20instru?= =?UTF-8?q?mentation=20seams=20(recorder=20vs=20middleware)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/concepts/instrumentation-seams.md | 110 +++++++++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 docs/concepts/instrumentation-seams.md diff --git a/docs/concepts/instrumentation-seams.md b/docs/concepts/instrumentation-seams.md new file mode 100644 index 0000000..04f43f7 --- /dev/null +++ b/docs/concepts/instrumentation-seams.md @@ -0,0 +1,110 @@ +# Instrumentation seams + +`faststream-outbox` exposes **two complementary instrumentation seams** — +a *recorder* (callable) and a *native middleware* — and recommends +running both. This page explains why two; the practical setup recipes +live in [Setup Prometheus and OpenTelemetry](../usage/setup-prometheus-opentelemetry.md), +and the event catalog and PromQL playbook in +[Observability](../usage/observability.md). + +## The fundamental tension + +A FastStream broker emits two natural observation moments: + +- `consume_scope` — wraps a single handler invocation. The middleware + bus surfaces handler duration, message size, exception status, span + context. +- `publish_scope` — wraps a single producer call. Same idea on the + outbound side. + +Upstream FastStream middlewares (`TelemetryMiddleware`, +`PrometheusMiddleware`) hook into these two scopes. For Kafka, Rabbit, +NATS, that's the entire surface area — those buses don't have +outbox-internal events because they don't *have* an outbox. + +`faststream-outbox` does have outbox-internal events, and the middleware +bus physically cannot observe them. 
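To make the tension concrete, here is a toy sketch in plain Python. It is not the library's code: `consume_scope` is a hypothetical stand-in for a middleware scope, and `fetch_cycle` is a made-up fetch loop. The only detail taken from this page is the shape of the recorder callable. It shows that a scope-based observer sees nothing on an empty fetch, while a plain callable still fires.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List, Mapping

# Same shape as the recorder callable described on this page.
Recorder = Callable[[str, Mapping[str, Any]], None]

seen_by_middleware: List[str] = []
seen_by_recorder: List[str] = []


@contextmanager
def consume_scope(row: str) -> Iterator[None]:
    """Hypothetical stand-in for a middleware scope: it exists only per message."""
    seen_by_middleware.append(f"consume:{row}")
    yield


def recorder(event: str, tags: Mapping[str, Any]) -> None:
    """Plain callable: it can fire whether or not a message is in scope."""
    seen_by_recorder.append(event)


def fetch_cycle(rows: List[str], record: Recorder) -> None:
    """Toy fetch loop for illustration only, not the library's implementation."""
    # Fires on every cycle, before any handler runs, even when nothing was claimed.
    record("fetched", {"queue": "orders", "subscriber": "handle_order", "count": len(rows)})
    for row in rows:
        with consume_scope(row):
            pass  # the handler body would run here
        record("acked", {"queue": "orders", "subscriber": "handle_order"})


fetch_cycle([], recorder)           # empty fetch: no message, so no scope opens
fetch_cycle(["order-1"], recorder)  # one row: the scope opens exactly once

print(seen_by_middleware)  # ['consume:order-1']
print(seen_by_recorder)    # ['fetched', 'fetched', 'acked']
```

The empty cycle appears only in the recorder's list, which is the gap the rest of this page describes.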
+ +## What the middleware seam observes naturally + +Wrap `consume_scope` and `publish_scope` and you get: + +- Handler duration / status / message size. +- Span tracing across the handler invocation and the publish call. +- The exact label / instrument schema upstream Kafka and Rabbit users + already have dashboards for. + +This is the "spans + bus parity" mode the native middleware +(`OutboxTelemetryMiddleware`, `OutboxPrometheusMiddleware`) provides. + +## What the middleware seam *can't* observe + +Four events fire **outside** the handler invocation, with no +`StreamMessage` in scope: + +- **`fetched` ticks (including empty fetches).** Emitted by the fetch + loop every time it claims rows from the table, *before* any handler + runs. The middleware bus has no `consume_scope` to wrap yet — there + is no message. Empty-fetch ticks are also load-bearing for + detecting "polling but the queue is empty" patterns; the middleware + bus never sees them. +- **`lease_lost` events.** Fired after `consume_scope` has already + closed (the handler returned successfully but its terminal `DELETE` + matched zero rows because the lease expired). By the time we know + the row was lost, the middleware has long since recorded a normal + `acked`. The recorder catches the truth. +- **`nacked_terminal(reason="max_deliveries")`.** This row exceeded + the `max_deliveries` ceiling and was dropped *without invoking the + handler*. No handler call = no `consume_scope`. The middleware has + nothing to wrap. +- **The empty-fetch idle counter.** Same shape as `fetched` ticks — + fires when the fetch loop went a round without finding anything to + claim. Useful for tuning `min_fetch_interval` and `max_fetch_interval`. + The middleware bus has no concept of "the broker checked and found + nothing." + +## What the recorder seam observes naturally + +The recorder is a `Callable[[str, Mapping[str, Any]], None]` invoked at +six subscriber events and one producer event. 
Plus `dlq_written` when +the DLQ is configured. It fires whether or not a handler is in scope: + +- All four bus-invisible events above. +- Plus `acked` / `nacked_retried` / `nacked_terminal` / `dispatched` / + `published` from inside the handler-execution paths, with explicit + `subscriber` and `queue` tags. + +The recorder cannot bracket span lifecycles (it's a callable, not a +context manager), so tracing belongs to the middleware seam. + +## Layering: middleware seam vs. recorder seam + +Both can be registered together — each fires for events the other +physically cannot observe. + +| Concern | Middleware seam | Recorder seam | +|---|---|---| +| Handler duration / status / size | ✅ via `consume_scope` | ✅ via `acked` / `nacked_*` events | +| Publish duration / status / exception | ✅ via `publish_scope` | ✅ via `published` event | +| Span tracing (consume + publish) | ✅ | ❌ (callable can't bracket spans) | +| `fetched` ticks (including empty) | ❌ (no `StreamMessage` at fetch time) | ✅ | +| `lease_lost` after `consume_scope` exits | ❌ | ✅ | +| `nacked_terminal(reason="max_deliveries")` before consume opens | ❌ | ✅ | +| Empty-fetch idle counter | ❌ | ✅ | + +## Operator implication + +**Run both.** Middleware for bus-scope metrics, distributed tracing, +and label parity with the rest of your FastStream services. Recorder +for the outbox-internal events that don't have a `StreamMessage` to +attach to. + +The "Both seams together" recipe in [Setup Prometheus and OpenTelemetry +](../usage/setup-prometheus-opentelemetry.md#both-seams-together) +wires the recommended layout: native middleware on the broker, plus a +`metrics_recorder` for the outbox-internal events. + +This isn't redundancy — each seam fires for events the other can't see. +A service that registers only the middleware seam loses every +`lease_lost`, `fetched`, and `max_deliveries`-terminal signal. A +service that registers only the recorder seam loses tracing. 
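As a rough illustration of the recorder seam, the sketch below implements the `Callable[[str, Mapping[str, Any]], None]` signature as a small in-memory tally. `CountingRecorder` is a hypothetical name, and the calls at the bottom are simulated by hand with made-up tag values; nothing here is taken from the library's own adapters.

```python
from collections import Counter
from typing import Any, Mapping


class CountingRecorder:
    """Hypothetical recorder: tallies events per (event, queue) in memory."""

    def __init__(self) -> None:
        self.counts = Counter()

    def __call__(self, event: str, tags: Mapping[str, Any]) -> None:
        # Keep this cheap: a dict lookup and an increment, no I/O.
        queue = str(tags.get("queue", "unknown"))
        self.counts[(event, queue)] += 1


recorder = CountingRecorder()

# Simulated calls; in a real service the broker would make them.
recorder("fetched", {"queue": "orders", "subscriber": "handle_order", "count": 0})
recorder("fetched", {"queue": "orders", "subscriber": "handle_order", "count": 1})
recorder("acked", {"queue": "orders", "subscriber": "handle_order"})
recorder(
    "nacked_terminal",
    {"queue": "orders", "subscriber": "handle_order", "deliveries_count": 5, "reason": "max_deliveries"},
)

print(recorder.counts[("fetched", "orders")])          # 2
print(recorder.counts[("nacked_terminal", "orders")])  # 1
```

An instance like this would presumably be passed as `metrics_recorder=recorder` to `OutboxBroker`, the same parameter the adapter recipes use. That any plain callable is accepted there is inferred from the signature above, not confirmed by this page.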
From 91774955449e3d9bd447d9bac5a4faeaa586fe02 Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Thu, 11 Jun 2026 17:09:24 +0300 Subject: [PATCH 4/7] =?UTF-8?q?docs:=20how-to=20=E2=80=94=20setup=20Promet?= =?UTF-8?q?heus=20and=20OpenTelemetry?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/usage/setup-prometheus-opentelemetry.md | 222 +++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 docs/usage/setup-prometheus-opentelemetry.md diff --git a/docs/usage/setup-prometheus-opentelemetry.md b/docs/usage/setup-prometheus-opentelemetry.md new file mode 100644 index 0000000..71dbe29 --- /dev/null +++ b/docs/usage/setup-prometheus-opentelemetry.md @@ -0,0 +1,222 @@ +# Setup Prometheus and OpenTelemetry + +You've decided to wire metrics. This page is the recipe. For the *why +two instrumentation seams*, see [Concepts § Instrumentation +seams](../concepts/instrumentation-seams.md); for the event catalog +and operator PromQL playbook, see [Reference § +Observability](./observability.md). + +## Prometheus adapter + +Drop-in compatible with FastStream's `PrometheusMiddleware`. Metric +names, label set, status enum, histogram buckets, and constructor args +all mirror upstream. 
+ +```bash +pip install 'faststream-outbox[prometheus]' uvicorn +``` + +```python +# app.py — run with `uvicorn app:app --host 0.0.0.0 --port 8000` +from faststream.asgi import AsgiFastStream, make_ping_asgi +from prometheus_client import REGISTRY, make_asgi_app +from sqlalchemy import MetaData +from sqlalchemy.ext.asyncio import create_async_engine + +from faststream_outbox import OutboxBroker, make_outbox_table +from faststream_outbox.metrics.prometheus import PrometheusRecorder + + +metadata = MetaData() +outbox_table = make_outbox_table(metadata, table_name="outbox") +engine = create_async_engine("postgresql+asyncpg://outbox:outbox@localhost:5432/outbox") + +broker = OutboxBroker( + engine, + outbox_table=outbox_table, + metrics_recorder=PrometheusRecorder(app_name="checkout", registry=REGISTRY), +) + + +@broker.subscriber("orders", max_workers=4) +async def handle_order(body: dict) -> None: ... + + +app = AsgiFastStream( + broker, + asgi_routes=[ + ("/metrics", make_asgi_app(registry=REGISTRY)), + ("/healthz", make_ping_asgi(broker, timeout=2.0)), + ], +) +``` + +`AsgiFastStream` accepts any ASGI sub-app under `asgi_routes`; mount +`make_asgi_app(REGISTRY)` to expose Prometheus exposition without +pulling FastAPI in. `make_ping_asgi(broker)` is FastStream's built-in +liveness probe — handy for Kubernetes. + +The `broker` label is always `"outbox"`; existing FastStream Grafana +dashboards keep working — add `broker="outbox"` to the PromQL filter. + +### Consume vs publish label set + +The adapter uses a different label set for consume vs publish, +matching upstream verbatim: + +- Consume tags by `handler` (the subscriber) +- Publish tags by `destination` (the queue) + +See [Observability § PromQL playbook](./observability.md) for the +operator query catalog. 
+ +## OpenTelemetry adapter + +Drop-in compatible with FastStream's `TelemetryMiddleware`, **meter +only — no spans** (use the [native middleware](#native-middleware-spans--bus-parity) +section below if you need spans). + +```bash +pip install 'faststream-outbox[opentelemetry,prometheus]' \ + opentelemetry-exporter-prometheus uvicorn +``` + +```python +# app.py — run with `uvicorn app:app --host 0.0.0.0 --port 8000` +from faststream.asgi import AsgiFastStream +from opentelemetry import metrics +from opentelemetry.exporter.prometheus import PrometheusMetricReader +from opentelemetry.sdk.metrics import MeterProvider +from prometheus_client import REGISTRY, make_asgi_app +from sqlalchemy import MetaData +from sqlalchemy.ext.asyncio import create_async_engine + +from faststream_outbox import OutboxBroker, make_outbox_table +from faststream_outbox.metrics.opentelemetry import OpenTelemetryRecorder + + +# OTel meters → Prometheus reader (scraped at /metrics below) +prometheus_reader = PrometheusMetricReader() +meter_provider = MeterProvider(metric_readers=[prometheus_reader]) +metrics.set_meter_provider(meter_provider) + +metadata = MetaData() +outbox_table = make_outbox_table(metadata, table_name="outbox") +engine = create_async_engine("postgresql+asyncpg://outbox:outbox@localhost:5432/outbox") + +broker = OutboxBroker( + engine, + outbox_table=outbox_table, + metrics_recorder=OpenTelemetryRecorder(meter_provider=meter_provider), +) + + +@broker.subscriber("orders", max_workers=4) +async def handle_order(body: dict) -> None: ... + + +app = AsgiFastStream(broker, asgi_routes=[("/metrics", make_asgi_app(registry=REGISTRY))]) +``` + +The `PrometheusMetricReader` converts OTel meter data points to +Prometheus exposition format on `/metrics`; for OTLP push instead, +swap the reader for `PeriodicExportingMetricReader(OTLPMetricExporter(...))` +and drop the `/metrics` route. 
+ +Instrument names (`messaging.process.duration`, +`messaging.publish.duration`, `messaging.process.messages` when +`include_messages_counters=True`), units, and constructor args +(`meter_provider`, `meter`, `include_messages_counters`) match +`faststream.opentelemetry.TelemetryMiddleware`. The +`messaging.system="outbox"` attribute disambiguates outbox traffic +from Kafka / Rabbit data on the same instruments. + +**Tracing (spans) is not modelled by this adapter** — the callable +seam can't bracket a span lifecycle. For spans, use the [native +middleware](#native-middleware-spans--bus-parity) integration below. + +## Native middleware (spans + bus parity) { #native-middleware-spans--bus-parity } + +For OTel spans wrapping `consume_scope` / `publish_scope` and the +exact upstream label / instrument schema, register the native +middleware subclasses via `broker_middlewares=[...]` — same +registration pattern as `KafkaPrometheusMiddleware` / +`RabbitTelemetryMiddleware`. + +## Both seams together { #both-seams-together } + +The recommended setup pairs middleware with the recorder so every +event the bus emits **and** every outbox-internal event lands in one +observability stack: + +```bash +pip install 'faststream-outbox[opentelemetry,prometheus]' \ + opentelemetry-exporter-otlp opentelemetry-exporter-prometheus uvicorn +``` + +```python +# app.py — run with `OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \ +# uvicorn app:app --host 0.0.0.0 --port 8000` +from faststream.asgi import AsgiFastStream +from opentelemetry import metrics, trace +from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter +from opentelemetry.exporter.prometheus import PrometheusMetricReader +from opentelemetry.sdk.metrics import MeterProvider +from opentelemetry.sdk.resources import Resource +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import BatchSpanProcessor +from prometheus_client import REGISTRY, make_asgi_app +from 
sqlalchemy import MetaData +from sqlalchemy.ext.asyncio import create_async_engine + +from faststream_outbox import OutboxBroker, make_outbox_table +from faststream_outbox.metrics.prometheus import PrometheusRecorder +from faststream_outbox.opentelemetry import OutboxTelemetryMiddleware +from faststream_outbox.prometheus import OutboxPrometheusMiddleware + + +# ----- OTel SDK ----- +resource = Resource.create({"service.name": "my-outbox-service"}) +tracer_provider = TracerProvider(resource=resource) +tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter())) +trace.set_tracer_provider(tracer_provider) + +meter_provider = MeterProvider(resource=resource, metric_readers=[PrometheusMetricReader()]) +metrics.set_meter_provider(meter_provider) + +# ----- Outbox broker ----- +metadata = MetaData() +outbox_table = make_outbox_table(metadata, table_name="outbox") +engine = create_async_engine("postgresql+asyncpg://outbox:outbox@localhost:5432/outbox") + +broker = OutboxBroker( + engine, + outbox_table=outbox_table, + middlewares=[ + # Bus-scope spans + meters around consume_scope / publish_scope. + OutboxTelemetryMiddleware(tracer_provider=tracer_provider, meter_provider=meter_provider), + OutboxPrometheusMiddleware(registry=REGISTRY, app_name="my-outbox-service"), + ], + # Outbox-internal events (fetched, lease_lost, terminal reasons) that have + # no message context and can't reach the middleware bus. + metrics_recorder=PrometheusRecorder(registry=REGISTRY, app_name="my-outbox-service"), +) + + +@broker.subscriber("orders", max_workers=4) +async def handle_order(body: dict) -> None: ... + + +app = AsgiFastStream(broker, asgi_routes=[("/metrics", make_asgi_app(registry=REGISTRY))]) +``` + +Traces flow to OTLP (Jaeger / Tempo / Honeycomb / collector); meters +and the recorder's outbox-internal counters land on `/metrics` for +Prometheus to scrape. One process, one ASGI app, one scrape endpoint. 
+ +The providers set `messaging.system = "outbox"`, matching the +recorder-seam adapters. The OTel provider maps `row.id → +messaging.message.id`, `row.queue → messaging.destination_publish.name`, +`correlation_id → messaging.message.conversation_id`, `len(payload) → +messaging.message.payload_size_bytes`, and `len(cmd.batch_bodies) → +messaging.batch.message_count` when >1. From 756b9966993ef48978fc1473643ac7a0e5bbfd20 Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Thu, 11 Jun 2026 17:10:26 +0300 Subject: [PATCH 5/7] docs: trim observability.md to Reference shape Splits out the adapter setup recipes (now usage/setup-prometheus-opentelemetry.md) and the layering rationale (now concepts/instrumentation-seams.md). Kept page stays at the URL for SEO continuity; new content adds an event-catalog table and absorbs the operator PromQL playbook from the former Prometheus-adapter section. Top-of-page see-also pair links to the new how-to and the new explanation. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/usage/observability.md | 265 +++++------------------------------- 1 file changed, 31 insertions(+), 234 deletions(-) diff --git a/docs/usage/observability.md b/docs/usage/observability.md index 1194b24..6663eae 100644 --- a/docs/usage/observability.md +++ b/docs/usage/observability.md @@ -1,19 +1,10 @@ # Observability -The broker exposes **two complementary instrumentation seams**: +*Setting it up: [Setup Prometheus and OpenTelemetry](./setup-prometheus-opentelemetry.md). +Why two seams: [Concepts § Instrumentation seams](../concepts/instrumentation-seams.md).* -1. **Recorder seam** — a single callable invoked at six subscriber events - (`fetched`, `dispatched`, `acked`, `nacked_retried`, `nacked_terminal`, - `lease_lost`) and one producer event (`published`). Owns outbox-internal - events that the FastStream middleware bus physically cannot observe. -2. 
**Native middleware** — subclasses of upstream FastStream's - `TelemetryMiddleware` and `PrometheusMiddleware` plug into - `consume_scope` / `publish_scope` for spans, durations, status, and - message size — matching upstream Kafka / Rabbit middlewares exactly. - -You can use either, both, or neither. The recommended setup for full -observability is **both seams together**: middleware owns bus-scope -metrics + tracing, recorder owns outbox-internal events. +This page is the **Reference**: the recorder-seam API, the event +catalog, and the operator PromQL playbook. ## The recorder seam @@ -56,66 +47,28 @@ graph. Every call site wraps the recorder in `try/except` and logs at DEBUG, so a broken recorder never poisons the dispatch loop. -## Prometheus adapter - -Drop-in compatible with FastStream's `PrometheusMiddleware`. Metric names, -label set, status enum, histogram buckets, and constructor args all mirror -upstream. - -```bash -pip install 'faststream-outbox[prometheus]' uvicorn -``` - -```python -# app.py — run with `uvicorn app:app --host 0.0.0.0 --port 8000` -from faststream.asgi import AsgiFastStream, make_ping_asgi -from prometheus_client import REGISTRY, make_asgi_app -from sqlalchemy import MetaData -from sqlalchemy.ext.asyncio import create_async_engine - -from faststream_outbox import OutboxBroker, make_outbox_table -from faststream_outbox.metrics.prometheus import PrometheusRecorder - - -metadata = MetaData() -outbox_table = make_outbox_table(metadata, table_name="outbox") -engine = create_async_engine("postgresql+asyncpg://outbox:outbox@localhost:5432/outbox") +## Event catalog -broker = OutboxBroker( - engine, - outbox_table=outbox_table, - metrics_recorder=PrometheusRecorder(app_name="checkout", registry=REGISTRY), -) +| Event | Tags (always present) | Tags (situational) | Fired by | +|---|---|---|---| +| `fetched` | `queue`, `subscriber`, `count` | | Fetch loop, every cycle (including empty) | +| `dispatched` | `queue`, `subscriber` | | Worker loop, 
before handler runs | +| `acked` | `queue`, `subscriber` | `duration_seconds` | Handler returned successfully | +| `nacked_retried` | `queue`, `subscriber`, `attempts_count`, `deliveries_count` | `exception_type` | Retry scheduled | +| `nacked_terminal` | `queue`, `subscriber`, `deliveries_count`, `reason` | `exception_type` | Row terminally failed | +| `lease_lost` | `queue`, `phase`, `row_id`, `deliveries_count` | | Terminal write found `rowcount == 0` | +| `published` | `queue`, `destination` | `duration_seconds`, `payload_size_bytes` | Producer INSERT committed | +| `dlq_written` | `queue`, `subscriber`, `deliveries_count`, `failure_reason` | `exception_type` | DLQ CTE wrote an audit row | +`reason` on `nacked_terminal` is one of `max_deliveries`, +`retry_terminal`, `rejected`. The same value lands in the DLQ +`failure_reason` column when the DLQ is configured. -@broker.subscriber("orders", max_workers=4) -async def handle_order(body: dict) -> None: ... +## PromQL playbook - -app = AsgiFastStream( - broker, - asgi_routes=[ - ("/metrics", make_asgi_app(registry=REGISTRY)), - ("/healthz", make_ping_asgi(broker, timeout=2.0)), - ], -) -``` - -`AsgiFastStream` accepts any ASGI sub-app under `asgi_routes`; mount -`make_asgi_app(REGISTRY)` to expose Prometheus exposition without pulling -FastAPI in. `make_ping_asgi(broker)` is FastStream's built-in liveness -probe — handy for Kubernetes. - -The `broker` label is always `"outbox"`; existing FastStream Grafana -dashboards keep working — add `broker="outbox"` to the PromQL filter. - -### Consume vs publish label set - -The adapter uses a different label set for consume vs publish, matching -upstream verbatim: - -- Consume tags by `handler` (the subscriber) -- Publish tags by `destination` (the queue) +Operator queries that key off the recorder-side metrics emitted by +the Prometheus adapter. The `broker` label is always `"outbox"`; add +the filter to disambiguate from upstream FastStream services. 
```promql # Handler throughput (acked / sec) @@ -142,174 +95,18 @@ rate(faststream_published_messages_total{broker="outbox",status="success"}[1m]) # P99 publish (INSERT) latency per queue histogram_quantile(0.99, rate(faststream_published_messages_duration_seconds_bucket{broker="outbox"}[5m])) -``` - -## OpenTelemetry adapter -Drop-in compatible with FastStream's `TelemetryMiddleware`, **meter only -— no spans** (see [Native middleware](#native-middleware-spans-bus-parity) -below if you need spans). - -```bash -pip install 'faststream-outbox[opentelemetry,prometheus]' \ - opentelemetry-exporter-prometheus uvicorn -``` - -```python -# app.py — run with `uvicorn app:app --host 0.0.0.0 --port 8000` -from faststream.asgi import AsgiFastStream -from opentelemetry import metrics -from opentelemetry.exporter.prometheus import PrometheusMetricReader -from opentelemetry.sdk.metrics import MeterProvider -from prometheus_client import REGISTRY, make_asgi_app -from sqlalchemy import MetaData -from sqlalchemy.ext.asyncio import create_async_engine - -from faststream_outbox import OutboxBroker, make_outbox_table -from faststream_outbox.metrics.opentelemetry import OpenTelemetryRecorder - - -# OTel meters → Prometheus reader (scraped at /metrics below) -prometheus_reader = PrometheusMetricReader() -meter_provider = MeterProvider(metric_readers=[prometheus_reader]) -metrics.set_meter_provider(meter_provider) - -metadata = MetaData() -outbox_table = make_outbox_table(metadata, table_name="outbox") -engine = create_async_engine("postgresql+asyncpg://outbox:outbox@localhost:5432/outbox") - -broker = OutboxBroker( - engine, - outbox_table=outbox_table, - metrics_recorder=OpenTelemetryRecorder(meter_provider=meter_provider), -) - - -@broker.subscriber("orders", max_workers=4) -async def handle_order(body: dict) -> None: ... 
- - -app = AsgiFastStream(broker, asgi_routes=[("/metrics", make_asgi_app(registry=REGISTRY))]) -``` - -The `PrometheusMetricReader` converts OTel meter data points to Prometheus -exposition format on `/metrics`; for OTLP push instead, swap the reader -for `PeriodicExportingMetricReader(OTLPMetricExporter(...))` and drop the -`/metrics` route. - -Instrument names (`messaging.process.duration`, -`messaging.publish.duration`, `messaging.process.messages` when -`include_messages_counters=True`), units, and constructor args -(`meter_provider`, `meter`, `include_messages_counters`) match -`faststream.opentelemetry.TelemetryMiddleware`. The -`messaging.system="outbox"` attribute disambiguates outbox traffic from -Kafka / Rabbit data on the same instruments. - -**Tracing (spans) is not modelled by this adapter** — the callable seam -can't bracket a span lifecycle. For spans, use the [native middleware -integration](#native-middleware-spans-bus-parity) below. - -## Native middleware (spans + bus parity) - -For OTel spans wrapping `consume_scope` / `publish_scope` and the exact -upstream label / instrument schema, register the native middleware -subclasses via `broker_middlewares=[...]` — same registration pattern as -`KafkaPrometheusMiddleware` / `RabbitTelemetryMiddleware`. 
- -The recommended setup pairs middleware with the recorder so every event -the bus emits **and** every outbox-internal event lands in one -observability stack: - -```bash -pip install 'faststream-outbox[opentelemetry,prometheus]' \ - opentelemetry-exporter-otlp opentelemetry-exporter-prometheus uvicorn -``` - -```python -# app.py — run with `OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \ -# uvicorn app:app --host 0.0.0.0 --port 8000` -from faststream.asgi import AsgiFastStream -from opentelemetry import metrics, trace -from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter -from opentelemetry.exporter.prometheus import PrometheusMetricReader -from opentelemetry.sdk.metrics import MeterProvider -from opentelemetry.sdk.resources import Resource -from opentelemetry.sdk.trace import TracerProvider -from opentelemetry.sdk.trace.export import BatchSpanProcessor -from prometheus_client import REGISTRY, make_asgi_app -from sqlalchemy import MetaData -from sqlalchemy.ext.asyncio import create_async_engine - -from faststream_outbox import OutboxBroker, make_outbox_table -from faststream_outbox.metrics.prometheus import PrometheusRecorder -from faststream_outbox.opentelemetry import OutboxTelemetryMiddleware -from faststream_outbox.prometheus import OutboxPrometheusMiddleware - - -# ----- OTel SDK ----- -resource = Resource.create({"service.name": "my-outbox-service"}) -tracer_provider = TracerProvider(resource=resource) -tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter())) -trace.set_tracer_provider(tracer_provider) - -meter_provider = MeterProvider(resource=resource, metric_readers=[PrometheusMetricReader()]) -metrics.set_meter_provider(meter_provider) - -# ----- Outbox broker ----- -metadata = MetaData() -outbox_table = make_outbox_table(metadata, table_name="outbox") -engine = create_async_engine("postgresql+asyncpg://outbox:outbox@localhost:5432/outbox") - -broker = OutboxBroker( - engine, - 
outbox_table=outbox_table, - middlewares=[ - # Bus-scope spans + meters around consume_scope / publish_scope. - OutboxTelemetryMiddleware(tracer_provider=tracer_provider, meter_provider=meter_provider), - OutboxPrometheusMiddleware(registry=REGISTRY, app_name="my-outbox-service"), - ], - # Outbox-internal events (fetched, lease_lost, terminal reasons) that have - # no message context and can't reach the middleware bus. - metrics_recorder=PrometheusRecorder(registry=REGISTRY, app_name="my-outbox-service"), -) - - -@broker.subscriber("orders", max_workers=4) -async def handle_order(body: dict) -> None: ... - - -app = AsgiFastStream(broker, asgi_routes=[("/metrics", make_asgi_app(registry=REGISTRY))]) +# DLQ misconfiguration: terminal-failure rate diverges from DLQ-write rate +rate(faststream_outbox_nacked_terminal_total[5m]) + - +rate(faststream_outbox_dlq_written_total[5m]) + > 0 ``` -Traces flow to OTLP (Jaeger / Tempo / Honeycomb / collector); meters and -the recorder's outbox-internal counters land on `/metrics` for Prometheus -to scrape. One process, one ASGI app, one scrape endpoint. - -The providers set `messaging.system = "outbox"`, matching the recorder-seam -adapters. The OTel provider maps `row.id → messaging.message.id`, -`row.queue → messaging.destination_publish.name`, `correlation_id → -messaging.message.conversation_id`, `len(payload) → -messaging.message.payload_size_bytes`, and `len(cmd.batch_bodies) → -messaging.batch.message_count` when >1. - -## Layering: middleware seam vs. recorder seam - -Both can be registered together — each fires for events the other -physically cannot observe. 
- -| Concern | Middleware seam | Recorder seam | -|---|---|---| -| Handler duration / status / size | ✅ via `consume_scope` | ✅ via `acked` / `nacked_*` events | -| Publish duration / status / exception | ✅ via `publish_scope` | ✅ via `published` event | -| Span tracing (consume + publish) | ✅ | ❌ (callable can't bracket spans) | -| `fetched` ticks (including empty) | ❌ (no `StreamMessage` at fetch time) | ✅ | -| `lease_lost` after `consume_scope` exits | ❌ | ✅ | -| `nacked_terminal(reason="max_deliveries")` before consume opens | ❌ | ✅ | -| Empty-fetch idle counter | ❌ | ✅ | - -The recommended setup for full observability is **both seams together**: -middleware for bus-scope metrics + tracing, recorder for outbox-internal -events. +The first eight are direct ports of the recorder-side metrics into +operator-actionable PromQL. The last one is the +DLQ-misconfiguration-detection alert covered in [DLQ § Metric: +dlq_written](./dlq.md#metric-dlq_written). ## Test broker note From 02c83edf96c47c1c966dced0fe96768510b2798b Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Thu, 11 Jun 2026 17:11:13 +0300 Subject: [PATCH 6/7] =?UTF-8?q?docs:=20nav=20reshape=20=E2=80=94=20surface?= =?UTF-8?q?=20tutorials,=20setup=20how-to,=20instrumentation=20explanation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two tutorial stubs land in docs/tutorials/ so the nav resolves; content is filled by the following per-tutorial commits. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/tutorials/add-kafka-relay.md | 1 + docs/tutorials/first-outbox-app.md | 1 + mkdocs.yml | 4 ++++ 3 files changed, 6 insertions(+) create mode 100644 docs/tutorials/add-kafka-relay.md create mode 100644 docs/tutorials/first-outbox-app.md diff --git a/docs/tutorials/add-kafka-relay.md b/docs/tutorials/add-kafka-relay.md new file mode 100644 index 0000000..fc6d686 --- /dev/null +++ b/docs/tutorials/add-kafka-relay.md @@ -0,0 +1 @@ +# Tutorial: Add a Kafka relay diff --git a/docs/tutorials/first-outbox-app.md b/docs/tutorials/first-outbox-app.md new file mode 100644 index 0000000..db914d2 --- /dev/null +++ b/docs/tutorials/first-outbox-app.md @@ -0,0 +1 @@ +# Tutorial: Your first outbox app diff --git a/mkdocs.yml b/mkdocs.yml index 5244266..c677854 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -8,15 +8,19 @@ nav: - Getting started: - Installation: introduction/installation.md - Basic usage: usage/basic.md + - 'Tutorial: Your first outbox app': tutorials/first-outbox-app.md + - 'Tutorial: Add a Kafka relay': tutorials/add-kafka-relay.md - Concepts: - How it works: introduction/how-it-works.md - Comparison: concepts/comparison.md + - Instrumentation seams: concepts/instrumentation-seams.md - Guides: - FastAPI integration: usage/fastapi.md - Relay to Kafka / RabbitMQ / NATS: usage/relay.md - Timers: usage/timers.md - Testing: usage/testing.md - Schema validation: usage/schema-validation.md + - Setup Prometheus and OpenTelemetry: usage/setup-prometheus-opentelemetry.md - Reference: - Subscriber: usage/subscriber.md - Publisher: usage/publisher.md From 265dc6abe847bc423f75ca91dff137bd5c74d6c7 Mon Sep 17 00:00:00 2001 From: Artur Shiriev Date: Fri, 12 Jun 2026 07:19:10 +0300 Subject: [PATCH 7/7] docs: defer tutorials to follow-on PR Scope reduction for this PR: the observability split (Tasks 4-7) ships now; the two tutorials (Tasks 2-3) defer to a follow-on PR. 
Reason: the spec's "tutorial code executed end-to-end with literal
captured output" discipline warrants a dedicated session, not a rushed
attempt at the tail of this one. Removed from working tree:
docs/tutorials/{first-outbox-app, add-kafka-relay}.md (which were stubs
added in the prior nav-reshape commit) and the two Tutorial nav entries
under Getting started. Spec gets a "Scope reduction" callout at the
top; plan T2 + T3 are marked DEFERRED with a back-reference to the spec
note.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/tutorials/add-kafka-relay.md                     |  1 -
 docs/tutorials/first-outbox-app.md                    |  1 -
 mkdocs.yml                                            |  2 --
 ...1-docs-tutorials-and-observability-split-design.md |  9 +++++++++
 ...-11-docs-tutorials-and-observability-split-plan.md | 11 +++++++++--
 5 files changed, 18 insertions(+), 6 deletions(-)
 delete mode 100644 docs/tutorials/add-kafka-relay.md
 delete mode 100644 docs/tutorials/first-outbox-app.md

diff --git a/docs/tutorials/add-kafka-relay.md b/docs/tutorials/add-kafka-relay.md
deleted file mode 100644
index fc6d686..0000000
--- a/docs/tutorials/add-kafka-relay.md
+++ /dev/null
@@ -1 +0,0 @@
-# Tutorial: Add a Kafka relay
diff --git a/docs/tutorials/first-outbox-app.md b/docs/tutorials/first-outbox-app.md
deleted file mode 100644
index db914d2..0000000
--- a/docs/tutorials/first-outbox-app.md
+++ /dev/null
@@ -1 +0,0 @@
-# Tutorial: Your first outbox app
diff --git a/mkdocs.yml b/mkdocs.yml
index c677854..52db0ca 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -8,8 +8,6 @@ nav:
   - Getting started:
     - Installation: introduction/installation.md
     - Basic usage: usage/basic.md
-    - 'Tutorial: Your first outbox app': tutorials/first-outbox-app.md
-    - 'Tutorial: Add a Kafka relay': tutorials/add-kafka-relay.md
   - Concepts:
     - How it works: introduction/how-it-works.md
     - Comparison: concepts/comparison.md
diff --git a/planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md b/planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md
index b449a38..e434d13 100644
--- a/planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md
+++ b/planning/active/2026-06-11-docs-tutorials-and-observability-split-design.md
@@ -10,6 +10,15 @@ outcome: null
 
 # Design: Add two tutorials and split observability.md
 
+> **Scope reduction (2026-06-12).** The implementing PR shipped only
+> the observability split (§3 plus the supporting nav reshape and
+> cross-links). The two tutorials (§1, §2) are deferred to a
+> follow-on spec — the spec's discipline that tutorial code must be
+> executed end-to-end against a clean environment with literal output
+> captured warrants a dedicated session. The structural pieces stand
+> on their own and were unblocked first. See the plan's Tasks 2 and 3
+> deferral notes.
+
 ## Summary
 
 Pull-the-piece-that-matters F-min from the
diff --git a/planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md b/planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md
index e63cac4..4c4da66 100644
--- a/planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md
+++ b/planning/active/2026-06-11-docs-tutorials-and-observability-split-plan.md
@@ -70,7 +70,12 @@ environment before committing.
 
 ---
 
-### Task 2: Tutorial 1 — Your first outbox app
+### Task 2: Tutorial 1 — Your first outbox app — DEFERRED
+
+> **Deferred to follow-on PR (2026-06-12).** Spec §"Scope reduction"
+> note explains: tutorial execution discipline (clean env, literal
+> captured output) warrants a dedicated session. Structural work
+> (T4–T7) shipped without this task.
 
 **Files:**
 - Create: `docs/tutorials/first-outbox-app.md`
@@ -135,7 +140,9 @@ environment.** Capture the literal terminal output at each step.
 
 ---
 
-### Task 3: Tutorial 2 — Add a Kafka relay
+### Task 3: Tutorial 2 — Add a Kafka relay — DEFERRED
+
+> **Deferred to follow-on PR (2026-06-12).** Same rationale as Task 2.
 
 **Files:**
 - Create: `docs/tutorials/add-kafka-relay.md`