docs: operator pages (checklist, troubleshooting, alembic migrations) #53
Merged
Three new pages under a new docs/operations/ section: Production checklist (sizing/subscribers/DLQ/drain/schema/observability scaffold linking into existing references), Troubleshooting playbook (symptom → likely cause → diagnose → fix → reference, eleven symptoms), Alembic migrations (literal autogenerate output + DLQ-addition-as-second-migration + drift detection + partition-retention recipe). Spec captures the design and structural calls; plan walks an executor through eight tasks starting with an Alembic autogenerate spike to ground the literal-sample claims. The B follow-on from #50 (docs-landing-and-comparison). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ran autogenerate against an empty Postgres 17 schema with make_outbox_table() (and then with make_dlq_table() added) using SQLAlchemy 2.0.50 + Alembic 1.18.4. The captured ops are pasted verbatim into spec §4a and §4b under "Captured autogenerate output". Verified against §4a's prediction: three partial indexes with predicates acquired_token IS NULL / acquired_token IS NOT NULL / timer_id IS NOT NULL, all match. Three book-keeping columns the user-facing reference pages don't name explicitly (attempts_count, first_attempt_at, last_attempt_at) are flagged in a note so the page author transcribes them faithfully without editorializing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
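The predicate check in this commit could be mechanized roughly as follows. This is a minimal sketch, assuming the rendered migration is available as text. The `SAMPLE` string is a hypothetical stand-in (its table, index, and column names are invented for illustration) and is not the captured output from the spec; only the three predicates come from the commit message.

```python
import re

# Predicates predicted in spec §4a, as listed in the commit message.
PREDICTED = {
    "acquired_token IS NULL",
    "acquired_token IS NOT NULL",
    "timer_id IS NOT NULL",
}

# Hypothetical stand-in for rendered autogenerate output. Table, index,
# and column names here are assumptions, not the real captured ops.
SAMPLE = """
op.create_index('ix_demo_unacquired', 'outbox_demo', ['id'], unique=False, postgresql_where=sa.text('acquired_token IS NULL'))
op.create_index('ix_demo_acquired', 'outbox_demo', ['id'], unique=False, postgresql_where=sa.text('acquired_token IS NOT NULL'))
op.create_index('ix_demo_timer', 'outbox_demo', ['id'], unique=False, postgresql_where=sa.text('timer_id IS NOT NULL'))
"""

# Matches postgresql_where=sa.text('...') or sa.text("...") and captures the predicate.
WHERE_RE = re.compile(r"""postgresql_where=sa\.text\((['"])(.+?)\1\)""")


def partial_index_predicates(rendered: str) -> set[str]:
    """Collect every partial-index predicate found in rendered migration text."""
    return {match.group(2) for match in WHERE_RE.finditer(rendered)}


def diff_predicates(rendered: str, predicted: set[str] = PREDICTED):
    """Return (missing, unexpected) predicates relative to the prediction."""
    found = partial_index_predicates(rendered)
    return predicted - found, found - predicted


if __name__ == "__main__":
    missing, unexpected = diff_predicates(SAMPLE)
    print("missing:", sorted(missing))
    print("unexpected:", sorted(unexpected))
```

Two empty sets mean the rendered text carries exactly the predicted predicates. The regex only recognizes the `postgresql_where=sa.text(...)` form, so other renderings would need their own pattern.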
Three stub pages under docs/operations/ (checklist.md, troubleshooting.md, alembic.md) plus a fifth Operations section in mkdocs nav. Content lands in follow-up commits per the plan. Each stub is just the H1 — mkdocs build --strict is clean because the nav refs resolve, and the rendered sidebar shows the new section in place even before content arrives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixteen items across six sections (Sizing, Subscribers, DLQ, Drain & lifecycle, Schema, Observability). Each item is one to two lines plus a relative link into the existing reference page that owns the underlying detail — checklist is link-scaffold, not new prose. Two link anchors into operations/troubleshooting.md and operations/alembic.md don't resolve yet; T5 and T6 land those pages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eleven symptom → cause → diagnose → fix → reference subsections, plus a TOC table at the top with anchor jumps. Symptoms range from operator-actionable (event=lease_lost, outbox growth patterns, duplicate invocations) through "by design" surprises that need a "don't fight this; here's what to do instead" explanation (ACK_FIRST raises, OutboxResponse + foreign publisher nacked). Each section uses explicit attr_list anchors so the TOC jumps are stable across mkdocs slugifier behavior — backtick + equals + slash in heading content otherwise collapse unpredictably. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
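To see why explicit anchors help, here is a rough approximation of how a default Markdown slugifier treats such headings. It mirrors the general shape of Python-Markdown's ASCII slugify (drop anything that is not a word character, whitespace, or hyphen, then join with hyphens), but it is a sketch rather than the exact function mkdocs runs. The heading texts and the `lease-lost` anchor id are hypothetical.

```python
import re
import unicodedata


def approx_slug(heading: str, separator: str = "-") -> str:
    """Approximate a default ASCII slug for a heading's text."""
    # Fold to ASCII, dropping characters that have no ASCII equivalent.
    text = unicodedata.normalize("NFKD", heading)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove everything except word characters, whitespace, and hyphens.
    # Backticks, "=", "+", and "/" all vanish at this step.
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    # Collapse runs of whitespace/hyphens into a single separator.
    return re.sub(r"[-\s]+", separator, text)


def heading_with_anchor(title: str, anchor: str, level: int = 2) -> str:
    """Render a heading line carrying an explicit attr_list id."""
    return f"{'#' * level} {title} {{ #{anchor} }}"


if __name__ == "__main__":
    # Hypothetical heading texts, chosen to contain the problem characters.
    for title in ("`event=lease_lost` in logs", "`OutboxResponse` + foreign publisher nacked"):
        print(f"{title!r} -> {approx_slug(title)!r}")
    print(heading_with_anchor("`event=lease_lost` in logs", "lease-lost"))
```

The derived slug silently merges `event` and `lease_lost`, so any TOC link written by hand against the heading text is easy to get wrong. An explicit id keeps the link target independent of the heading's punctuation.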
Four sections (initial migration with the captured autogenerate output annotated for load-bearing partial indexes, adding the DLQ as an additive second migration, drift detection in CI via validate_schema, and DLQ retention via range-partition rotation). The DLQ retention walkthrough includes the one-time conversion to PARTITION BY RANGE (failed_at) and a monthly cron snippet that runs create-next + drop-oldest. Both operations are O(1) regardless of DLQ row count and replace the row-by-row DELETE pattern at higher volume. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
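The monthly create-next + drop-oldest rotation can be sketched as a small statement generator. This is an illustration under stated assumptions, not the recipe from the page: the table name `outbox_dlq`, the `<table>_YYYY_MM` partition naming scheme, and the `keep_months` parameter are all hypothetical. The generated DDL is standard PostgreSQL range-partition syntax.

```python
from datetime import date


def month_start(day: date, offset: int = 0) -> date:
    """First day of the month `offset` months away from `day`'s month."""
    months = day.year * 12 + (day.month - 1) + offset
    return date(months // 12, months % 12 + 1, 1)


def rotation_sql(today: date, keep_months: int, table: str = "outbox_dlq") -> list[str]:
    """Build the two statements a monthly cron job would run.

    Assumes `table` is already PARTITION BY RANGE (failed_at) and that
    partitions are named <table>_YYYY_MM (a hypothetical convention).
    """
    next_start = month_start(today, 1)
    next_end = month_start(today, 2)
    # The partition starting keep_months before the current month is dropped.
    oldest = month_start(today, -keep_months)
    return [
        f"CREATE TABLE IF NOT EXISTS {table}_{next_start:%Y_%m} "
        f"PARTITION OF {table} "
        f"FOR VALUES FROM ('{next_start.isoformat()}') TO ('{next_end.isoformat()}');",
        f"DROP TABLE IF EXISTS {table}_{oldest:%Y_%m};",
    ]


if __name__ == "__main__":
    for statement in rotation_sql(date(2026, 6, 11), keep_months=3):
        print(statement)
```

Both statements are idempotent thanks to `IF NOT EXISTS` / `IF EXISTS`, so re-running the job within the same month is harmless. Neither touches individual rows, which is what keeps the cost independent of DLQ size.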
One new decision-tree row in docs/index.md (Deploy to production safely → Production checklist) and five one-line italic callouts into the operator pages from existing reference pages:

- subscriber.md § Connection budget → Production checklist § Sizing
- subscriber.md § Slow handlers — dedicated queue → Troubleshooting § event=lease_lost
- dlq.md § Metric: dlq_written → Production checklist § DLQ
- dlq.md § Retention → Alembic migrations § DLQ retention via partition drop
- schema-validation.md § Where to call it → Alembic migrations § Drift detection in CI

Same italic + en-dash style the docs-landing-and-comparison PR (#50) established for the Comparison callouts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lesnik512 added a commit that referenced this pull request on Jun 11, 2026: chore: archive shipped planning pair from #53
The B follow-on from #50 (docs-landing-and-comparison). Adds three new pages under a new `docs/operations/` directory + a fifth top-level Operations nav section, plus five one-line cross-link callouts from existing reference pages.

## Summary

- `docs/operations/checklist.md`: sixteen items across six sections (Sizing / Subscribers / DLQ / Drain & lifecycle / Schema / Observability). Each item is one to two lines plus a link into the existing reference page that owns the underlying detail. Checklist is link-scaffold, not new prose: operators scan, then drill in.
- `docs/operations/troubleshooting.md`: eleven symptoms in TOC-table form, each with Symptom / Likely cause / Diagnose / Fix / Reference. Covers operator-actionable signals (`event=lease_lost`, outbox growth patterns, idle latency, duplicate invocations, rolling-deploy leakage) plus by-design surprises (`ACK_FIRST` raises, `OutboxResponse + foreign publisher` nacked, `validate_schema` ImportError). Explicit `attr_list` anchors so the TOC jumps stay stable across mkdocs slugifier behavior.
- `docs/operations/alembic.md`: what `alembic revision --autogenerate` actually produces against `make_outbox_table()` (verbatim, captured during the spec phase via a `produce_migrations` + `render_python_code` spike against an empty Postgres 17 / SQLAlchemy 2.0.50 / Alembic 1.18.4), plus the additive DLQ second migration, a CI drift-detection recipe using `validate_schema()`, and a step-by-step DLQ retention recipe converting to range-partitioned-by-`failed_at` with a monthly cron for create-next / drop-oldest.
- `mkdocs.yml`: new fifth Operations section in nav.
- `docs/index.md`: new decision-tree row: "Deploy to production safely → Production checklist".
- Five one-line italic callouts in `docs/usage/subscriber.md`, `docs/usage/dlq.md`, and `docs/usage/schema-validation.md` pointing into the relevant operator-page sections. Same italic style the Comparison callouts (docs: rewrite landing page, reshape nav, add Comparison page #50) use.

Spec + plan: `planning/active/2026-06-11-operator-pages-{design,plan}.md`.

## Test plan

- `just docs-build` (`mkdocs build --strict`) passes clean: no anchor warnings, no orphaned pages
- `just lint` passes (eof-fixer, ruff format, ruff check, ty check)
- Captured autogenerate output verified against the spec §4a prediction: three partial indexes with predicates `acquired_token IS NULL` / `acquired_token IS NOT NULL` / `timer_id IS NOT NULL`, all match
- `just docs-serve` and confirm the sidebar shows five sections in order: Overview → Getting started → Concepts → Guides → Reference → Operations
- Troubleshooting TOC table jumps to the matching `##` heading on click for every row (eleven symptoms)

## Out of scope (per spec § Non-goals)

- `tests/integration.py`: would pin the autogenerate output against drift; strong follow-up but adds test-suite scope.
- […] `planning/architecture/` into user-facing docs (separate spec)

🤖 Generated with Claude Code