infrastructure-patterns

Sanitized, generalized write-ups of infrastructure patterns I've designed and operated in production - distilled into Architecture Decision Records (ADRs) so the reasoning is visible even where the code can't be.

Most of this comes out of running a multi-portal PHP/MySQL reservations platform plus its surrounding tooling (a Python agent-orchestration host, per-tenant containerized AI sandboxes, a server-side deploy pipeline) as its lead engineer. Those repositories are private; these patterns are the parts that generalize, written at the architecture level - no hostnames, addresses, credentials, or vendor specifics.

Most of these patterns cluster on one seam: safely running autonomous AI agents against revenue-critical legacy systems. Per-tenant isolation (0001), snapshot-fed writable sandboxes (0003), a scoped agent identity (0005), and default-deny, host-pinned data access (0009) are four boundaries around the same problem: give an agent real production data and real reach without giving it the ability to damage the business. ADR 0010 covers the legacy-modernization case itself - proving an agent's edits to opaque, generated markup moved nothing visible before a human is asked to approve.

Why ADRs

Each decision below is one where a non-obvious trade-off was made and can be defended. That's the useful unit: not "what I used," but "what I chose, what I gave up, and when I'd choose differently."

Decision records

#	Decision	Trade-off in one line
0001	Docker over bare-metal for per-tenant isolation	Pay image/ops overhead to get strong filesystem + DB-user isolation cheaply
0002	External managed DB over a containerized one	Give up "one compose up" simplicity for durable, backup-friendly state
0003	Periodic snapshot over live replication for sandboxes	Accept some staleness to gain isolation, reset-ability, and no prod write-path risk
0004	A guarded shell deploy over a hosted CI runner	Forgo ecosystem features for a dependency-free, auditable single-server deploy
0005	A scoped system user over a shared service account for an autonomous agent	More host setup in exchange for clean per-action auditing and least privilege
0006	Binlog-tailing daemons over database triggers for denormalization	Accept eventual consistency to keep derive-logic in versioned code, off the hot write path
0007	Pull-based health probing over push agents for a small fleet	Forgo deep metrics/history to keep the monitored fleet agent-free and the failure domain legible
0008	Embedded SQLite over a networked DB for single-node tooling	Give up cross-host sharing for zero operational surface on state one process owns
0009	Default-deny, host-pinned DB access over a trusted network	Take on provisioning friction so a leaked credential isn't portable off its host
0010	A pixel-equality gate over diff review for changes to generated markup	Pay a render/diff harness to safely change markup you can't audit by eye - it proves visual, not semantic, equality
0011	A pull-based metrics stack over a bespoke prober, data plane kept private	Take on a TSDB to run for real metrics/history/alerting - and bind the unauthenticated parts to loopback, exposing only one read-only pane
0012	Per-app copy-deployed sync services over a shared multi-tenant backend	Run N near-identical small services to gain physical blast-radius isolation and per-app sync-model freedom, at the cost of hand-applying shared-auth fixes across copies
0013	Staged declarative provisioning (Terraform + cloud-init + separate deploy) over one imperative bootstrap script	Take on Terraform state and a deliberately create-only provisioning token for a single box to get a reproducible, reviewable host and a clean provision/configure/deploy seam
0014	Import ~220 live DNS records into Terraform over recreating them from a desired-state list	Accept verbose generated config and provider quirks to adopt traffic-serving records with zero downtime and a no-op baseline plan, instead of risking duplicate-creates and silent deletion of forgotten records
0015	Keep Terraform state on a different provider than the compute it provisions	Take on a second vendor + scoped IAM key so a provider-level outage can't destroy both the infrastructure and the state needed to rebuild it
0016	Enforce cluster posture at admission (OPA/Gatekeeper + a VAP) over trusting reviewed manifests	Run a policy controller so the hardened posture is rejected-if-violated at the API server instead of relying on review - the control that matters once a second actor or an agent can apply to the cluster
0017	Self-host the signing instrument and its audit trail over a SaaS signature service	Take on one container plus its own cert and upkeep so the legally-binding document and its audit trail stay on infra you govern, with a named-human gate on every consequential action - the highest-stakes case of keeping the authoritative copy where you control it
0018	Runtime-injected secrets over plaintext config, with the store matched to the team	Hold one invariant (encrypted at rest, injected into the environment at runtime) and pay for two backends - a broker where a team needs sharing/revocation/audit, SOPS + age where a solo fleet needs offline zero-vendor recovery - rather than force one tool to fit both
0019	Reuse an existing passkey session to gate a second app (Caddy forward_auth) over standing up a second auth stack	Gate a second internal app by delegating to one hardened passkey session via Caddy forward_auth - ~40 lines of Caddyfile, zero new auth code, every tool inheriting the gate's future hardening - at the cost of concentrating trust in a single session
0020	A private mesh for operator shells, public MFA-gated endpoints for browser consoles, over one VPN for everything	Put SSH behind a WireGuard mesh and drop public :22, but keep browser admin consoles public behind per-audience MFA - size the boundary to who uses it, rather than hide the consoles non-technical staff reach by URL behind a VPN client they can't maintain

A concrete, sanitized companion to ADR 0011 lives in observability/: the Prometheus/Alertmanager config and Grafana dashboards that make the pattern reproducible. A companion to ADRs 0013-0015 - the full Terraform/Cloudflare/Ansible DNS-as-code repo, sanitized - lives at terraform-cloudflare-dns. A companion to ADR 0016 - the live k3s deployment whose posture the policy set enforces - lives at k3s-demo (k3s-demo.stephens.page), with the manifests under policy/.

Format

Each ADR uses a short, consistent shape: Context → Decision → Consequences → When I'd revisit. They're deliberately terse.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
adr		adr
observability		observability
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

infrastructure-patterns

Why ADRs

Decision records

Format

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

infrastructure-patterns

Why ADRs

Decision records

Format

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages