Skip to content

JacobStephens2/infrastructure-patterns

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 

Repository files navigation

infrastructure-patterns

Sanitized, generalized write-ups of infrastructure patterns I've designed and operated in production - distilled into Architecture Decision Records (ADRs) so the reasoning is visible even where the code can't be.

Most of this comes out of running a multi-portal PHP/MySQL reservations platform plus its surrounding tooling (a Python agent-orchestration host, per-tenant containerized AI sandboxes, a server-side deploy pipeline) as its lead engineer. Those repositories are private; these patterns are the parts that generalize, written at the architecture level - no hostnames, addresses, credentials, or vendor specifics.

Most of these patterns cluster on one seam: safely running autonomous AI agents against revenue-critical legacy systems. Per-tenant isolation (0001), snapshot-fed writable sandboxes (0003), a scoped agent identity (0005), and default-deny, host-pinned data access (0009) are four boundaries around the same problem: give an agent real production data and real reach without giving it the ability to damage the business. ADR 0010 covers the legacy-modernization case itself - proving an agent's edits to opaque, generated markup moved nothing visible before a human is asked to approve.

Why ADRs

Each decision below is one where a non-obvious trade-off was made and can be defended. That's the useful unit: not "what I used," but "what I chose, what I gave up, and when I'd choose differently."

Decision records

# Decision Trade-off in one line
0001 Docker over bare-metal for per-tenant isolation Pay image/ops overhead to get strong filesystem + DB-user isolation cheaply
0002 External managed DB over a containerized one Give up "one compose up" simplicity for durable, backup-friendly state
0003 Periodic snapshot over live replication for sandboxes Accept some staleness to gain isolation, reset-ability, and no prod write-path risk
0004 A guarded shell deploy over a hosted CI runner Forgo ecosystem features for a dependency-free, auditable single-server deploy
0005 A scoped system user over a shared service account for an autonomous agent More host setup in exchange for clean per-action auditing and least privilege
0006 Binlog-tailing daemons over database triggers for denormalization Accept eventual consistency to keep derive-logic in versioned code, off the hot write path
0007 Pull-based health probing over push agents for a small fleet Forgo deep metrics/history to keep the monitored fleet agent-free and the failure domain legible
0008 Embedded SQLite over a networked DB for single-node tooling Give up cross-host sharing for zero operational surface on state one process owns
0009 Default-deny, host-pinned DB access over a trusted network Take on provisioning friction so a leaked credential isn't portable off its host
0010 A pixel-equality gate over diff review for changes to generated markup Pay a render/diff harness to safely change markup you can't audit by eye - it proves visual, not semantic, equality
0011 A pull-based metrics stack over a bespoke prober, data plane kept private Take on a TSDB to run for real metrics/history/alerting - and bind the unauthenticated parts to loopback, exposing only one read-only pane
0012 Per-app copy-deployed sync services over a shared multi-tenant backend Run N near-identical small services to gain physical blast-radius isolation and per-app sync-model freedom, at the cost of hand-applying shared-auth fixes across copies
0013 Staged declarative provisioning (Terraform + cloud-init + separate deploy) over one imperative bootstrap script Take on Terraform state and a deliberately create-only provisioning token for a single box to get a reproducible, reviewable host and a clean provision/configure/deploy seam
0014 Import ~220 live DNS records into Terraform over recreating them from a desired-state list Accept verbose generated config and provider quirks to adopt traffic-serving records with zero downtime and a no-op baseline plan, instead of risking duplicate-creates and silent deletion of forgotten records
0015 Keep Terraform state on a different provider than the compute it provisions Take on a second vendor + scoped IAM key so a provider-level outage can't destroy both the infrastructure and the state needed to rebuild it
0016 Enforce cluster posture at admission (OPA/Gatekeeper + a VAP) over trusting reviewed manifests Run a policy controller so the hardened posture is rejected-if-violated at the API server instead of relying on review - the control that matters once a second actor or an agent can apply to the cluster
0017 Self-host the signing instrument and its audit trail over a SaaS signature service Take on one container plus its own cert and upkeep so the legally-binding document and its audit trail stay on infra you govern, with a named-human gate on every consequential action - the highest-stakes case of keeping the authoritative copy where you control it
0018 Runtime-injected secrets over plaintext config, with the store matched to the team Hold one invariant (encrypted at rest, injected into the environment at runtime) and pay for two backends - a broker where a team needs sharing/revocation/audit, SOPS + age where a solo fleet needs offline zero-vendor recovery - rather than force one tool to fit both
0019 Reuse an existing passkey session to gate a second app (Caddy forward_auth) over standing up a second auth stack Gate a second internal app by delegating to one hardened passkey session via Caddy forward_auth - ~40 lines of Caddyfile, zero new auth code, every tool inheriting the gate's future hardening - at the cost of concentrating trust in a single session
0020 A private mesh for operator shells, public MFA-gated endpoints for browser consoles, over one VPN for everything Put SSH behind a WireGuard mesh and drop public :22, but keep browser admin consoles public behind per-audience MFA - size the boundary to who uses it, rather than hide the consoles non-technical staff reach by URL behind a VPN client they can't maintain

A concrete, sanitized companion to ADR 0011 lives in observability/: the Prometheus/Alertmanager config and Grafana dashboards that make the pattern reproducible. A companion to ADRs 0013-0015 - the full Terraform/Cloudflare/Ansible DNS-as-code repo, sanitized - lives at terraform-cloudflare-dns. A companion to ADR 0016 - the live k3s deployment whose posture the policy set enforces - lives at k3s-demo (k3s-demo.stephens.page), with the manifests under policy/.

Format

Each ADR uses a short, consistent shape: Context → Decision → Consequences → When I'd revisit. They're deliberately terse.

Releases

No releases published

Packages

 
 
 

Contributors