From 5fbdff44adf575ac2d21be2ab309e2cfb54f381f Mon Sep 17 00:00:00 2001
From: aa638
Date: Wed, 10 Jun 2026 10:17:04 +0200
Subject: [PATCH] feat: detect manifest drift in the status skill

status now compares k8s// files against the cluster with oc diff
(server-side dry-run, still read-only), filters diff noise, and reports
one plain-language line per app. Reconciliation is offered, never
automatic, and asks which side is right before acting. Skipped silently
when no local manifests exist.

Co-Authored-By: Claude Fable 5
---
 README.md                                  |  4 +-
 skills/status/SKILL.md                     | 51 ++++++++++++++++++++--
 skills/status/references/usage-examples.md | 43 +++++++++++++++++-
 3 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 8eb611c..d3e3fe0 100644
--- a/README.md
+++ b/README.md
@@ -79,7 +79,7 @@ Eight composable skills. All are user-invocable, so you can ask for any of them
 A few intentional choices that shape how the plugin behaves:
 
 - **One cluster, many namespaces.** Each app gets its own namespace on a shared cluster — no per-app cluster sprawl.
-- **Manifests in your repo, applied directly.** No GitOps, no ArgoCD. `k8s//*.yaml` is the source of truth.
+- **Manifests in your repo, applied directly.** No GitOps, no ArgoCD. `k8s//*.yaml` is the source of truth — and `status` checks the cluster against those files and flags drift. When keeping that in sync by hand starts hurting, that's the signal to graduate to GitOps.
 - **Safe by default.** `internal` gateway is the default; `public` requires an explicit two-step security confirmation.
 - **Detect, don't assume.** Every skill queries the cluster (`oc whoami`, `indev cluster list`, …) at the start instead of carrying hidden state.
 - **Friendly, not lecturing.** Show one command at a time. Introduce a term *after* you've shown what it does.
@@ -91,7 +91,7 @@ A few things that aren't obvious up front:
 - **First run = clicking "Allow" a lot.** Claude Code asks permission the first time it runs each command. Allow them once and future sessions are quiet.
 - **Tokens expire.** Both `indev` and `oc` log out after a few hours. If something fails with "Unauthorized", just say *"log me back in"*.
 - **Different repo, different view.** Claude only sees the manifests in the directory you're working from. Switching repos means switching scope.
-- **`update-image` keeps the local file in sync** — but only if you run it from the repo that contains the manifests. Otherwise, the YAML on disk will drift from the cluster.
+- **`update-image` keeps the local file in sync** — but only if you run it from the repo that contains the manifests. Otherwise, the YAML on disk will drift from the cluster. Run `status` from that repo and it will spot the drift and offer to fix it.
 - **macOS / Linux only for now.** The skills assume a POSIX shell. On Windows, use WSL.
 
 ## Found a bug? Have a wish?
diff --git a/skills/status/SKILL.md b/skills/status/SKILL.md
index e8130f0..1808050 100644
--- a/skills/status/SKILL.md
+++ b/skills/status/SKILL.md
@@ -5,7 +5,9 @@ allowed-tools:
   - Bash(indev cluster list*)
   - Bash(oc whoami*)
   - Bash(oc get*)
+  - Bash(oc diff*)
   - Bash(grep *)
+  - Glob
 ---
 
 # Status
@@ -14,7 +16,7 @@ allowed-tools:
 
 Give the user a clear one-screen summary of: what cluster they have, what apps are deployed, and what URLs those apps respond on. Useful when coming back to the platform after a break, or just to remember what's where.
 
-This skill is **read-only**. It changes nothing. It also persists nothing — it just queries.
+This skill is **read-only**. It changes nothing. It also persists nothing — it just queries. (`oc diff` in Step 5 is a server-side dry-run — it compares, it doesn't apply.)
 
 ## If `oc` returns "Unauthorized" mid-flow
 
@@ -72,7 +74,30 @@ oc get httproute --all-namespaces \
 
 Match each route to its deployment by namespace.
 
-## Step 5 — Present
+## Step 5 — Drift check (only when local manifests exist)
+
+The manifests in `k8s//` are supposed to be the source of truth — but the cluster can wander (someone resized in the portal, ran `oc set image` from another machine, edited live). Catch that here, quietly.
+
+First, look for managed manifests with the Glob tool (`k8s/*/*.yaml`). **If there's no `k8s/` directory here, skip this entire step silently** — no "drift not checked" disclaimers, just move on. The user may simply be in a different repo.
+
+For each `k8s//` directory whose name matches a deployed namespace:
+
+```bash
+oc diff -f k8s//
+```
+
+Exit code 0 = in sync. Exit code 1 = something differs. Anything else = couldn't check that app — skip it quietly.
+
+Reading the diff — **filter the noise first**. These differences alone do NOT count as drift:
+
+- `metadata.generation`
+- `creationTimestamp`
+- `kubectl.kubernetes.io/last-applied-configuration` annotations
+- anything under `status:`
+
+If only noise remains, the app is in sync. For real differences, translate each into one plain phrase a human cares about: "cluster runs :v3, file says :v2", "replicas 3 on cluster, 1 in file", "cluster has an env var the file doesn't". Never paste the raw diff into the summary — the user can ask for it.
+
+## Step 6 — Present
 
 Show one block per app. Keep it tight — this is a glance, not a report.
 
@@ -84,13 +109,17 @@ Apps:
   │ image: ghcr.io/me/myapp:v2
   │ status: 1/1 running
   │ URL: http://myapp.apps.example.com (internal — only reachable on your organization's network)
+  │ files: ✓ in sync with k8s/myapp/
   │
   ┌─ api (namespace: api)
   │ image: ghcr.io/me/api:latest
   │ status: 2/2 running
   │ URL: (not exposed — run expose-app to give it a URL)
+  │ files: ⚠ drift — cluster runs 2 replicas, k8s/api/deployment.yaml says 1
 ```
+
+Only include the `files:` line when Step 5 actually ran for that app. No local manifests → no line, no apology.
+
 Status notation:
 - `1/1 running` → ready
 - `0/1 running` → broken — also surface a one-liner: "may be crashing — try `oc logs deployment/ -n `"
@@ -102,14 +131,24 @@ For the gateway label, derive it from the parentRef name:
 
 If you see a `public` route the user may have set up earlier and forgotten about, mention it explicitly at the end of the report: "FYI, `` is on the public gateway — make sure that's still intended."
 
-## Step 6 — Tell them what's next
+## Step 7 — Tell them what's next
 
 End with two-to-three relevant follow-ups based on what you saw. Examples:
 
 - If an app shows `0/1 running`: "Want me to look at the logs for ``?"
 - If an app has no URL: "Want me to expose `` on a URL?"
+- If an app drifted: "The cluster and your files disagree for `` — want me to bring them back in sync?"
 - If everything looks healthy: "Anything you'd like to update or add?"
 
+### Reconciling drift (when they say yes)
+
+This happens *after* the report, with the user's go-ahead — never automatically. Ask one question: which side is right?
+
+- **The file is right** (someone fiddled with the cluster): `oc apply -f k8s//` puts it back. This is the normal answer — the repo is the source of truth.
+- **The cluster is right** (the change was intentional, e.g. someone resized on purpose): edit the local YAML to match instead, like `update-image` does. Nothing is applied; the file just catches up.
+
+If they're unsure which, show them the relevant diff lines and let them decide. Don't pick for them — reverting someone's intentional change is worse than living with drift for another day.
+
 Don't bullet a long menu — pick the most obvious one or two.
 
 ## What not to do
@@ -118,6 +157,9 @@ Don't bullet a long menu — pick the most obvious one or two.
 - Don't list pods, configmaps, secrets, services, networkpolicies. The user wants apps + URLs, not raw Kubernetes object soup.
 - Don't include platform-namespace stuff (`openshift-*`, `kube-*`, `envoy-gateway-system`, `argocd`, `olm`, `hypershift`). It's noise to them.
 - Don't try to derive resource usage / CPU / memory. Out of scope.
+- Don't paste raw `oc diff` output into the summary — translate it to one plain phrase per difference.
+- Don't mention drift at all when there are no local manifests to compare against.
+- Don't reconcile drift inside this skill — report it, offer, and act only on the user's answer.
 
 ## Quick reference
 
@@ -126,8 +168,9 @@ indev cluster list
 oc whoami
 oc get deployments -A
 oc get httproute -A
+oc diff -f k8s// # exit 0 = in sync, 1 = drift
 ```
 
 ## Examples
 
-See [references/usage-examples.md](references/usage-examples.md) for typical conversations (healthy cluster, broken app, public-route warning, logged out, nothing deployed).
+See [references/usage-examples.md](references/usage-examples.md) for typical conversations (healthy cluster, broken app, public-route warning, drifted manifests, logged out, nothing deployed).
diff --git a/skills/status/references/usage-examples.md b/skills/status/references/usage-examples.md
index 608f029..3c33ade 100644
--- a/skills/status/references/usage-examples.md
+++ b/skills/status/references/usage-examples.md
@@ -68,7 +68,46 @@ Claude: Cluster: shop-a3k9x2
 
 ---
 
-## Example 4 — Logged out
+## Example 4 — Manifests drifted from the cluster
+
+User is in the repo containing `k8s/`. Someone scaled the app in the portal last week.
+
+```
+You: What do I have running?
+
+Claude: [indev cluster list → shop-a3k9x2]
+        [oc whoami → ok]
+        [oc get deployments -A, oc get httproute -A]
+        [Glob k8s/*/*.yaml → k8s/shop/]
+        [oc diff -f k8s/shop/ → exit 1: spec.replicas 1 → 3]
+
+        Cluster: shop-a3k9x2 (logged in as you@intility.no)
+
+        Apps:
+          ┌─ shop (namespace: shop)
+          │ image: ghcr.io/me/shop:v2
+          │ status: 3/3 running
+          │ URL: http://shop.apps.example.com (internal)
+          │ files: ⚠ drift — cluster runs 3 replicas, k8s/shop/deployment.yaml says 1
+
+        The cluster and your files disagree for shop — want me to bring
+        them back in sync?
+
+You: huh, yes
+
+Claude: Which side is right?
+          - The file (1 replica) — someone changed the cluster and it should go back
+          - The cluster (3 replicas) — the scale-up was intentional, update the file
+
+You: we scaled it up on purpose
+
+Claude: [edits k8s/shop/deployment.yaml: replicas: 1 → 3]
+        Done — the file now matches the cluster. Nothing was applied.
+```
+
+---
+
+## Example 5 — Logged out
 
 ```
 You: What's running?
@@ -82,7 +121,7 @@ Claude: [indev cluster list → shop-a3k9x2]
 
 ---
 
-## Example 5 — Nothing deployed yet
+## Example 6 — Nothing deployed yet
 
 ```
 You: Show me what I've got.