Harden CLI: credential sources, retries, destructive-op gates, truthful help (DX-5647) #12

Merged: johnpmitsch merged 5 commits into main from dx-5647-harden-qn-cli-credential-sources-retries-destructive-op on Jun 10, 2026

@johnpmitsch johnpmitsch commented Jun 10, 2026

Closes DX-5647.

Foundational hardening in four commits, one per area:

1. Credential source simplification

QN_CLI__API_KEY is removed. A key left exported in a shell is invisible state that outlives its session — the easiest way to point a destructive command at the wrong account. Key sources are now, highest precedence first:

  1. --api-key <KEY>
  2. --config-file <PATH> (new global flag, points at an alternate config TOML)
  3. ~/.config/qn/config.toml via qn auth login

auth login/logout/status/whoami honor --config-file for reads and writes.
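
The precedence above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in this PR: `KeySource` and `resolve_key_source` are hypothetical names, and the sketch only decides where the key comes from without reading the TOML.

```rust
use std::path::{Path, PathBuf};

/// Where the API key will be read from (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum KeySource {
    /// The key was passed directly with --api-key.
    Flag(String),
    /// The key must be read from this config TOML.
    ConfigFile(PathBuf),
}

/// Pick the key source, highest precedence first:
/// 1. --api-key, 2. --config-file, 3. the default config path.
/// No environment variable is consulted at any point.
fn resolve_key_source(
    api_key_flag: Option<&str>,
    config_file_flag: Option<&Path>,
    default_config: &Path,
) -> KeySource {
    if let Some(key) = api_key_flag {
        return KeySource::Flag(key.to_string());
    }
    // Fall back to the explicit --config-file path, else the default location.
    let path = config_file_flag.unwrap_or(default_config);
    KeySource::ConfigFile(path.to_path_buf())
}
```

Because the function takes only explicit arguments, an exported variable in the calling shell cannot change which account a command targets.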

2. Retries for read-only commands

Read-only commands (list, show, logs, metrics, usage, …) now retry transient failures (HTTP 429/500/502/503/504, timeouts, connect errors) with full-jitter exponential backoff: base 500ms, 8s cap, 3 retries by default, tunable via the global --retries <N> (0 disables). Commands that modify resources never retry automatically — a retried create could provision twice. Tests prove a failing POST hits the server exactly once.
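
For readers unfamiliar with full-jitter backoff, here is a rough sketch of the delay calculation using the numbers above (base 500ms, 8s cap). The function names are hypothetical, the 0-based attempt index is an assumption, and the random sample is passed in by the caller so the sketch stays dependency-free; the PR itself uses fastrand for jitter.

```rust
use std::time::Duration;

const BASE: Duration = Duration::from_millis(500);
const CAP: Duration = Duration::from_secs(8);

/// Upper bound for the sleep before retry number `attempt` (0-based, an assumption):
/// min(CAP, BASE * 2^attempt), with saturating arithmetic so large attempts cannot overflow.
fn backoff_ceiling(attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    BASE.saturating_mul(factor).min(CAP)
}

/// Full jitter: sleep a uniformly chosen duration between zero and the ceiling.
/// `unit` is a random sample in [0.0, 1.0) supplied by the caller; it must be finite.
fn full_jitter_delay(attempt: u32, unit: f64) -> Duration {
    backoff_ceiling(attempt).mul_f64(unit.clamp(0.0, 1.0))
}
```

Under these assumptions the ceilings run 500ms, 1s, 2s, 4s, then stay at 8s, and each actual sleep is drawn from anywhere below the ceiling so that concurrent clients do not retry in lockstep.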

3. Destructive-operation safety

  • qn stream delete-all and qn webhook delete-all are removed. An account-wide wipe is a disproportionate blast radius for a CLI one-liner; that operation belongs behind the API.
  • Newly confirmation-gated at Mild severity (--yes skips the prompt, a TTY prompts interactively, and non-interactive scripts exit 5 before any request is sent): endpoint bulk pause (prompt includes the count), endpoint security token delete, endpoint rate-limit delete-override.
  • Prompts now state the blast radius ("Pause 3 endpoint(s)? They will stop serving requests"); bulk resume stays ungated since it restores service.
  • The severity policy is written into CLAUDE.md.
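
The Mild gate described above boils down to a three-way decision. The sketch below is an illustration only: `Gate`, `mild_gate`, and `pause_prompt` are hypothetical names, and treating "script" as "stdin is not a TTY" is an assumption. It is not the PR's confirm::confirm_mild helper.

```rust
/// Outcome of a Mild confirmation gate (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Gate {
    /// --yes was passed: run without asking.
    Proceed,
    /// Interactive terminal: show this prompt and wait for an answer.
    Prompt(String),
    /// Non-interactive and no --yes: refuse before any request is sent.
    Refuse { exit_code: i32 },
}

/// Decide what a Mild-gated command should do before touching the network.
fn mild_gate(yes: bool, stdin_is_tty: bool, prompt: &str) -> Gate {
    if yes {
        Gate::Proceed
    } else if stdin_is_tty {
        Gate::Prompt(prompt.to_string())
    } else {
        Gate::Refuse { exit_code: 5 }
    }
}

/// Build the bulk-pause prompt so it states the blast radius (the endpoint count).
fn pause_prompt(count: usize) -> String {
    format!("Pause {count} endpoint(s)? They will stop serving requests")
}
```

In a real binary, `stdin_is_tty` could come from `std::io::IsTerminal` on `std::io::stdin()`; keeping it a plain parameter makes the decision easy to test without a terminal.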

4. Truthful help & exit codes

  • Required flags are clap-enforced and appear in usage lines: endpoint create --chain --network, endpoint update --label, and the six stream create flags via required_unless_present(stream_config_file). Missing flags are reported all at once.
  • clap usage errors (typo'd subcommand, missing flag) exit 1, matching runtime argument errors — exit 2 now always and only means "the API returned an error". --help/--version still exit 0.
  • Global flags are grouped under a "Global options" heading so command-specific flags surface first in every --help.
  • after_help examples are added at the top level and on endpoint/stream/webhook/kv/auth.
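
The exit codes mentioned in this PR can be summarized as a small mapping. This is a sketch for orientation only: the `Outcome` enum and its variant names are hypothetical, only the codes named in this PR are listed, and what code 4 covers beyond a missing config file is not stated here.

```rust
/// Ways a CLI invocation can end (hypothetical enum, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    HelpOrVersion,
    /// Typo'd subcommand, missing required flag, or a runtime argument error.
    UsageError,
    /// The API returned an error.
    ApiError,
    /// The path given to --config-file does not exist.
    MissingConfigFile,
    /// A gated command ran non-interactively without --yes.
    ConfirmationRequired,
}

/// Map an outcome to the process exit code described in this PR.
fn exit_code(outcome: Outcome) -> i32 {
    match outcome {
        Outcome::Success | Outcome::HelpOrVersion => 0,
        Outcome::UsageError => 1,
        Outcome::ApiError => 2,
        Outcome::MissingConfigFile => 4,
        Outcome::ConfirmationRequired => 5,
    }
}
```

The practical consequence for scripts is that exit 2 can be treated as "the request reached the API and failed", while exit 1 means the command line itself was wrong.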

Verification

Each commit passed cargo test (212 tests, including new wiremock coverage for retries, --config-file auth, and every confirmation gate), cargo clippy --all-targets -- -D warnings, cargo fmt --check, and a clean release build. Help/exit-code behavior was also verified against the built binary in a real shell.

Breaking changes

  • QN_CLI__API_KEY no longer authenticates; CI callers should write a config file and pass --config-file, or pass --api-key.
  • stream delete-all / webhook delete-all no longer exist.
  • Usage errors exit 1 (previously 2).

API keys now come from exactly two sources, highest precedence first:
the --api-key flag, then the config file (--config-file path if given,
else ~/.config/qn/config.toml). The environment variable was removed
deliberately: a key left exported in a shell is invisible state that
outlives the session it was set for, and is the easiest way to run a
destructive command against the wrong account. CI callers should write
a config file and point --config-file at it.

auth login/logout/status/whoami honor --config-file for both reads and
writes. The [output] section is also read from the --config-file path.

New tests: --config-file supplies the key, --api-key beats it, missing
file exits 4, and a subprocess test proving the env var is inert.

Read-only commands (list/show/logs/metrics/usage/billing/chain/kv
reads/whoami) now retry transient failures: HTTP 429/500/502/503/504
plus transport timeouts and connection errors. Backoff is full-jitter
exponential (base 500ms, capped at 8s per sleep), defaulting to 3
retries; the global --retries <N> flag tunes it and 0 disables.

Mutating commands never retry automatically: a retried create could
provision (and bill) twice. stream test-filter is the one POST that
retries — it evaluates a filter against historical data and changes
nothing.
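
The retry eligibility rule described in this commit message can be sketched as a pure predicate. The names below are hypothetical and the sketch covers HTTP statuses only (timeouts and connection errors are left out); it classifies by whether a command changes anything rather than by HTTP method, which is why stream test-filter would count as read-only here.

```rust
/// Whether a command changes server-side state (hypothetical classification).
#[derive(Clone, Copy, Debug, PartialEq)]
enum CommandKind {
    /// list, show, logs, metrics, usage, and also stream test-filter (a POST that changes nothing).
    ReadOnly,
    /// create, update, delete, and anything else that modifies resources.
    Mutating,
}

/// The exact statuses treated as transient; note this is not "all 5xx".
fn is_transient_status(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 504)
}

/// Retry only read-only commands, only on transient statuses,
/// and only while fewer than `max_retries` retries have been made.
fn should_retry(kind: CommandKind, status: u16, retries_made: u32, max_retries: u32) -> bool {
    kind == CommandKind::ReadOnly && retries_made < max_retries && is_transient_status(status)
}
```

With `max_retries` set to 0 (the effect of --retries 0) the predicate is always false, and a mutating command is never retried regardless of status.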

Adds src/retry.rs (5 unit tests on paused tokio time) and
tests/retry.rs (5 wiremock tests, including proof that POST create
hits the server exactly once on a 500). New dep: fastrand (tiny,
zero-dependency RNG for jitter).

qn stream delete-all and qn webhook delete-all are gone. A one-line
account-wide wipe is a disproportionate blast radius for a CLI command;
that operation belongs behind the API where callers script it
deliberately.

Newly confirmation-gated (Mild: --yes skips, TTY prompts, scripts get
exit 5 before any request is sent):
- endpoint bulk pause — prompt includes the endpoint count
- endpoint security token delete — notes that clients lose access
- endpoint rate-limit delete-override — notes the revert to plan limits

bulk resume stays ungated (it restores service). Existing prompts now
state the blast radius (archive is irreversible from the CLI, tag
delete is account-wide). The kv-local confirm_mild helper moved to
confirm::confirm_mild for reuse. Severity::Severe and prompt_typed
remain for future wide-blast-radius gates.

CLAUDE.md now records the severity policy; README documents the
confirmation model. Tests: each gated command verifies exit 5 with
zero requests reaching the mock without --yes, and exit 0 with it.

- Required flags are now clap-enforced and shown in usage lines:
  endpoint create (--chain/--network), endpoint update (--label), and
  the six stream create flags via required_unless_present(config_file).
  Missing-flag errors now list everything missing at once instead of
  failing one runtime check at a time.
- clap usage errors (typo'd subcommand, missing required flag) exit 1,
  matching runtime argument errors, so exit 2 always and only means
  "the API returned an error". --help/--version still exit 0.
- Global flags are grouped under a "Global options" heading in every
  subcommand's --help, so command-specific flags surface first.
- after_help examples on the top level and on endpoint/stream/webhook/
  kv/auth.
- README exit-code table updated.

- qn stream create's JSON-params flag is now --stream-config-file, so it
  can no longer shadow the global --config-file (CLI config TOML) when
  placed after the subcommand. Help text on both flags cross-references
  the distinction.
- Retry docs (README + --retries help) now state the exact retried
  statuses (429/500/502/503/504) instead of claiming all 5xx, and the
  README notes that stream test-filter retries despite being a POST.
- README environment section no longer claims NO_COLOR/TERM are the
  only honored variables (XDG_CONFIG_HOME/HOME locate the config file).
- Trimmed an SDK-internals aside from the retry.rs module doc.
@johnpmitsch johnpmitsch merged commit 816125b into main Jun 10, 2026
12 checks passed