feat: add ci-coach, workflow-health-manager, and issue-arborist agentic workflows #1016

Open
IEvangelist wants to merge 3 commits into main from dapine/ci-monitor-workflows


IEvangelist commented May 19, 2026

Summary

Adds three new gh-aw agentic workflows that monitor CI and issue health, all adapted from the upstream github/gh-aw repo and re-scoped for aspire.dev. Every token used by the new workflows — checkout, pre-agent steps, and safe-outputs — flows through the existing aspire-bot GitHub App; no new secrets are introduced.

ci-coach.md — daily CI optimization coach

  • Trigger: weekdays at 13:14 UTC (explicit cron: "14 13 * * 1-5") + workflow_dispatch
  • Scope: ci.yml, frontend-build.yml, apphost-build.yml, tools-tests.yml, update-release-branch.yml only — every other file is blocked via safe-outputs.create-pull-request.allowed-files.
  • Data: pre-downloads the last 60 runs of each target workflow into /tmp/gh-aw/ci-data/ and aggregates a summary.json the agent reads first.
  • Validation gate: the agent must run pnpm lint, pnpm test:unit, pnpm build:production, and dotnet build src/apphost/Aspire.Dev.AppHost locally; only on a fully-green gate does it open a PR.
  • Output: PR titled [ci-coach] … via the aspire-bot GitHub App, or noop with evidence if no measurable ≥ 5 % improvement is found. The bar for opening a PR is deliberately high.
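
The gating rules above (allowed files, green validation gate, and the 5 % bar) can be sketched as a small decision function. This is only an illustration under stated assumptions: the `.github/workflows/` path prefix, the helper names, and the idea of comparing a single baseline duration against a proposed one are hypothetical and are not taken from the workflow source.

```python
from pathlib import PurePosixPath

# Target workflow files named in the description; the directory prefix is an assumption.
ALLOWED_FILES = {
    ".github/workflows/ci.yml",
    ".github/workflows/frontend-build.yml",
    ".github/workflows/apphost-build.yml",
    ".github/workflows/tools-tests.yml",
    ".github/workflows/update-release-branch.yml",
}
MIN_IMPROVEMENT = 0.05  # the ">= 5 %" bar from the description


def relative_improvement(baseline_seconds: float, proposed_seconds: float) -> float:
    """Fraction of baseline time saved; 0.0 when there is no usable baseline."""
    if baseline_seconds <= 0:
        return 0.0
    return (baseline_seconds - proposed_seconds) / baseline_seconds


def decide(changed_files, baseline_seconds, proposed_seconds, gate_green):
    """Return 'create-pull-request' or 'noop' (hypothetical decision helper)."""
    paths = {str(PurePosixPath(p)) for p in changed_files}
    # Any file outside the allow-list blocks the PR entirely.
    if not paths or not paths <= ALLOWED_FILES:
        return "noop"
    # The local lint/test/build gate must be fully green.
    if not gate_green:
        return "noop"
    # The expected saving must reach the 5 % bar.
    if relative_improvement(baseline_seconds, proposed_seconds) < MIN_IMPROVEMENT:
        return "noop"
    return "create-pull-request"
```

For example, `decide([".github/workflows/ci.yml"], 600, 540, True)` clears all three checks, while the same change with a failing gate or a 1-2 % saving falls back to `noop`.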

workflow-health-manager.md — daily workflow health meta-orchestrator

  • Trigger: daily at 09:53 UTC
  • Pre-computed data: gh aw compile --validate output, the full workflow list (via paginated gh api), and a per-workflow health-summary.json with success rates pulled directly from the GitHub API.
  • Read-only by design. The agent job has no edit: tool — every observation flows through safe-outputs.create-issue, add-comment, or update-issue (capped at 10/15/5 respectively).
  • Output: filed workflow-health-labelled maintenance issues for any critical/warning workflow, plus a single Workflow Health Dashboard issue updated each run (pinning is a manual maintainer step — the workflow doesn't manage pin state).
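
As a rough sketch of how a per-workflow success rate could be derived from run conclusions and mapped to the critical/warning buckets mentioned above: the thresholds (50 % and 80 %) and the function names are assumptions made for illustration, and the real health-summary.json may be computed differently.

```python
# Conclusions that count toward reliability; cancelled, skipped and
# in-progress (None) runs are ignored in this sketch.
COUNTED = ("success", "failure", "timed_out")


def success_rate(conclusions):
    """Share of counted runs that succeeded, or None when nothing is countable."""
    counted = [c for c in conclusions if c in COUNTED]
    if not counted:
        return None
    return sum(1 for c in counted if c == "success") / len(counted)


def classify(rate, critical_below=0.5, warning_below=0.8):
    """Map a success rate to a health bucket (thresholds are hypothetical)."""
    if rate is None:
        return "unknown"
    if rate < critical_below:
        return "critical"
    if rate < warning_below:
        return "warning"
    return "healthy"
```

A workflow with two successes and one failure would score roughly 0.67 here and land in the warning bucket under these assumed thresholds.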

issue-arborist.md — daily issue gardener

  • Trigger: daily at 04:51 UTC
  • Pre-computed data: last 100 open issues that are not already sub-issues, plus a summary.json with label histogram, milestone breakdown, and oldest/newest timestamps.
  • Read-only on the codebase. No edit: tool — all writes go through safe-outputs.
  • Actions: links clearly-related issue pairs as parent → sub-issue (≤ 50 links/run), creates a [Parent] issue when 5+ orphan issues share a theme (≤ 5 new parents/run), and posts a single daily report issue summarizing the run.
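
The lightweight summary described above (label histogram, milestone breakdown, oldest/newest timestamps) might be built along these lines. The input shape follows the GitHub REST issue object (`labels[].name`, `milestone.title`, `created_at`); the output key names are hypothetical and not confirmed by the workflow source.

```python
from collections import Counter


def summarize(issues):
    """Build a small summary dict from a list of GitHub-style issue objects."""
    labels = Counter(
        label["name"] for issue in issues for label in issue.get("labels", [])
    )
    # Issues without a milestone are grouped under a placeholder key.
    milestones = Counter(
        (issue.get("milestone") or {}).get("title", "(none)") for issue in issues
    )
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    created = sorted(issue["created_at"] for issue in issues)
    return {
        "total": len(issues),
        "labels": dict(labels),
        "milestones": dict(milestones),
        "oldest": created[0] if created else None,
        "newest": created[-1] if created else None,
    }
```

Reading a summary like this first lets the agent decide which slices of the full issue list are worth loading into context.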

Out-of-scope changes (intentional)

  • update-integration-data.lock.yml is also re-emitted in this PR. #1010 (chore: raise max-patch-size cap for Integration Data Updater) raised the max-patch-size cap without re-running gh aw compile, so the .md source frontmatter hash (7cca061dc369…) no longer matched the hash stored in the committed lock file (93ae43c9cb7b…). Running gh aw compile here regenerates the lock file cleanly: no behavior change, just a frontmatter-hash sync.

Security review

All auth uses the aspire-bot GitHub App

Every token used by the three new workflows is minted from ASPIRE_BOT_APP_ID + ASPIRE_BOT_PRIVATE_KEY (the same App that already authenticates update-integration-data). No new secrets, no PATs.

| Layer | Auth | Implementation |
| --- | --- | --- |
| actions/checkout of the source repo | aspire-bot App | checkout: github-app: frontmatter |
| Pre-agent steps: (gh api, gh run list, gh aw compile) | aspire-bot App | Explicit actions/create-github-app-token@v3 step minting ${{ steps.app-token.outputs.token }} with scoped permission-* inputs |
| Safe-outputs (create-issue, add-comment, update-issue, create-pull-request) | aspire-bot App | safe-outputs.github-app: frontmatter (unchanged from the previous revision) |

Permission model — read-only at the agent layer

| Workflow | Agent job permissions | Safe-outputs job permissions (required to apply outputs) |
| --- | --- | --- |
| ci-coach | contents: read, actions: read, pull-requests: read, issues: read. The agent has edit: so it can stage CI tweaks, but safe-outputs.create-pull-request.allowed-files restricts which paths can actually land in a PR. | contents: write, pull-requests: write, issues: write (granted only to the safe-outputs job that applies the PR / fallback issue). |
| workflow-health-manager | contents: read, issues: read, pull-requests: read, actions: read. No edit: tool (read-only on disk). | issues: write for create-issue / add-comment / update-issue. The compiled job also has discussions: write because that's the default safe-outputs scope; no discussions are configured here. |
| issue-arborist | contents: read, issues: read. No edit: tool (read-only on disk). | issues: write for create-issue and link-sub-issue. |

In short: the agent never has GitHub write permissions; only the safe-outputs job that processes the agent's structured output does, and it uses the aspire-bot App identity to do so.

Other security properties

  • No new third-party Actions beyond what gh-aw v0.72.0 already pins for the existing update-integration-data workflow (actions/create-github-app-token@v3 and actions/checkout@v6 were already listed in .github/aw/actions-lock.json).
  • No new container images beyond what gh-aw v0.72.0 already pins.
  • Networks are scoped:
    • ci-coach: defaults, containers, node, dotnet, github
    • workflow-health-manager: defaults, github
    • issue-arborist: defaults, github

Adaptations from upstream

The upstream github/gh-aw versions of these workflows are tightly coupled to that repo:

| Concern | Upstream | This PR |
| --- | --- | --- |
| ci-coach target CI workflows | ci.yml, cgo.yml, cjs.yml | ci.yml, frontend-build.yml, apphost-build.yml, tools-tests.yml, update-release-branch.yml |
| ci-coach validation commands | make lint && make build && make test-unit && make recompile | pnpm lint/test:unit/build:production + dotnet build src/apphost/Aspire.Dev.AppHost |
| Pre-loaded data | imports: shared/ci-data-analysis.md, shared/github-guard-policy.md, ../skills/jqschema/SKILL.md, etc. | Inline steps: blocks producing the same JSON shape under /tmp/gh-aw |
| workflow-health-manager metrics | repo-memory branch + metrics/latest.json collector workflow | Direct gh api repos/.../actions/workflows calls in pre-agent steps |
| issue-arborist engine | codex | copilot (no Codex secret here; matches update-integration-data) |
| issue-arborist reporting | create-discussion (category audits) | 6th create-issue slot (no assumption about Discussions config) |
| Experiment harness | tracker-id, experiments, features.copilot-requests | Dropped (upstream-only A/B infra) |
| Pre-agent step auth | secrets.GITHUB_TOKEN | aspire-bot App via actions/create-github-app-token@v3 |

Validation

  • gh aw compile → 4 workflows compile cleanly with 0 errors
  • gh aw compile --validate → all workflows validate
  • Cron values deterministically scattered against microsoft/aspire.dev:
    • ci-coach: weekdays 13:14 UTC
    • workflow-health-manager: daily 09:53 UTC
    • issue-arborist: daily 04:51 UTC
    • update-integration-data: unchanged at weekdays 14:54 UTC
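
To sanity-check that a five-field expression such as `14 13 * * 1-5` really means "weekdays at 13:14 UTC", a minimal matcher can be sketched as below. It is a simplification written for this check: it supports only `*`, single numbers, ranges, and comma lists, and it combines day-of-month and day-of-week with AND, which matches standard cron only when at least one of those two fields is `*`. It is not how gh-aw or GitHub Actions evaluates schedules.

```python
from datetime import datetime, timezone


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field supporting '*', 'n', 'a-b' and comma lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True when the timezone-aware datetime falls on the cron schedule (UTC)."""
    minute, hour, dom, month, dow = expr.split()
    when = when.astimezone(timezone.utc)
    # cron counts 0 = Sunday; Python's weekday() counts 0 = Monday.
    cron_dow = (when.weekday() + 1) % 7
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        and _field_matches(dow, cron_dow)
    )
```

With this sketch, Tuesday 2026-05-19 13:14 UTC matches the ci-coach expression, while the same time on Saturday 2026-05-23 does not.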

Notes

  • Branch is rebased onto upstream/main (HEAD 7708bc01).
  • All five Copilot review comments on the previous revision have been addressed inline; see the latest commit fix(workflows): address PR feedback and use github-app for all auth.

Adds two new gh-aw workflows (compiled with v0.72.0) that monitor CI:

- ci-coach.md: Daily CI optimization coach that analyzes recent runs of
  ci.yml, frontend-build.yml, apphost-build.yml, tools-tests.yml, and
  update-release-branch.yml. Pre-downloads run history into
  /tmp/gh-aw/ci-data/, identifies measurable optimization opportunities,
  validates proposed YAML changes locally with pnpm lint/test/build and
  dotnet build, and opens a PR (title prefix '[ci-coach]') only when
  expected savings exceed 5%. Otherwise calls noop. Adapted from
  github/gh-aw's ci-coach with make commands replaced by aspire.dev's
  pnpm + dotnet gates and shared imports inlined.

- workflow-health-manager.md: Daily meta-orchestrator that runs
  'gh aw compile --validate', enumerates every workflow run via the
  GitHub API, computes a per-workflow reliability score, files
  workflow-health issues for critical/warning workflows, and maintains
  a pinned Workflow Health Dashboard issue. Read-only by design --
  observations flow through safe-outputs only. Adapted from
  github/gh-aw's workflow-health-manager with the repo-memory branch
  and pre-loaded metrics dependencies replaced by direct gh api calls.

Also regenerates update-integration-data.lock.yml to clear the stale
frontmatter-hash warning introduced by #1010 (raise max-patch-size cap):
the .md source had hash 7cca061dc369... but the committed .lock.yml
still stored 93ae43c9cb7b...

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 19, 2026 20:29

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds two new gh-aw agentic workflows to monitor CI and overall workflow health in aspire.dev, plus regenerates an existing lock file to sync its frontmatter hash.

Changes:

  • Introduce ci-coach workflow to analyze recent CI runs, validate proposed optimizations locally, and open PRs only when measurable savings are found.
  • Introduce workflow-health-manager workflow to compile/validate agentic workflows, aggregate workflow reliability metrics, and open/update workflow-health issues/dashboard.
  • Re-emit update-integration-data.lock.yml to align the embedded frontmatter hash with the source workflow.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| .github/workflows/ci-coach.md | Adds the CI optimization coach agentic workflow source and pre-agent data collection steps. |
| .github/workflows/ci-coach.lock.yml | Compiled GitHub Actions workflow for ci-coach (safe-outputs enforcement, scheduled execution). |
| .github/workflows/workflow-health-manager.md | Adds the workflow health meta-orchestrator agentic workflow source and precomputed health data pipeline. |
| .github/workflows/workflow-health-manager.lock.yml | Compiled GitHub Actions workflow for workflow-health-manager (issue/comment safe-outputs, scheduled execution). |
| .github/workflows/update-integration-data.lock.yml | Regenerated lock file with updated frontmatter hash (no intended behavior change). |

Comment thread .github/workflows/workflow-health-manager.md Outdated
Comment thread .github/workflows/workflow-health-manager.md Outdated
Comment thread .github/workflows/workflow-health-manager.md Outdated
Comment thread .github/workflows/ci-coach.md Outdated
Comment thread .github/workflows/workflow-health-manager.lock.yml
Adds the issue-arborist workflow (adapted from github/gh-aw v0.72.0) that
runs daily, analyzes the last 100 open issues without a parent, and:

- Links clearly-related issues as parent -> sub-issue pairs
- Creates a new [Parent] issue for clusters of 5+ thematically related
  orphans
- Posts a daily report summarizing analysis, links created, and
  potential relationships flagged for manual review

Adapted from the upstream version (which the add-wizard rejected
because of unresolved imports of shared/*.md and skills/jqschema):

- engine: codex -> copilot (matches existing repo workflows; no Codex
  secret configured here)
- Dropped imports: shared/github-guard-policy.md,
  ../skills/jqschema/SKILL.md, shared/reporting.md, shared/otlp.md
  (none exist in this repo) -- inlined necessary guidance directly
- Dropped experiments.prompt_style block (upstream-only A/B infra)
- bash: <list> -> bash: true (we're sandboxed by AWF)
- safe-outputs switched to aspire-bot GitHub App auth, with the
  github-app block matching update-integration-data
- Dropped create-discussion safe output (uncertain category support);
  the report is posted as the 6th create-issue slot instead
- Pre-agent step writes a lightweight /tmp/gh-aw/issues-data/summary.json
  (label histogram + milestones + oldest/newest) so the agent reads
  the summary first instead of pulling the full 100-issue blob into
  context every run

Read-only by design -- no edit: tool, all writes via safe-outputs.
Caps: 5 new parent issues, 50 sub-issue links, 1 daily report per run.
Scheduled at 04:51 UTC daily (deterministically scattered against
microsoft/aspire.dev).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
IEvangelist changed the title from "feat: add ci-coach and workflow-health-manager agentic workflows" to "feat: add ci-coach, workflow-health-manager, and issue-arborist agentic workflows" on May 19, 2026
Addresses the five Copilot review comments on PR #1016 and routes every
token used by the three new agentic workflows through the existing
aspire-bot GitHub App.

Review fixes:
- ci-coach: replace the non-standard `cron: "daily around 13:00 on
  weekdays"` source-of-truth with the explicit standard expression
  `14 13 * * 1-5` that the compiler was already emitting.
- workflow-health-manager: rework the `workflows.json` collection step
  so the output shape is *always* a JSON array. `gh api --paginate
  --slurp --jq '[.[].workflows[]? | {...}]'` produces a single array
  across all pages and the iteration uses `jq -c '.[]?'`, which is a
  no-op when the file is `[]`. A defensive `[ -z "$id" ] && continue`
  guard makes a single bad record incapable of triggering a
  `/workflows/null/runs` request. `--paginate` removes the silent
  100-workflow cap.
- workflow-health-manager: drop the "pinned" requirement from the
  dashboard section. The workflow has no pin/unpin safe-output and
  could never honour it; the docstring now states explicitly that
  pinning is a manual maintainer operation and the workflow only
  maintains the dashboard body.
- workflow-health-manager: remove the bare `edit:` (null) field since
  this workflow is intentionally read-only and never edits files.
- ci-coach: keep `edit:` (null) — the field rejects boolean values per
  the schema, and ci-coach legitimately needs the edit tool to propose
  CI tweaks (scoped by `safe-outputs.create-pull-request.allowed-files`).
  An inline comment explains the choice.
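
The always-an-array shape and the empty-id guard from the workflows.json fix can be illustrated in Python. This is a sketch of the same idea, not the shell/jq step itself: the projected fields (`id`, `name`, `path`) and both function names are hypothetical, since the commit message elides the actual projection.

```python
def flatten_workflows(pages):
    """Merge paginated API pages into one list, so the result is always an array."""
    flat = []
    for page in pages:
        # A page without a usable "workflows" key contributes nothing,
        # similar in spirit to jq's `.workflows[]?`.
        for wf in page.get("workflows") or []:
            flat.append(
                {"id": wf.get("id"), "name": wf.get("name"), "path": wf.get("path")}
            )
    return flat


def run_endpoints(workflows, repo):
    """Build per-workflow runs endpoints, skipping records without an id."""
    endpoints = []
    for wf in workflows:
        # Guard equivalent to `[ -z "$id" ] && continue`: never request /workflows/null/runs.
        if wf.get("id") is None:
            continue
        endpoints.append(f"repos/{repo}/actions/workflows/{wf['id']}/runs")
    return endpoints
```

Iterating over an empty list is a no-op, which is the property the `jq -c '.[]?'` loop relies on when the file is `[]`.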

All-auth-uses-the-app:
- Add `checkout: github-app:` to every workflow so the implicit
  `actions/checkout` step mints and uses an Aspire-bot installation
  token (`client-id: ${{ secrets.ASPIRE_BOT_APP_ID }}`) instead of the
  default `GITHUB_TOKEN`.
- Prepend a `Mint Aspire bot token` step (`actions/create-github-app-token@v3`,
  scoped with explicit `permission-*` inputs) to every workflow's
  pre-agent `steps:` block.
- Replace every `GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}` / `GITHUB_TOKEN:
  ${{ secrets.GITHUB_TOKEN }}` in pre-agent steps with
  `${{ steps.app-token.outputs.token }}`. Safe-outputs were already
  using the App, so the agent + safe-outputs + checkout + pre-agent
  steps all flow through the same GitHub App identity now.

Lock files regenerated; 0 errors. No new secrets introduced — every
auth bit reuses the existing `ASPIRE_BOT_APP_ID` / `ASPIRE_BOT_PRIVATE_KEY`
secrets already used by `update-integration-data.lock.yml`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
IEvangelist enabled auto-merge (squash) May 19, 2026 20:59