feat: add ci-coach, workflow-health-manager, and issue-arborist agentic workflows #1016
Open
IEvangelist wants to merge 3 commits into
Adds two new gh-aw workflows (compiled with v0.72.0) that monitor CI:

- ci-coach.md: Daily CI optimization coach that analyzes recent runs of ci.yml, frontend-build.yml, apphost-build.yml, tools-tests.yml, and update-release-branch.yml. Pre-downloads run history into /tmp/gh-aw/ci-data/, identifies measurable optimization opportunities, validates proposed YAML changes locally with pnpm lint/test/build and dotnet build, and opens a PR (title prefix '[ci-coach]') only when expected savings exceed 5%. Otherwise calls noop. Adapted from github/gh-aw's ci-coach with make commands replaced by aspire.dev's pnpm + dotnet gates and shared imports inlined.
- workflow-health-manager.md: Daily meta-orchestrator that runs 'gh aw compile --validate', enumerates every workflow run via the GitHub API, computes a per-workflow reliability score, files workflow-health issues for critical/warning workflows, and maintains a pinned Workflow Health Dashboard issue. Read-only by design -- observations flow through safe-outputs only. Adapted from github/gh-aw's workflow-health-manager with the repo-memory branch and pre-loaded metrics dependencies replaced by direct gh api calls.

Also regenerates update-integration-data.lock.yml to clear the stale frontmatter-hash warning introduced by #1010 (raise max-patch-size cap): the .md source had hash 7cca061dc369... but the committed .lock.yml still stored 93ae43c9cb7b...

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
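Two of the decisions described in this commit reduce to small arithmetic: the "expected savings exceed 5%" gate in ci-coach and the per-workflow reliability score in workflow-health-manager. The following is a minimal sketch of both, not the code shipped in the workflows. The function names, the set of run conclusions that count toward the score, and the critical/warning cut-offs (50 and 80) are all assumptions made for illustration; the commit message does not state them.

```python
from typing import Optional, Sequence


def savings_percent(baseline_seconds: float, proposed_seconds: float) -> float:
    """Expected saving as a percentage of the baseline duration."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - proposed_seconds) * 100.0 / baseline_seconds


def should_open_pr(baseline_seconds: float, proposed_seconds: float,
                   threshold_percent: float = 5.0) -> bool:
    """True only when the saving strictly exceeds the threshold.

    The commit message says savings must "exceed 5%", so the comparison
    is strict; anything else is treated as a noop.
    """
    return savings_percent(baseline_seconds, proposed_seconds) > threshold_percent


# Assumption: only these run conclusions count toward reliability;
# cancelled or skipped runs are ignored rather than counted as failures.
COUNTED = ("success", "failure", "timed_out")


def reliability_score(conclusions: Sequence[str]) -> Optional[float]:
    """Percentage of counted runs that succeeded, or None with no data."""
    counted = [c for c in conclusions if c in COUNTED]
    if not counted:
        return None
    return counted.count("success") * 100.0 / len(counted)


def classify(score: Optional[float],
             critical_below: float = 50.0,
             warning_below: float = 80.0) -> str:
    """Map a score to a status. The cut-offs are hypothetical."""
    if score is None:
        return "unknown"
    if score < critical_below:
        return "critical"
    if score < warning_below:
        return "warning"
    return "healthy"
```

With these definitions, a change that cuts a 600-second run to 570 seconds is exactly 5% and therefore does not clear the bar, while a cut to 540 seconds does.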
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds two new gh-aw agentic workflows to monitor CI and overall workflow health in aspire.dev, plus regenerates an existing lock file to sync its frontmatter hash.
Changes:
- Introduce `ci-coach` workflow to analyze recent CI runs, validate proposed optimizations locally, and open PRs only when measurable savings are found.
- Introduce `workflow-health-manager` workflow to compile/validate agentic workflows, aggregate workflow reliability metrics, and open/update `workflow-health` issues/dashboard.
- Re-emit `update-integration-data.lock.yml` to align the embedded frontmatter hash with the source workflow.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| .github/workflows/ci-coach.md | Adds the CI optimization coach agentic workflow source and pre-agent data collection steps. |
| .github/workflows/ci-coach.lock.yml | Compiled GitHub Actions workflow for ci-coach (safe-outputs enforcement, scheduled execution). |
| .github/workflows/workflow-health-manager.md | Adds the workflow health meta-orchestrator agentic workflow source and precomputed health data pipeline. |
| .github/workflows/workflow-health-manager.lock.yml | Compiled GitHub Actions workflow for workflow-health-manager (issue/comment safe-outputs, scheduled execution). |
| .github/workflows/update-integration-data.lock.yml | Regenerated lock file with updated frontmatter hash (no intended behavior change). |
Adds the issue-arborist workflow (adapted from github/gh-aw v0.72.0) that runs daily, analyzes the last 100 open issues without a parent, and:

- Links clearly-related issues as parent -> sub-issue pairs
- Creates a new [Parent] issue for clusters of 5+ thematically related orphans
- Posts a daily report summarizing analysis, links created, and potential relationships flagged for manual review

Adapted from the upstream version (which the add-wizard rejected because of unresolved imports of shared/*.md and skills/jqschema):

- engine: codex -> copilot (matches existing repo workflows; no Codex secret configured here)
- Dropped imports: shared/github-guard-policy.md, ../skills/jqschema/SKILL.md, shared/reporting.md, shared/otlp.md (none exist in this repo) -- inlined necessary guidance directly
- Dropped experiments.prompt_style block (upstream-only A/B infra)
- bash: <list> -> bash: true (we're sandboxed by AWF)
- safe-outputs switched to aspire-bot GitHub App auth, with the github-app block matching update-integration-data
- Dropped create-discussion safe output (uncertain category support); the report is posted as the 6th create-issue slot instead
- Pre-agent step writes a lightweight /tmp/gh-aw/issues-data/summary.json (label histogram + milestones + oldest/newest) so the agent reads the summary first instead of pulling the full 100-issue blob into context every run

Read-only by design -- no edit: tool, all writes via safe-outputs. Caps: 5 new parent issues, 50 sub-issue links, 1 daily report per run. Scheduled at 04:51 UTC daily (deterministically scattered against microsoft/aspire.dev).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
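To make the "label histogram + milestones + oldest/newest" summary concrete, here is a rough sketch of how such a digest could be built from a list of issues. It assumes each issue is a dict shaped like a GitHub REST API issue (`labels` as a list of objects with `name`, `milestone` as an object with `title` or null, `created_at` as an ISO 8601 UTC string). The function name and the output keys are hypothetical; the commit does not specify the actual layout of summary.json.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def summarize_issues(issues: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a small digest the agent can read before the full issue list."""
    # Count how often each label name appears across all issues.
    labels = Counter(
        label["name"]
        for issue in issues
        for label in issue.get("labels") or []
    )
    # Issues without a milestone are grouped under a placeholder key.
    milestones = Counter(
        (issue.get("milestone") or {}).get("title", "(none)")
        for issue in issues
    )
    # ISO 8601 UTC timestamps of equal precision sort correctly as strings.
    created = sorted(i["created_at"] for i in issues if i.get("created_at"))
    return {
        "total": len(issues),
        "labels": dict(labels.most_common()),
        "milestones": dict(milestones),
        "oldest": created[0] if created else None,
        "newest": created[-1] if created else None,
    }


if __name__ == "__main__":
    sample = [
        {"labels": [{"name": "docs"}], "milestone": None,
         "created_at": "2025-01-02T00:00:00Z"},
        {"labels": [{"name": "docs"}, {"name": "bug"}],
         "milestone": {"title": "v1"},
         "created_at": "2025-03-04T00:00:00Z"},
    ]
    print(json.dumps(summarize_issues(sample), indent=2))
```

The point of the design is the one the commit states: a digest of this size is cheap for the agent to read first, so the full 100-issue payload only needs to be consulted when the summary suggests a cluster worth inspecting.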
Addresses the five Copilot review comments on PR #1016 and routes every token used by the three new agentic workflows through the existing aspire-bot GitHub App.

Review fixes:

- ci-coach: replace the non-standard `cron: "daily around 13:00 on weekdays"` source-of-truth with the explicit standard expression `14 13 * * 1-5` that the compiler was already emitting.
- workflow-health-manager: rework the `workflows.json` collection step so the output shape is *always* a JSON array. `gh api --paginate --slurp --jq '[.[].workflows[]? | {...}]'` produces a single array across all pages and the iteration uses `jq -c '.[]?'`, which is a no-op when the file is `[]`. A defensive `[ -z "$id" ] && continue` guard makes a single bad record incapable of triggering a `/workflows/null/runs` request. `--paginate` removes the silent 100-workflow cap.
- workflow-health-manager: drop the "pinned" requirement from the dashboard section. The workflow has no pin/unpin safe-output and could never honour it; the docstring now states explicitly that pinning is a manual maintainer operation and the workflow only maintains the dashboard body.
- workflow-health-manager: remove the bare `edit:` (null) field since this workflow is intentionally read-only and never edits files.
- ci-coach: keep `edit:` (null) -- the field rejects boolean values per the schema, and ci-coach legitimately needs the edit tool to propose CI tweaks (scoped by `safe-outputs.create-pull-request.allowed-files`). An inline comment explains the choice.

All-auth-uses-the-app:

- Add `checkout: github-app:` to every workflow so the implicit `actions/checkout` step mints and uses an Aspire-bot installation token (`client-id: ${{ secrets.ASPIRE_BOT_APP_ID }}`) instead of the default `GITHUB_TOKEN`.
- Prepend a `Mint Aspire bot token` step (`actions/create-github-app-token@v3`, scoped with explicit `permission-*` inputs) to every workflow's pre-agent `steps:` block.
- Replace every `GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}` / `GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}` in pre-agent steps with `${{ steps.app-token.outputs.token }}`.

Safe-outputs were already using the App, so the agent + safe-outputs + checkout + pre-agent steps all flow through the same GitHub App identity now.

Lock files regenerated; 0 errors. No new secrets introduced -- every auth bit reuses the existing `ASPIRE_BOT_APP_ID` / `ASPIRE_BOT_PRIVATE_KEY` secrets already used by `update-integration-data.lock.yml`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
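The "always a JSON array" and "skip records without an id" fixes described above are shell and jq changes in the workflow; the same two invariants can be illustrated in Python. This is only a sketch of the idea, not a port of the actual step. It assumes each page is a dict with a `workflows` list, as returned by the GitHub "list repository workflows" endpoint, and the projected fields (`id`, `name`, `path`) are a guess, since the commit message elides them as `{...}`. Both function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def flatten_workflows(pages: Optional[Iterable[Any]]) -> List[Dict[str, Any]]:
    """Merge paginated responses into one list; never return anything else.

    Missing pages, pages without a "workflows" key, and null values all
    collapse to "no entries" rather than changing the output shape.
    """
    flat: List[Dict[str, Any]] = []
    for page in pages or []:
        if not isinstance(page, dict):
            continue
        for wf in page.get("workflows") or []:
            if isinstance(wf, dict):
                flat.append({
                    "id": wf.get("id"),
                    "name": wf.get("name"),
                    "path": wf.get("path"),
                })
    return flat


def runs_endpoints(workflows: List[Dict[str, Any]], repo: str) -> List[str]:
    """Build one runs endpoint per workflow, skipping records with no id.

    The guard mirrors the defensive check in the commit: a record without
    an id must never produce a ".../workflows/null/runs" request.
    """
    endpoints = []
    for wf in workflows:
        wf_id = wf.get("id")
        if wf_id is None or wf_id == "":
            continue
        endpoints.append(f"repos/{repo}/actions/workflows/{wf_id}/runs")
    return endpoints
```

Because `flatten_workflows` returns `[]` for empty or malformed input, the loop in `runs_endpoints` simply does nothing in that case, which is the same no-op behaviour the commit attributes to `jq -c '.[]?'` on an empty array.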
Summary
Adds three new gh-aw agentic workflows that monitor CI and issue health, all adapted from the upstream `github/gh-aw` repo and re-scoped for aspire.dev. Every token used by the new workflows (checkout, pre-agent steps, and safe-outputs) flows through the existing aspire-bot GitHub App; no new secrets are introduced.

`ci-coach.md`: daily CI optimization coach

- Runs on weekdays (`cron: "14 13 * * 1-5"`) + `workflow_dispatch`.
- May change `ci.yml`, `frontend-build.yml`, `apphost-build.yml`, `tools-tests.yml`, `update-release-branch.yml` only; every other file is blocked via `safe-outputs.create-pull-request.allowed-files`.
- Pre-downloads run history into `/tmp/gh-aw/ci-data/` and aggregates a `summary.json` the agent reads first.
- Runs `pnpm lint`, `pnpm test:unit`, `pnpm build:production`, and `dotnet build src/apphost/Aspire.Dev.AppHost` locally; only on a fully-green gate does it open a PR.
- Opens a PR titled `[ci-coach] …` via the aspire-bot GitHub App, or calls `noop` with evidence if no measurable ≥ 5 % improvement is found. The bar for opening a PR is deliberately high.

`workflow-health-manager.md`: daily workflow health meta-orchestrator

- Pre-agent steps collect `gh aw compile --validate` output, the full workflow list (via paginated `gh api`), and a per-workflow `health-summary.json` with success rates pulled directly from the GitHub API.
- No `edit:` tool: every observation flows through `safe-outputs.create-issue`, `add-comment`, or `update-issue` (capped at 10/15/5 respectively).
- Files `workflow-health`-labelled maintenance issues for any critical/warning workflow, plus a single Workflow Health Dashboard issue updated each run (pinning is a manual maintainer step; the workflow doesn't manage pin state).

`issue-arborist.md`: daily issue gardener

- Pre-agent step writes a `summary.json` with label histogram, milestone breakdown, and oldest/newest timestamps.
- No `edit:` tool: all writes go through `safe-outputs`.
- Creates a `[Parent]` issue when 5+ orphan issues share a theme (≤ 5 new parents/run), and posts a single daily report issue summarizing the run.

Out-of-scope changes (intentional)
`update-integration-data.lock.yml` is also re-emitted in this PR. Its `.md` source frontmatter hash (`7cca061dc369…`) didn't match the committed lock-file hash (`93ae43c9cb7b…`) after #1010 (chore: raise max-patch-size cap for Integration Data Updater) raised the `max-patch-size` cap without re-running `gh aw compile`. `gh aw compile` regenerates it cleanly here: no behavior change, just a frontmatter-hash sync.

Security review
All auth uses the aspire-bot GitHub App
Every token used by the three new workflows is minted from `ASPIRE_BOT_APP_ID` + `ASPIRE_BOT_PRIVATE_KEY` (the same App that already authenticates `update-integration-data`). No new secrets, no PATs.

- `actions/checkout` of the source repo: `checkout: github-app:` frontmatter
- Pre-agent `steps:` (`gh api`, `gh run list`, `gh aw compile`): `actions/create-github-app-token@v3` step minting `${{ steps.app-token.outputs.token }}` with scoped `permission-*` inputs
- Safe-outputs (`create-issue`, `add-comment`, `update-issue`, `create-pull-request`): `safe-outputs.github-app:` frontmatter (unchanged from the previous revision)

Permission model: read-only at the agent layer
| Workflow | Agent job | Safe-outputs job |
|---|---|---|
| `ci-coach` | `contents: read`, `actions: read`, `pull-requests: read`, `issues: read`. The agent has `edit:` so it can stage CI tweaks, but `safe-outputs.create-pull-request.allowed-files` restricts which paths can actually land in a PR. | `contents: write`, `pull-requests: write`, `issues: write` (granted only to the safe-outputs job that applies the PR / fallback issue). |
| `workflow-health-manager` | `contents: read`, `issues: read`, `pull-requests: read`, `actions: read`. No `edit:` tool, so read-only on disk. | `issues: write` for `create-issue` / `add-comment` / `update-issue`. The compiled job also has `discussions: write` because that's the default safe-outputs scope; no discussions are configured here. |
| `issue-arborist` | `contents: read`, `issues: read`. No `edit:` tool, so read-only on disk. | `issues: write` for `create-issue` and `link-sub-issue`. |

In short: the agent never has GitHub write permissions; only the safe-outputs job that processes the agent's structured output does, and it uses the aspire-bot App identity to do so.
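The `allowed-files` restriction on `ci-coach` can be pictured as a filter applied to the paths the agent staged before a PR is created. The sketch below illustrates that idea only; it is not gh-aw's implementation. The full `.github/workflows/` paths, the use of `fnmatch`-style patterns, and the function name are assumptions, since the PR only names the five files.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence, Tuple

# Assumption: the five workflow files named in this PR, at their usual path.
ALLOWED_FILES = (
    ".github/workflows/ci.yml",
    ".github/workflows/frontend-build.yml",
    ".github/workflows/apphost-build.yml",
    ".github/workflows/tools-tests.yml",
    ".github/workflows/update-release-branch.yml",
)


def partition_changes(
    changed: Iterable[str],
    allowed: Sequence[str] = ALLOWED_FILES,
) -> Tuple[List[str], List[str]]:
    """Split changed paths into (accepted, rejected) by the allow-list.

    Matching is case-sensitive. Note that fnmatch's "*" also matches "/",
    which may differ from the matcher gh-aw actually uses.
    """
    accepted: List[str] = []
    rejected: List[str] = []
    for path in changed:
        if any(fnmatchcase(path, pattern) for pattern in allowed):
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```

Under this model an edit to `src/app.ts` made by the agent would land in the rejected list and never reach the PR, even though the agent has the `edit:` tool on disk.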
Other security properties
- `update-integration-data` workflow (`actions/create-github-app-token@v3` and `actions/checkout@v6` were already listed in `.github/aw/actions-lock.json`).
- `ci-coach`: `defaults`, `containers`, `node`, `dotnet`, `github`
- `workflow-health-manager`: `defaults`, `github`
- `issue-arborist`: `defaults`, `github`

Adaptations from upstream
The upstream `github/gh-aw` versions of these workflows are tightly coupled to that repo:

| Area | Upstream | This PR |
|---|---|---|
| `ci-coach` target CI workflows | `ci.yml`, `cgo.yml`, `cjs.yml` | `ci.yml`, `frontend-build.yml`, `apphost-build.yml`, `tools-tests.yml`, `update-release-branch.yml` |
| `ci-coach` validation commands | `make lint && make build && make test-unit && make recompile` | `pnpm lint` / `test:unit` / `build:production` + `dotnet build src/apphost/Aspire.Dev.AppHost` |
| Shared imports | `imports: shared/ci-data-analysis.md`, `shared/github-guard-policy.md`, `../skills/jqschema/SKILL.md`, etc. | Inlined `steps:` blocks producing the same JSON shape under `/tmp/gh-aw` |
| `workflow-health-manager` metrics | `repo-memory` branch + `metrics/latest.json` collector workflow | Direct `gh api repos/.../actions/workflows` calls in pre-agent steps |
| `issue-arborist` engine | `codex` | `copilot` (no Codex secret here; matches `update-integration-data`) |
| `issue-arborist` reporting | `create-discussion` (category `audits`) | 6th `create-issue` slot (no assumption about Discussions config) |
| | `tracker-id`, `experiments`, `features.copilot-requests` | |
| Pre-agent step token | `secrets.GITHUB_TOKEN` | `actions/create-github-app-token@v3` |

Validation
- `gh aw compile` → 4 workflows compile cleanly with 0 errors
- `gh aw compile --validate` → all workflows validate
- Schedules, deterministically scattered against `microsoft/aspire.dev`:
  - `ci-coach`: weekdays 13:14 UTC
  - `workflow-health-manager`: daily 09:53 UTC
  - `issue-arborist`: daily 04:51 UTC
  - `update-integration-data`: unchanged at weekdays 14:54 UTC

Notes
- `upstream/main` (HEAD `7708bc01`).
- `fix(workflows): address PR feedback and use github-app for all auth`.
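As a sanity check on the `ci-coach` schedule quoted in this PR (`14 13 * * 1-5`, described as weekdays 13:14 UTC), a five-field cron expression can be evaluated against a timestamp with a few lines of Python. This is a deliberately small sketch: it handles only `*`, single numbers, ranges, and comma lists, and it treats day-of-month and day-of-week as both required, which differs from full cron semantics when both fields are restricted. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field against a value ("*", "5", "1-5", "1,3,5")."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` (expected in UTC) matches the five-field expression."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    # Cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day_of_month, when.day)
        and _field_matches(month, when.month)
        and _field_matches(day_of_week, cron_dow)
    )


if __name__ == "__main__":
    monday = datetime(2025, 1, 6, 13, 14, tzinfo=timezone.utc)
    saturday = datetime(2025, 1, 4, 13, 14, tzinfo=timezone.utc)
    print(cron_matches("14 13 * * 1-5", monday))    # a weekday at 13:14 UTC
    print(cron_matches("14 13 * * 1-5", saturday))  # a weekend day
```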