Cutout skills from flows by sveto · Pull Request #90 · griddynamics/rosetta

sveto · 2026-05-22T08:21:28Z

The QA, AQA, and TestGen workflows were transferred from CTO. "QA" is the new name of "API-QA".

10 skills were transferred with them. They have passed the check on independence and reusability. The 4 least reusable skills now carry an instruction on how to reuse them: for a single reuse, clone and edit; create a more general skill only when there is a need to reuse them twice or more.

I extracted 6 skills from QA, AQA, and TestGen. They have also passed the check on independence and reusability.

A manual real-life test of the three workflows, followed by bug fixing, was also performed. plugin_generator regenerates all 6 plugin trees cleanly.

R3 changes were also propagated to R2 (please inform me if this is wrong).

99% of GitHub's suggestions are implemented. The remaining ones seem to be either out of scope for this PR or noise/fluctuation.

Some follow-up items are documented in docs/TODO.md.

Artifacts carried via rebase from v3 — not authored on this branch

Changed all plan-manager occurrences to operation-manager (the skill was renamed in v3).
Hooks (generated artifacts only): plugins/**/hooks.json files regenerated by plugin_generator.py to reflect the hooks runtime + templates merged into main via the v3 release line. No changes to the hooks runtime (hooks/) or templates (*.tmpl) on this branch.
GitNexus skills (already merged as PR feat: gitnexus integration #84 by another contributor): gitnexus-cli, gitnexus-setup, gitnexus-tools plus gn-examples asset. No source changes on this branch — plugin-tree copies refreshed by plugin_generator.py post-rebase.
bootstrap-hitl-questioning.md was deleted in the v3 merge by another contributor. HITL enforcement is now in the hitl skill, referenced from bootstrap-guardrails.md. Discoverability verified. Six R2-phrasing gaps are tracked in docs/TODO.md for future review.

Note on r2 vs r3 audit discrepancies

The audit reports more findings in r2 than r3 (~105 r2 files audited vs ~52 r3 files). Two factors drive the delta:

Scope. The r3 audit covers the qa/aqa/testgen surface this PR touched. The r2 audit additionally covers ~50 pre-existing legacy r2 files (gitnexus-*, init-workspace-*, load-context, load-workflow, operation-manager, adhoc-flow, coding-flow, external-lib-flow, modernization-flow, research-flow, self-help-flow, etc.). These weren't touched by this PR or by main's recent commits; their findings are pre-existing and out of scope for this PR.
Release-aware evaluation. ~14 of 15 same-named files are byte-identical between r2 and r3 (verified by diff), yet several evaluate stricter in r2 (e.g. automation-test-implementation-handoff, confluence-source-harvesting, adhoc-flow, qa-flow). The driver appears to be release-side context — different bootstrap rules, different load-context behavior, different surrounding skills — not file content. The single file that does differ between releases is load-context/SKILL.md.
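
A minimal sketch of how the "byte-identical between r2 and r3" check could be reproduced without `diff`. It assumes "same-named" means the same path relative to each release root; the function names are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_release_trees(r2_root: Path, r3_root: Path) -> dict[str, bool]:
    """Map each relative path present in BOTH trees to True when byte-identical.

    Assumption: files are matched by relative path, not by bare file name.
    Files that exist in only one tree are ignored.
    """
    r2_files = {p.relative_to(r2_root).as_posix() for p in r2_root.rglob("*") if p.is_file()}
    r3_files = {p.relative_to(r3_root).as_posix() for p in r3_root.rglob("*") if p.is_file()}
    return {
        rel: file_digest(r2_root / rel) == file_digest(r3_root / rel)
        for rel in sorted(r2_files & r3_files)
    }
```

Called with the two release roots (for example `instructions/r2` and `instructions/r3`), the entries mapped to `False` are the files whose content differs between releases.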


github-actions · 2026-06-01T15:39:27Z

Rosetta Triage Review

Summary: This PR refactors the Rosetta instruction set by extracting inline skill logic from workflow files into dedicated SKILL.md files (SRP / DRY principle), adds a new QA workflow family (qa-flow.*), introduces utility skills (operation-manager, load-context-instructions, load-workflow, gitnexus tools), and significantly expands the hooks.json automation configuration for Claude Code sessions.

Findings:

Empty PR body — 101 files changed (+51 584 / −39 507 lines) with zero description. This makes review extremely difficult and blocks traceability. A summary of architectural intent, scope, and migration notes is required per contribution standards.
Stale plan-manager references after skill rename — plan-manager/SKILL.md was deleted from plugins/core-claude/ and replaced by operation-manager, but instructions/r3/core/workflows/adhoc-flow.md and plugins/core-claude/workflows/adhoc-flow.md still reference `USE SKILL plan-manager`. docs/definitions/skills.md also still lists `plan-manager`. These dangling references will cause agent lookup failures at runtime.
Breaking deletion of bootstrap-hitl-questioning.md — HITL enforcement is now routed through the hitl skill referenced in the updated bootstrap-guardrails.md. While architecturally correct per R3 design, this is a breaking change for any agent or workflow on the plugin path that references the old file by name. No migration note is provided.
No migration guide — The plan-manager → operation-manager rename and the bootstrap-hitl-questioning.md removal each require a migration note for existing Rosetta deployments and downstream prompt authors.
Instructions folder modified — As required by repo-triage.md, the coding-agents-prompt-authoring skill review was applied. The architectural approach (HITL moved to skill, not duplicated inline; skills extracted from flows) is consistent with pa-rosetta.md, pa-hardening.md, pa-patterns, and pa-schemas principles. The plugin-side pa-rosetta.md and pa-hardening.md references have been updated accordingly.

Suggestions:

Add a PR description explaining: (1) skill extraction rationale, (2) plan-manager → operation-manager rename, (3) bootstrap-hitl-questioning.md deprecation strategy, (4) new QA / gitnexus skill additions.
Fix stale plan-manager references in instructions/r3/core/workflows/adhoc-flow.md, plugins/core-claude/workflows/adhoc-flow.md, and docs/definitions/skills.md.
Add a MIGRATION.md or changelog entry covering the breaking renames and deletions.
Verify the hitl skill is discoverable and fully documents the HITL behavior previously in bootstrap-hitl-questioning.md.
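
The stale-reference finding above can be checked mechanically. The following is a sketch only: the helper name and the choice of file suffixes are assumptions, and it simply reports every line that still mentions the old skill name as a whole token.

```python
import re
from pathlib import Path

# Match the old skill name only as a whole token, so names that merely
# contain it (hypothetical example: "my-plan-manager-v2") are not reported.
STALE_NAME = re.compile(r"(?<![\w-])plan-manager(?![\w-])")


def find_stale_references(
    root: Path, suffixes: tuple[str, ...] = (".md", ".json")
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each stale mention."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE_NAME.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

An empty result for the repository root would indicate that no `plan-manager` references remain in the scanned file types.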

Automated triage by Rosetta agent

github-actions · 2026-06-01T15:48:06Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟠 Very High	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r3/core/workflows/aqa-flow.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	1	1	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	0	2	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	1	3	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`	0	0	2	1	⚠️ Warning
`instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`	0	1	3	0	❌ Fail
`instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`	2	5	2	0	❌ Fail
`instructions/r3/core/skills/aqa-selector-management/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/aqa-test-authoring/SKILL.md`	0	0	2	1	⚠️ Warning
`instructions/r3/core/skills/aqa-test-debugging/SKILL.md`	0	0	1	1	⚠️ Warning
`instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`	1	1	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	2	3	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`	0	1	3	0	❌ Fail
`instructions/r3/core/skills/qa-data-collection/SKILL.md`	0	1	2	0	❌ Fail
`instructions/r3/core/skills/qa-gap-analysis/SKILL.md`	0	0	2	1	⚠️ Warning
`instructions/r3/core/skills/qa-project-config/SKILL.md`	0	1	1	0	❌ Fail
`instructions/r3/core/skills/qa-test-debugging/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-implementation/SKILL.md`	0	1	2	0	❌ Fail
`instructions/r3/core/skills/repository-implementation-standards/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`	0	0	2	1	⚠️ Warning
`instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`	0	2	3	0	❌ Fail
`instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/testrail-test-case-export/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r3/core/skills/user-approved-code-changes/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	1	1	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	1	1	0	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	1	⚠️ Warning
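
The Status column in the table above appears to follow a simple rule. The sketch below is inferred from the rows shown, not taken from the validator's source: any Very High or High finding yields Fail, any Medium or Low finding yields Warning, and a file with no findings passes. The type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FindingCounts:
    """Per-file finding counts, in the column order of the summary table."""
    very_high: int
    high: int
    medium: int
    low: int


def file_status(counts: FindingCounts) -> str:
    """Derive the status label; the rule is an inference from the table rows."""
    if counts.very_high > 0 or counts.high > 0:
        return "Fail"
    if counts.medium > 0 or counts.low > 0:
        return "Warning"
    return "Pass"
```

Under this reading, a file with a single Low finding is still a Warning, which matches the `testgen-flow-project-config-loading.md` row.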

📄 `instructions/r3/core/workflows/aqa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: The `description` frontmatter field was deleted. BASE had `description: MUST apply when automated QA/testing task is assigned...`; NEW frontmatter has only name/tags/baseSchema. `docs/schemas/workflow.md` defines `description` as a required field that states WHEN/HOW to use the workflow and is the routing trigger used to select the workflow. Reason: Without the schema-required description, workflow selection/routing can fail to match this flow when a QA task arrives, and the file violates its declared baseSchema contract. Solution: Restore a `description:` line in the frontmatter (the WHEN/HOW routing trigger, e.g. the original 'MUST apply when automated QA/testing task is assigned...'), per `docs/schemas/workflow.md`.
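
A missing required frontmatter field like the one reported above could be caught before review with a small check. This sketch assumes the frontmatter is a block delimited by `---` lines at the top of the file and that only `description` needs checking; both the delimiter convention and the function names are assumptions, and the authoritative list of required fields lives in `docs/schemas/workflow.md`.

```python
def frontmatter_keys(text: str) -> set[str]:
    """Collect top-level keys from a leading '---' delimited frontmatter block.

    Assumption: no YAML library is used; only unindented "key: value" lines
    are recognised, which is enough to test for the presence of a field.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # no closing delimiter: treat the file as having no frontmatter


def missing_required_fields(text: str, required: tuple[str, ...] = ("description",)) -> list[str]:
    """Return the required field names that are absent from the frontmatter."""
    present = frontmatter_keys(text)
    return [name for name in required if name not in present]
```

A non-empty return value for a workflow file would correspond to the Reference Integrity issue described in the table.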

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	3	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Workflow Completeness	Problem: BASE Phase 2 had a full 'Task 2: Define Explicit Assertions' (assertion types, 'Document all assertions in test plan', a 'Defined Assertions' template block, and an 'Assertions Defined: [Count]' state line). NEW deletes all of it and delegates to USE SKILL aqa-requirements-elicitation, but that skill's body only lists unknowns and produces no assertions. The assertion-definition responsibility is dropped by both the phase and the delegated skill, not relocated. Reason: Explicit per-step assertions are consumed downstream by Phase 6 (test implementation) and Phase 8 (correction). Phase 6 even validates 'All assertions from Phase 2 implemented'. Losing them means tests are authored against an assertion contract that no step writes. Solution: Restore an assertion-definition step plus a 'Defined Assertions' block to this phase's update_test_plan/validation_checklist (mirroring BASE), OR genuinely move it by adding explicit assertion-definition steps to aqa-requirements-elicitation. Do not keep the title 'Assertion Definition' while neither artifact produces assertions.
🔵 Medium	Output Contract	Problem: The phase frontmatter description still reads 'Requirements Clarification and Assertion Definition', but the assertion-definition output was removed. BASE Task 2 'Define Explicit Assertions' and the test-plan template's '### Defined Assertions' (per-step assert + verification) are gone from NEW; the NEW `<update_test_plan>` template captures only Questions/Responses/Edge Cases/Test Data. Reason: The phase title still advertises 'Assertion Definition', but neither this phase template nor the delegated aqa-requirements-elicitation skill produces assertions, so the stated output no longer matches what is created. Solution: Either drop 'Assertion Definition' from the description to match the slimmer scope, or note that assertion definition is delegated to `aqa-requirements-elicitation` so the phase's stated output and its template stay consistent.

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	5	⬆️ Slightly better
Output Contract	3	⬇️ Slightly worse
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	3	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The base file grounded the abstract instruction 'follow the exact existing pattern' with concrete TypeScript examples (private readonly selector style, getter/action methods, new page-object skeleton). The new file deletes all code examples; the only example left is naming (`getSubmitButton()` vs `submitBtn()`) inside `<skill_precedence>`, which illustrates the conflict rule but not how to add a selector or method. Reason: An agent unfamiliar with the page-object convention now has no concrete anchor in this phase file and must infer the shape, which can produce inconsistent implementations. Solution: Keep the terse body but add one small grounding example (a 3-4 line before/after of adding a selector + accessor in the project pattern), or state that the concrete pattern is owned by the `aqa-selector-management` skill so the engineer knows where the example lives.
🔵 Medium	Output Contract	Problem: The base file defined a concrete output template for the Phase 5 test-plan section (Page Objects Modified/Created with selector names, types, purposes, and methods). The new file removes that template entirely; the only remaining output spec is the terse `agents/aqa-state.md` bullet list in step 5.3 (counts and paths). The engineer no longer has a deterministic shape for the per-selector implementation record. Reason: Without a defined output shape the downstream Phase 7/8 consumers and the state file get inconsistent detail, reducing traceability across the chain. Solution: Add a short output-shape block (or a one-line reference) for the Phase 5 record listing the required fields per page object (path, selectors added with type/purpose, helper methods), similar to the compact `<correction_output_shapes>` block used in aqa-flow-test-correction.md.
⚪ Low	Epistemic Honesty	Problem: The new file adds strong failure handling for zero-document ACQUIRE (`<skill_acquire_failure>`) but does not ask the engineer to surface uncertainty when an existing page-object convention is ambiguous (base had this implicit via 'understand its structure and patterns' detail). Nothing tells the agent to flag low-confidence pattern matches. Reason: Selector/page-object style guesses made silently can diverge from project conventions and only surface much later at test execution. Solution: Add one line to step 5.1 or the checklist: if the existing page-object convention is unclear or conflicting, record the uncertainty in `agents/aqa-state.md` and proceed with the closest match rather than guessing silently.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	3	⬇️ Slightly worse
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Failure Handling	Problem: Step 6.1 forbids ACQUIRing coding/testing/aqa-test-authoring directly and delegates to the handoff. Sub-step 3 handles a narrative-only (orchestration-less) handoff doc only by recording a warning and asking the user. There is no defined route to actually produce the test when the handoff doc is thin AND the user cannot resolve it. The file has no <failure_handling> block at all, unlike the sibling qa-flow-test-implementation.md. Reason: A thin or stale handoff doc combined with the strict no-direct-skill policy and no failure_handling leaves the agent with no sanctioned way to author the test, so Phase 6 can dead-end. Solution: Add a <failure_handling> block mirroring qa-flow-test-implementation.md: define behavior for zero-doc/thin handoff, lint failures, and partial returns, including an explicit user-approved fallback to the domain authoring skill so the phase has a defined way to finish.
🔵 Medium	Workflow Completeness	Problem: Step 6.1 hard-forbids ACQUIRing `coding`, `testing`, `aqa-test-authoring` directly because 'the handoff delegates internally', but step 3 only handles the case where the handoff doc lacks orchestration sections by recording a warning and asking the user. There is no defined path for actually producing the test when the handoff doc is narrative-only and the user is unavailable, so the phase can stall with no completion route. Reason: A thin or stale handoff KB document combined with the strict no-direct-skill policy leaves the agent with no sanctioned way to author the test, breaking the chain at Phase 6. Solution: Add an explicit fallback for the narrative-only handoff case (mirroring the test-correction phase's debugging/coding fallback chain) gated behind user approval, so the phase has a defined way to finish rather than only a warning-and-wait branch.
🔵 Medium	Example Grounding	Problem: All concrete TypeScript test examples from base (setup, actions, explicit assertions like `expect(welcomeMessage).toContain(...)`, cleanup hooks, TestRail comment) were removed. The new file has no code-level anchor for what a 'good' test looks like; it relies entirely on the handoff skill the agent may not have inspected. Reason: Without a grounded example and with authoring fully delegated, an agent whose handoff skill is thin (the file explicitly anticipates narrative-only handoff docs) has no fallback pattern to produce a correct test. Solution: Either retain one minimal end-to-end test example, or add a sentence pointing the engineer to the handoff/authoring skill as the source of the concrete test pattern so the abstraction 'create automated test' is grounded somewhere reachable.
🔵 Medium	Output Contract	Problem: The new file delegates all authoring to the `automation-test-implementation-handoff` skill and keeps only a terse state-file output (step 6.4). The base file specified the concrete test-file structure expectations (imports order, describe blocks, explicit assertions, no hardcoded waits, TestRail reference) and a Phase 6 test-plan record template. With those deleted, the deliverable test file has no shape contract in this phase file beyond 'lint-clean' and 'assertions implemented'. Reason: The validate step 6.2 checks 'assertions implemented' and 'page objects used' but the harder constraints (no hardcoded waits, assertion explicitness) present in base are gone, so a weaker test can pass this phase. Solution: Add a short list of the non-negotiable acceptance properties for the produced test file (uses page objects only, all Phase 2 assertions present, no hardcoded sleeps, project import order) as a contract the handoff output must satisfy, or explicitly state these belong to the handoff skill so the verifier knows where they live.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Output Contract	3	⬇️ Slightly worse
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: Base grounded selector-error detection with a concrete list of error-message patterns ('selector did not become visible', 'NoSuchElementException', 'TimeoutException', etc.) that trigger mandatory page-source analysis. The new file keeps only 'Verify page source analyzed for selector errors' (step 7.2.3) with no example patterns, so the trigger for the page-source analysis is now vague. Reason: Without the trigger patterns an agent may skip page-source analysis on a selector failure, the exact case the base file forced, weakening root-cause accuracy. Solution: Restore a short example list of the selector/locator error signatures that must trigger page-source analysis (even 3-4 representative patterns), so the agent reliably recognizes when this mandatory step applies.
🔵 Medium	Output Contract	Problem: The base file gave a concrete per-failure analysis record template (Error Type, Error Message, Stack Trace, Likely Cause, Evidence Label/Rationale, full Page Source Analysis block, Suggested Fix, Priority) plus a Phase 7 test-plan section schema. The new file removes both and only specifies the `agents/aqa-state.md` bullets in step 7.3. The detailed failure record that Phase 8 consumes now has no shape in this phase file. Reason: Phase 8 (test-correction) consumes 'failure analysis from Phase 7'; without a defined record shape the handoff between phases can lose the page-source and evidence detail Phase 8 needs. Solution: Add a compact failure-record shape (fields: test name, error type, evidence label + rationale, page-source finding, recommended fix, priority) so the analysis output that Phase 8 corrections rely on stays deterministic, or reference that shape as owned by the failure-analysis skill.
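
To make the two suggestions above concrete, here is a hedged sketch of a selector-error trigger and a compact failure record. The three signatures are the ones quoted in the finding (the base file's full list is not shown here), and the record fields follow the reviewer's proposed shape; the class, function, and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Signatures quoted in the finding above; the original list was longer ("etc.").
SELECTOR_ERROR_SIGNATURES = (
    "selector did not become visible",
    "NoSuchElementException",
    "TimeoutException",
)


def needs_page_source_analysis(error_message: str) -> bool:
    """True when the message contains a known selector/locator error signature."""
    lowered = error_message.lower()
    return any(signature.lower() in lowered for signature in SELECTOR_ERROR_SIGNATURES)


@dataclass
class FailureRecord:
    """Compact per-failure record passed from Phase 7 to Phase 8 (assumed shape)."""
    test_name: str
    error_type: str
    evidence_label: str
    evidence_rationale: str
    page_source_finding: Optional[str]
    recommended_fix: str
    priority: str


def record_problems(record: FailureRecord, error_message: str) -> list[str]:
    """List reasons the record is incomplete; an empty list means it is usable."""
    problems = []
    if needs_page_source_analysis(error_message) and not record.page_source_finding:
        problems.append("page-source finding is required for selector errors")
    if not record.evidence_rationale.strip():
        problems.append("evidence rationale is empty")
    return problems
```

The design choice is that the page-source finding is optional in general but becomes mandatory once the error message matches a selector signature, which mirrors the "mandatory page-source analysis" behavior the finding attributes to the base file.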

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	3	⬇️ Slightly worse
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: Prerequisites require 'API endpoint contracts available' and 'Gap analysis and user clarifications completed', but the process gives no branch for when these inputs are missing or incomplete (step 1 just reads them). Reason: The pitfalls warn against placeholder values; without an explicit missing-input branch the agent will fabricate contract details, producing wrong specs. Solution: Add a guard at step 1: if endpoint contracts or clarifications are missing, stop and report the missing input to the caller rather than inventing request/response shapes.
🔵 Medium	Success Criteria	Problem: The skill lists a 6-step process and pitfalls but has no explicit done-when block. There is no testable completion condition (e.g. every test case mapped to >=1 scenario, every scenario has exact values, file mapping covers all scenarios). Reason: Without a done-when the agent cannot self-check coverage and may emit partial specs that look complete. Solution: Add a short success-criteria block stating measurable completion: each test case yields >=1 scenario, every scenario has exact request/response values and explicit assertions, all scenarios appear in the file mapping table, and shared utilities reference their scenario IDs.
⚪ Low	Epistemic Honesty	Problem: The skill never tells the author to flag scenarios where the contract is ambiguous or assumed; it pushes for 'exact test values' everywhere with no way to mark a value as inferred. Reason: Forcing exact values without an assumption marker hides guesses behind confident-looking specs. Solution: Add one line: when an exact value or status code is inferred rather than sourced from the contract, mark it as ASSUMED so the caller can verify.
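
The three solutions above (a missing-input guard, a measurable coverage check, and an ASSUMED marker) could look roughly like this. It is a sketch under stated assumptions: the input names, the scenario-to-test-case mapping, and all function names are illustrative and do not come from the skill file.

```python
class MissingInputError(Exception):
    """Raised when a prerequisite input is absent, so the caller can be told."""


def require_inputs(inputs: dict[str, object]) -> None:
    """Stop early instead of inventing request/response shapes.

    Assumption: an input counts as missing when its value is empty or None.
    """
    missing = sorted(name for name, value in inputs.items() if not value)
    if missing:
        raise MissingInputError("missing required input(s): " + ", ".join(missing))


def render_value(value: object, sourced_from_contract: bool) -> str:
    """Render a spec value, tagging it when it was inferred rather than sourced."""
    return str(value) if sourced_from_contract else f"{value} (ASSUMED)"


def uncovered_test_cases(
    test_case_ids: list[str], scenario_to_cases: dict[str, list[str]]
) -> list[str]:
    """Return test cases that no scenario covers; empty means the criterion holds."""
    covered = {case for cases in scenario_to_cases.values() for case in cases}
    return [case for case in test_case_ids if case not in covered]
```

An empty result from `uncovered_test_cases` corresponds to the proposed done-when condition that each test case yields at least one scenario.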

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Output Contract	Problem: Step 7 'Update Test Plan' lists the fields to add (framework, frontend analysis, page objects, similar tests, recommended location, utilities) but gives no output format or example, unlike the sibling skills aqa-selector-management and api-test-spec-authoring which both include a concrete markdown template. Reason: A field list without a format yields inconsistent sections, so the selector and implementation phases that consume this analysis cannot reliably find page-object and utility findings. Solution: Add an output_format block showing the markdown structure of the 'code analysis section' (headings and bullet shape) so the section is written consistently and downstream phases can parse it.
🔵 Medium	Self-Validation	Problem: The skill has no verification step confirming the analysis findings before handing off (e.g. that referenced page objects and utilities actually exist at the paths recorded). Reason: Search-based findings can include stale or guessed paths; without a re-check the downstream implementation phase acts on unverified references. Solution: Add a final check that each reported page object, utility, and similar-test path was actually found in the codebase, not assumed.
🔵 Medium	Failure Handling	Problem: Step 1 reads 'agents/user-app/project_description.md' as a hard prerequisite but gives no branch if that file is absent or lacks framework/structure info; only step 2 (user-instructions) has a skip-if-missing clause. Reason: The pitfall 'Assuming project structure without verification' is listed but no step prevents it when the description file is missing. Solution: Add a guard at step 1: if project_description.md is missing or lacks framework/structure, derive what is possible from the codebase and report the gap, or stop and ask the caller, instead of assuming structure (which the pitfalls already warn against).
🔵 Medium	Success Criteria	Problem: No done-when block. The process ends at step 7 with no testable check that the analysis is complete (framework identified, page objects classified existing/missing/to-extend, test location decided with rationale). Reason: Without completion criteria the agent may skip steps (e.g. utility search) and still consider the analysis done. Solution: Add success criteria: framework and standards captured, every relevant page object classified existing/extend/create, test location chosen with rationale, reusable utilities listed.
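
The Self-Validation finding above amounts to a path re-check before handoff. A minimal sketch in Python, assuming the analysis records paths relative to a repository root; the function name `unverified_paths` and the sample paths are hypothetical and not part of the skill:

```python
from pathlib import Path


def unverified_paths(repo_root, reported_paths):
    """Return the reported paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in reported_paths if not (root / p).exists()]
```

An empty result would mean every reported page object, utility, and similar-test path was actually found; anything returned is a reference to fix or drop before the implementation phase consumes it.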

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Goal Specification	Problem: The frontmatter description claims three jobs (analyze gaps, define explicit measurable assertions, prepare structured questions), but the body only does gap analysis. when_to_use is a single fragment 'Define gaps in test case understanding' and the assertion-definition and question-preparation goals are absent from the process. Reason: Description-body mismatch makes the workflow load this skill expecting assertions and questions it never produces. Solution: Reconcile description and body: either add process steps for defining measurable assertions and preparing structured questions, or narrow the description to gap analysis only so the stated goal matches the actual capability.
🟠 Very High	Output Contract	Problem: The skill produces no defined output. The description promises 'define explicit measurable assertions' and 'prepare structured questions for user', but the only output the process names is step 3 'Preprate list unknowns and ambiguities' with no format, schema, or example, and no assertion artifact at all. Reason: A skill consumed by an AQA workflow with no output contract cannot reliably hand structured gaps/questions to the next phase, breaking the chain. Solution: Add an output_format block defining the deliverable: a structured list of gaps/unknowns and a set of clarifying questions (and, per the description, the measurable assertions), with a short markdown template and where it is written (the test plan file).
🟡 High	Success Criteria	Problem: No done-when condition. There is no testable statement of when elicitation is complete (e.g. every ambiguous step has a question, every expected result is measurable). Reason: Without completion criteria the agent stops arbitrarily, leaving gaps unaddressed. Solution: Add success criteria: every vague step has a clarifying question and every expected result is stated as a measurable assertion.
🟡 High	Decision Branching	Problem: There is no branch for the outcome of the completeness analysis: what to do when no gaps are found vs many gaps, or when the test plan file is missing. Reason: Without branches the agent does not know how to terminate when the plan is already complete or the input is missing. Solution: Add explicit branches: if no gaps, record 'no clarifications needed' and proceed; if gaps exist, produce the question list; if the test plan file is absent, stop and report to the caller.
🟡 High	Example Grounding	Problem: The skill gives no example of a gap, an assertion, or a question, unlike the sibling AQA skills which include concrete templates. Reason: Without examples the abstract checklist (clear steps, measurable results, edge cases) is interpreted inconsistently. Solution: Add one concrete example of a measurable assertion and one clarifying question derived from a vague test step.
🟡 High	Precision & Explicitness	Problem: Step 3 contains a broken instruction 'Preprate list unknowns and ambiguities' (typo, missing word). It is the single action verb of the skill and is malformed. Reason: The skill's core action is unreadable, so the agent may mis-execute or skip the only output-producing step. Solution: Rewrite as a clear directive, e.g. 'Prepare a list of unknowns and ambiguities, one item per gap, each phrased as a specific clarifying question.'
🟡 High	Workflow Completeness	Problem: The process is three terse steps and stops at 'Preprate list unknowns and ambiguities'. There is no step to turn ambiguities into questions, to define assertions, or to write results anywhere; the chain implied by the description is incomplete. Reason: An incomplete process leaves the agent guessing the missing steps, producing inconsistent elicitation output. Solution: Add ordered steps covering: derive assertions from each requirement, convert each unknown into a specific question, and persist the gaps/questions/assertions to the test plan.
🔵 Medium	Failure Handling	Problem: The prerequisite 'Test plan file exists' has no handling if the file is missing or empty. Reason: Reading a missing plan file yields no gaps and a silently empty result. Solution: Add a missing-input branch: if 'agents/plans/aqa-.md' is absent or empty, stop and report to the caller rather than proceeding.
🔵 Medium	Self-Validation	Problem: No verification step ensures the produced gap/question list actually covers all five completeness dimensions listed in step 2. Reason: Without a re-check the agent may answer only some dimensions and still finish. Solution: Add a re-check that each of the five analysis dimensions (steps clear, results measurable, data defined, edge cases, success criteria) was assessed and reflected in the output.
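
The three branches proposed under Decision Branching and Failure Handling can be sketched as one small function. This is an illustration only: `elicitation_outcome`, its return labels, and the `Clarify:` prefix are assumptions, not wording from the skill.

```python
def elicitation_outcome(plan_text, gaps):
    """Pick the terminal branch for the elicitation step.

    plan_text: test plan contents, or None when the file is absent.
    gaps: unknowns found by the completeness analysis.
    """
    if plan_text is None or not plan_text.strip():
        # Missing or empty input: stop instead of returning an empty result.
        return ("stop", ["test plan missing or empty; report to caller"])
    if not gaps:
        return ("proceed", ["no clarifications needed"])
    # One specific question per gap.
    return ("ask", [f"Clarify: {gap}" for gap in gaps])
```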

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	1	❌ Much worse
Success Criteria	2	⬇️ Slightly worse
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	2	⬇️ Slightly worse
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	2	⬇️ Slightly worse
Precision & Explicitness	2	⬇️ Slightly worse
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	2	⬇️ Slightly worse
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	2	⬇️ Slightly worse
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	2	⬇️ Slightly worse
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-selector-management/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: Step 4 says to analyze page source HTML 'only when frontend code unavailable or selectors still missing', and the prerequisites note 'or will request page source', but there is no branch for when neither the frontend code nor the page source is available, leaving missing selectors unresolved. Reason: The top pitfall is 'Guessing selectors without verifying'; without a no-source branch the agent has no defined safe exit and may fabricate selectors. Solution: Add a branch: if selectors remain unidentified after both frontend search and page source are exhausted (or page source cannot be obtained), stop and report the unresolved selectors to the caller instead of guessing.
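
The no-source branch described above could look like the following sketch. It assumes selectors are collected as name-to-selector mappings; `resolve_selectors` and its parameters are hypothetical names chosen for illustration.

```python
def resolve_selectors(needed, frontend, page_source=None):
    """Split needed element names into resolved selectors and unresolved names.

    frontend / page_source map element name -> selector; page_source is None
    when the page source could not be obtained.
    """
    resolved, unresolved = {}, []
    for name in needed:
        if name in frontend:
            resolved[name] = frontend[name]
        elif page_source is not None and name in page_source:
            resolved[name] = page_source[name]
        else:
            unresolved.append(name)  # reported to the caller, never guessed
    return resolved, unresolved
```

A non-empty `unresolved` list is the stop condition: the caller receives the names rather than fabricated selectors.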

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-test-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Epistemic Honesty	Problem: Nothing in the process tells the agent to flag low-confidence or assumed implementation choices. Step 5 'Follow action patterns from similar tests' and Step 6 'Use project assertion style' rely on inference, but there is no instruction to disclose when a pattern was guessed versus confirmed. Reason: Test authoring leans on inferred project conventions; undisclosed assumptions cause hard-to-trace test failures and reviewer rework. Solution: Add one line (e.g. to the validation step or output_format) requiring the agent to record any assumptions made (assumed selector, inferred pattern, unverified standard) so the reviewer can confirm them.
🔵 Medium	Failure Handling	Problem: The skill assumes its prerequisites (complete test plan, updated page objects, project standards) are always present. Step 1 'Consolidate from test plan' and Step 9 validation give no instruction for what to do if the test plan is missing assertions, a required page-object method does not exist, or project coding standards are unknown. Reason: Without explicit handling for missing inputs the agent will silently invent assertions or selectors, producing a test that does not match requirements. Solution: Add a gate at the start of the process (or to the prerequisites block) instructing the agent to stop and ask the user / route back to the prior phase when a listed prerequisite is missing (e.g. missing assertion, missing page-object selector, unknown coding standard) rather than fabricating implementation.
⚪ Low	Input Contract	Problem: The skill description and frontmatter do not begin with 'Rosetta' as the skill schema (docs/schemas/skill.md, 'description: ["Rosetta" + ...]') requires, and there is no <core_concepts> block carrying the schema-mandated 'All Rosetta prep steps MUST be FULLY completed, load-context skill loaded' line; a different block is used instead, which is not a schema section. Reason: Minor schema-contract drift; does not break behavior but is inconsistent with sibling skills in the same family and the base schema. Solution: Prefix description with 'Rosetta', and either add the schema's <core_concepts> with the standard prep-steps line or confirm that the block used instead is an accepted family convention.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-test-debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Epistemic Honesty	Problem: Step 5 'Identify Patterns and Root Causes' and the output_format 'Root Cause: [analysis]' do not require the agent to distinguish a verified root cause from a hypothesis. A confidently-stated but guessed root cause leads to a wrong fix in Part B. Reason: Failure triage frequently guesses; flagging confidence prevents applying fixes to misdiagnosed failures. Solution: Add a directive in Step 5 (or the output template) to mark each root cause as verified-from-evidence vs suspected, and to state what additional data would confirm a suspected cause.
⚪ Low	Input Contract	Problem: Description does not begin with 'Rosetta' and there is no <core_concepts> block with the schema-required 'All Rosetta prep steps MUST be FULLY completed, load-context skill loaded' line (docs/schemas/skill.md lines 4 and 56); a different block is used instead. Reason: Schema-contract drift consistent with the sibling authoring skill; minor and non-behavioral. Solution: Prefix description with 'Rosetta' and add the standard <core_concepts> prep-steps line, or confirm that the block used instead is an accepted family convention.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The failure categories in step 7 (environment, data, product regression, test bug, flakiness, infra timeout, auth/session, selector/locator, contract mismatch, unknown) are listed abstractly with no example of how a given log line maps to a category, and step 8 'tie to evidence' gives no worked example. Reason: Categorization is the core judgment of this skill; an example reduces inconsistent or ambiguous categorization across runs. Solution: Add one short worked example mapping a sample error string (e.g. 'TimeoutException on element visibility') to its category and the evidence snippet, mirroring the concrete pattern matching done in the sibling aqa-test-debugging skill.
🔵 Medium	Output Contract	Problem: Step 9 says 'Produce or update the parent workflow's analysis artifact (path and template from phase file)' but the skill defines no fallback structure or required fields for that artifact. Unlike its sibling aqa-test-debugging (which has a concrete output_format block), this skill has no schema or canonical example, so the categorized findings format is fully delegated and unverifiable from within the skill. Reason: Without any in-skill output shape, two runs can emit incompatible artifacts and the downstream correction phase cannot rely on a stable structure. Solution: Add a minimal required-field list for the analysis artifact (e.g. failure id, category, evidence reference, verified-vs-hypothesis flag, suggested owner file) as a default to use when the phase file omits a template, plus one short example row.
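
The minimal required-field list suggested above can be sketched as a small record type. The field names follow the finding (failure id, category, evidence reference, verified-vs-hypothesis flag, suggested owner file); the class name, the example values, and the category assigned to the sample error string are assumptions for illustration only.

```python
from dataclasses import dataclass

# Categories as listed for step 7 in the finding above.
CATEGORIES = {
    "environment", "data", "product regression", "test bug", "flakiness",
    "infra timeout", "auth/session", "selector/locator", "contract mismatch",
    "unknown",
}


@dataclass
class FailureFinding:
    failure_id: str
    category: str
    evidence: str      # reference to the log line or report entry
    verified: bool     # True = verified from evidence, False = hypothesis
    owner_file: str    # suggested file to correct

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


# Hypothetical example row; the category is only suspected, so verified=False.
example = FailureFinding(
    failure_id="F-001",
    category="selector/locator",
    evidence="TimeoutException on element visibility",
    verified=False,
    owner_file="tests/example_test.py",
)
```

Rejecting categories outside the fixed set keeps two runs from emitting incompatible labels, which is the stability concern the finding raises.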

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Reference Integrity	Problem: Both AQA Phase 6 (aqa-flow-test-implementation.md step 6.1) and QA Phase 5 (qa-flow-test-implementation.md step 5.1.4) hard-forbid ACQUIRing the domain test-authoring skills (aqa-test-authoring, qa-test-implementation) directly and delegate all authoring to this handoff skill. But the handoff never ACQUIREs those domain skills — its process step 4 only loads coding-agents-prompt-authoring plus 'any skill the parent names', and neither phase file names a domain skill. On the normal path the new aqa-test-authoring and qa-test-implementation skills are unreachable (orphaned). Reason: The PR added rich domain authoring skills but left no execution route to them, so the core content of the test-implementation phases is dead and authoring silently regresses to generic coding/testing. Solution: Add an explicit ACQUIRE/USE of the domain authoring skill inside the handoff keyed off a parent-supplied variable, and make each phase pass the name (AQA Phase 6 -> aqa-test-authoring; QA Phase 5 -> qa-test-implementation); OR relax the phase ban so the phase ACQUIREs the domain authoring skill directly.
🟡 High	Dependency Management	Problem: core_concepts and process step 4 force-load coding-agents-prompt-authoring — a meta-skill for authoring PROMPTS (skills/agents/workflows) — inside a skill whose job is to author automated TEST CODE. It is loaded on every test-implementation run. Reason: A mis-wired heavy meta-skill on the critical authoring path inflates context cost and biases the agent toward prompt edits rather than writing tests, lowering reliability of every Phase 5/6 run. Solution: Remove the mandatory coding-agents-prompt-authoring ACQUIRE/USE from core_concepts and process step 4; replace it with the actual domain test-authoring skill (aqa-test-authoring / qa-test-implementation). If prompt-authoring is genuinely needed, state in one phrase why a test-implementation phase authors prompts.
🔵 Medium	Single Responsibility	Problem: The skill mandates loading four skills in core_concepts and process (repository-implementation-standards, coding, testing, coding-agents-prompt-authoring) plus hitl. Bundling general coding, testing, repo-standards, and prompt-authoring under one handoff skill widens its responsibility beyond 'land approved tests and hand off execution'. Reason: Always loading four heavy skills inflates cost/context for every run and dilutes the skill's single boundary responsibility. Solution: Keep the handoff focused on orchestration + boundary; make coding/testing/standards conditional on the parent workflow rather than always-load, and drop coding-agents-prompt-authoring unless justified, so the skill does one job: implement-validate-handoff.
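
The first option in the Reference Integrity solution, routing keyed off a parent-supplied variable, might be sketched as below. The skill names come from the finding; the phase keys (`aqa-phase-6`, `qa-phase-5`) and the function name are hypothetical.

```python
# Phase-to-skill routing named in the finding above; keys are illustrative.
DOMAIN_AUTHORING_SKILL = {
    "aqa-phase-6": "aqa-test-authoring",
    "qa-phase-5": "qa-test-implementation",
}


def skills_to_acquire(parent_phase, parent_named=()):
    """Return the skills the handoff would load for a given parent phase."""
    if parent_phase not in DOMAIN_AUTHORING_SKILL:
        # Fail loudly rather than fall back to generic authoring.
        raise KeyError(f"no domain authoring skill for phase: {parent_phase}")
    return [DOMAIN_AUTHORING_SKILL[parent_phase], *parent_named]
```

With an explicit mapping, the domain authoring skills are reachable on the normal path, and an unknown phase surfaces as an error instead of a silent regression.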

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	2	⬇️ Slightly worse
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Step 2.1 cites `qa-data-collection` skill "step 4 for full discovery logic" and step 2.2 hands the spec source to `swagger-contracts-analysis`. The phase relies on internal step numbers of a separate skill, which is sibling-internal knowledge that can drift if that skill is renumbered. Reason: Referencing another skill's internal step number couples the phase to that skill's structure and breaks reference integrity when the skill is edited. Solution: Replace the hard-coded "step 4" pointer with a behavior reference (e.g., "per the backend-source discovery logic in `qa-data-collection`") so the phase does not depend on another artifact's internal numbering.
🔵 Medium	Reference Integrity	Problem: Step 2.1 references the backend docs path as `RefSrc/{project-name}/docs/` (capitalized) in two places. The canonical Rosetta path term is lowercase `refsrc/` (per pa-rosetta.md folder list). On a case-sensitive Linux target repo, the capitalized path will not resolve to the real `refsrc/` directory. Reason: A wrong-case path can silently fail to find the backend architecture docs, causing the phase to skip a valid spec source and fall back to weaker code-only analysis. Solution: Change `RefSrc/{project-name}/docs/` to the canonical `refsrc/{project-name}/docs/` in both occurrences of step 2.1 so the path matches the predefined Rosetta target folder.
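
A wrong-case path like the one above is easy to catch mechanically. The following is a hedged sketch of such a check, assuming the canonical spelling is the lowercase `refsrc/` cited in the finding; the function name is hypothetical.

```python
import re

CANONICAL = "refsrc/"


def noncanonical_refsrc(text):
    """Return every spelling of 'refsrc/' in text that is not all lowercase."""
    matches = re.finditer(r"refsrc/", text, flags=re.IGNORECASE)
    return [m.group(0) for m in matches if m.group(0) != CANONICAL]
```

Run over a phase file, it would return `["RefSrc/", "RefSrc/"]` for two capitalized occurrences and an empty list once both are corrected.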

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Precision & Explicitness	Problem: Step 9 reads "USE SKILL with the Resolved MCP collection skill tag" and step 7 reads "USE SKILL `confluence-source-harvesting`". The Resolved MCP collection skill is a runtime-resolved variable, but step 9 uses the bare `USE SKILL` alias without making clear it must substitute the resolved tag, unlike the literal skill name in step 7. An agent could misread step 9 as a missing skill name. Reason: An ambiguous `USE SKILL` with no resolvable name can cause the agent to stall or pick the wrong skill at the actual collection step, which is the core action of the subflow. Solution: Reword step 9 to make the substitution explicit, e.g., "USE SKILL <Resolved MCP collection skill from step 1>", matching the explicit-variable convention already used in the output-contract COMPLETED row.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Decision Branching	Problem: The conditional HITL hinges on "only if config does not already exist" (step 0.1 item 4, header `type=HITL-CONDITIONAL`), but the file never defines the if/then/else for the alternate branch: what happens when config DOES exist (load it, validate freshness, ask if stale?) versus when it does not (collect from user). The existence check that drives the branch is also not specified (which path/file is probed). Reason: An unspecified branch lets the agent ask the user redundantly when config exists, or skip collection when it is missing, breaking the conditional-HITL contract. Solution: Make the branch explicit: if `qa-project-config.md` exists and is non-empty THEN load and proceed without asking; ELSE ask user for project info and create it. State where existence is checked (e.g., `agents/qa/{IDENTIFIER}/qa-project-config.md`).
🟡 High	Failure Handling	Problem: Unlike its sibling phase files (Phase 3, 4, 5, 7) which all have a dedicated `<failure_handling>` block, this Phase 0 file has none. It does not say what to do when the `qa-project-config` skill ACQUIRE returns zero documents, when the session directory cannot be created, or when the user refuses to supply required project info. The parent `qa-flow.md` `<failure_handling>` covers zero-doc ACQUIRE generically, but the phase-local edge cases (directory creation failure, user declines mandatory config) are unhandled. Reason: Phase 0 is the foundation for all later phases; an unhandled config-collection failure can produce an empty or fabricated config that silently corrupts every downstream phase. Solution: Add a `<failure_handling>` block covering: (a) `qa-project-config` zero-doc ACQUIRE (defer to parent zero-doc rule), (b) directory creation failure under `agents/qa/{IDENTIFIER}/`, and (c) user refuses or cannot provide required project info when config is absent (stop, record blocked state, do not fabricate config).
🔵 Medium	Epistemic Honesty	Problem: Step 0.1 loads or collects project config (Swagger availability, base URLs, auth scheme, spec locations) but never requires the agent to flag values that were assumed, inferred, or supplied with low confidence. update_state step 0.2 records a coarse 'Config Source: [Existing / User provided / Discovered]' label but there is no instruction to mark individual fields ASSUMED/UNVERIFIED when inferred rather than confirmed. Reason: Phase 0 config feeds every downstream phase; silently recording inferred endpoints/auth as confirmed hides guesses behind a confident-looking config and corrupts data collection and spec analysis with no audit trail. Solution: Add one line to step 0.1 or 0.2: when any required config field is inferred or uncertain (not confirmed by the user or read from a spec), mark it ASSUMED in qa-project-config.md and surface it to the user before Phase 1 rather than recording it as confirmed.
🔵 Medium	Safety Boundaries	Problem: Step 0.1 item 4 "ASK USER for project info only if config does not already exist" has no guard against fabricating config when the user provides incomplete or no answers. Compared with sibling phases that explicitly forbid fabrication and silent-bypass, Phase 0 has no such boundary. Reason: Fabricated project config at Phase 0 propagates wrong assumptions into data collection, spec analysis, and implementation, causing systemic downstream failure. Solution: Add a boundary: do not invent endpoints, base URLs, auth schemes, or spec locations; if the user omits required fields, record the gap and stop rather than guessing.
🔵 Medium	Output Contract	Problem: `<workflow_context>` and `<validation_checklist>` say the outputs are `initial-data.md` AND `qa-project-config.md`, but `update_state` step 0.2 "Files Created" lists only `initial-data.md, qa-state.md` — it omits `qa-project-config.md` and adds `qa-state.md` (which is the global state file, not a per-session output). The recorded file list contradicts the declared outputs. Reason: An inconsistent file ledger makes the skip-gate verification in the parent flow (which checks for expected artifacts) unreliable and can wrongly pass or fail resumption checks. Solution: Align step 0.2 "Files Created" with the declared outputs: list `initial-data.md` and `qa-project-config.md`; track `qa-state.md` separately as the state file, not a session output artifact.
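
Two of the findings above, the explicit config branch and the file-ledger mismatch, reduce to simple checks. A minimal sketch, assuming the session directory is the `agents/qa/{IDENTIFIER}/` folder mentioned in the solutions; the function names and return labels are hypothetical.

```python
from pathlib import Path

# Outputs declared by the phase, per the Output Contract finding.
DECLARED_OUTPUTS = {"initial-data.md", "qa-project-config.md"}


def config_action(session_dir):
    """Load an existing non-empty config; otherwise ask the user."""
    config = Path(session_dir) / "qa-project-config.md"
    if config.is_file() and config.stat().st_size > 0:
        return "load"
    return "ask_user"


def ledger_mismatch(files_created):
    """Compare the recorded 'Files Created' list with the declared outputs."""
    recorded = set(files_created)
    return {
        "missing": sorted(DECLARED_OUTPUTS - recorded),
        "extra": sorted(recorded - DECLARED_OUTPUTS),
    }
```

Applied to the list currently recorded in step 0.2, `ledger_mismatch(["initial-data.md", "qa-state.md"])` reports `qa-project-config.md` as missing and `qa-state.md` as extra, which is the contradiction the finding describes.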

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The `<common_patterns>` section (Typical Contradictions / Typical Gaps / Typical Ambiguities) largely restates examples already given inside `<identify_contradictions>`, `<identify_gaps>`, and `<identify_ambiguities>` (e.g. 'Fast response (how fast?)' duplicates the 'fast' example in identify_ambiguities; 'Owner/assignee conflicts' duplicates the Owner value-mismatch example). Reason: Duplicated example lists add tokens to every load without adding decision value and dilute the single canonical place an agent looks for each category. Solution: Remove `<common_patterns>` or fold any non-duplicated entries into the relevant identify_* section so each example appears once.
🔵 Medium	Self-Validation	Problem: The `<process>` ends at step 6 'Assess risk and produce findings' and `<output_format>` produces the document, but there is no step telling the agent to re-check its own output before finishing (e.g. confirm every finding has an exact source quote, confirm all four sections are populated, confirm IDs C/G/A are sequential and referenced in the Risk Assessment). `<analysis_guidelines>` and `<pitfalls>` state the rules but no explicit self-verification pass is required. Reason: Without an explicit re-check step the agent may emit findings lacking quotes or unreferenced IDs, which the downstream requirements/test phases cannot act on. Solution: Add a final process step or a `<validation_checklist>` requiring the agent to verify before output: each contradiction/gap/ambiguity carries an exact quote and source, every Risk Assessment entry references an existing finding ID, and all six output sections exist even when empty.
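
The self-verification pass proposed in the Self-Validation row could look roughly like this. It is only a sketch: the ID format (`C1`, `G2`, `A3`) and the `quote`/`source` field names are assumptions made for illustration, since the audit does not quote the skill's exact template.

```python
import re


def validate_findings(findings, risk_refs):
    """findings: dict of ID -> {"quote": str, "source": str}.
    risk_refs: finding IDs cited in the Risk Assessment section."""
    problems = []
    for fid, finding in findings.items():
        if not re.fullmatch(r"[CGA]\d+", fid):
            problems.append(f"{fid}: unexpected ID format")
        if not finding.get("quote"):
            problems.append(f"{fid}: missing exact quote")
        if not finding.get("source"):
            problems.append(f"{fid}: missing source")
    # IDs of each kind should run 1..n without holes.
    for prefix in "CGA":
        nums = sorted(
            int(fid[1:]) for fid in findings if re.fullmatch(prefix + r"\d+", fid)
        )
        if nums != list(range(1, len(nums) + 1)):
            problems.append(f"{prefix}: IDs are not sequential from 1")
    # Every Risk Assessment reference must point at a defined finding.
    for ref in risk_refs:
        if ref not in findings:
            problems.append(f"{ref}: referenced in Risk Assessment but not defined")
    return problems
```

An empty result means the document passes these three checks; it says nothing about whether the findings themselves are correct.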

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Self-Validation	Problem: The skill has no step asking the agent to verify its output before finishing (confirm child pages checked per `<pitfalls>`, confirm truncation noted, confirm space/URL/labels populated). The pitfalls list states 'always check with get_page_children' but nothing in `<process>` requires confirming it happened. Reason: Child-page detail often holds acceptance criteria; without a verification step the agent can silently omit it and the downstream gap-analysis sees a false-complete dataset. Solution: Add a closing verification step in `<process>` or a checklist that re-checks: child pages fetched for each parent, truncation annotated, and the output template fields filled.
🔵 Medium	Success Criteria	Problem: There is no explicit 'done when' statement. The skill lists a `<process>` and an `<output_format>` but never states the completion condition (e.g. every retrieved parent had its children checked, truncation flagged where applied, fallback recorded when zero results). The companion skill confluence-source-harvesting has a `<validation_checklist>`; this MCP skill has none. Reason: Without testable completion criteria the agent may stop after fetching one page and skip child-page retrieval or truncation flagging, producing incomplete raw data for downstream phases. Solution: Add a short success-criteria or validation-checklist block: e.g. 'Done when each parent page has children checked, all pages over the word budget are flagged truncated, and a zero-result run ends with a recorded user decision or noted gap.'
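
The 'done when' statement proposed in the Success Criteria row can be expressed as a small predicate. This is a sketch under stated assumptions: the `Page` record, its field names, and the numeric word budget are hypothetical, and every retrieved page is treated as a potential parent.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    word_count: int
    children_checked: bool = False
    flagged_truncated: bool = False


def unmet_done_criteria(pages, word_budget, zero_result_note=None):
    """Return the completion criteria that are not yet satisfied."""
    if not pages:
        # A zero-result run is only done once a decision or gap is recorded.
        if zero_result_note:
            return []
        return ["zero results without a recorded user decision or noted gap"]
    unmet = []
    for page in pages:
        if not page.children_checked:
            unmet.append(f"{page.title}: children not checked")
        if page.word_count > word_budget and not page.flagged_truncated:
            unmet.append(f"{page.title}: over word budget but not flagged truncated")
    return unmet
```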

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Self-Validation	Problem: No output-verification step. `<pitfalls>` notes some fields may be permission-restricted and that rendered HTML may need markdown conversion, but `<process>` never requires the agent to confirm description was converted or that restricted fields were flagged rather than dropped. Reason: Silently dropped or unconverted fields produce a normalized artifact that misleads downstream gap-analysis and test generation. Solution: Add a closing verification step: re-check that the description is in markdown, restricted/empty fields are explicitly marked, and custom fields discovered via `jira_search_fields` are reflected in the output.
🔵 Medium	Success Criteria	Problem: No explicit completion condition. The `<process>` extracts fields and a `<fallback>` handles a missing ticket, but nothing states when extraction is considered done (e.g. all template fields populated or marked N/A, restricted fields noted rather than left blank). The output template lists 'Acceptance criteria' as a concept in the sibling qa-data-collection skill but this skill's template has no done-when. Reason: Without testable done-criteria the agent may stop after fetching summary/description and omit comments or custom fields that downstream phases need. Solution: Add a brief success-criteria/checklist: 'Done when every output-template field is populated or explicitly marked N/A, permission-restricted fields are labeled, and the ticket key + URL are present.'
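
One possible reading of the proposed Jira checklist, as a sketch. The field names passed in and the `N/A` / `RESTRICTED` marker strings are assumptions for illustration; the skill's real template may use different labels.

```python
NA = "N/A"
RESTRICTED = "RESTRICTED"


def unfinished_fields(artifact, template_fields):
    """Fields that are absent or blank. Explicit markers such as
    'N/A' or 'RESTRICTED' count as filled in."""
    return [
        name
        for name in template_fields
        if not str(artifact.get(name) or "").strip()
    ]


def is_done(artifact, template_fields):
    """Done when nothing is blank and the ticket key and URL are real values."""
    identified = all(
        artifact.get(name) not in (None, "", NA, RESTRICTED)
        for name in ("key", "url")
    )
    return not unfinished_fields(artifact, template_fields) and identified
```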

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Failure Handling	Problem: Unlike the Jira skill (which has a `<fallback>` for 'ticket not found') and the Confluence skill (which has a 'no results' fallback), this skill's `<process>` has no failure branch. Step 2 'Call TestRail MCP (`get_case` with case_id)' has no handling for case-not-found, MCP unreachable, or access denied. The only related guidance is the pitfall 'Some fields may be empty — document gaps', which covers empty fields but not a failed fetch. Reason: With no failed-fetch branch the agent has no defined behavior on MCP error and may hallucinate test-case content or silently produce an empty artifact. Solution: Add a fallback step mirroring the Jira skill: if `get_case` returns not-found or an error, verify the case ID/URL with the user and stop with a recorded gap rather than fabricating a case.
🔵 Medium	Self-Validation	Problem: No output-verification step. `<pitfalls>` says 'Some fields may be empty — document gaps, never assume content' but `<process>` step 4 just 'Output structured test case artifact' with no re-check that each step has an expected result and that empty fields were marked rather than invented. Reason: Test cases without per-step expected results or with invented content are unusable and risky for downstream test design. Solution: Add a closing verification step requiring the agent to confirm each test step has a paired expected result and that all empty fields are explicitly marked as gaps before emitting the artifact.
🔵 Medium	Success Criteria	Problem: No explicit completion condition. The skill does not state when extraction is done (e.g. all template fields populated or marked as gap, steps and expected results captured per step). Reason: Without testable criteria the agent may stop after capturing the title and skip per-step expected results that downstream test generation depends on. Solution: Add a short done-when/checklist: 'Done when case ID/title/section/steps/expected results are captured, every empty field is marked as a gap, and custom fields are included or noted unavailable.'
🔵 Medium	Decision Branching	Problem: The `<process>` is a flat 1-4 sequence with no conditional handling: no if/then for missing ID (only `<prerequisites>` says 'ask if missing'), no branch for fetch failure, and no branch for empty/partial case data. The sibling Confluence and Jira skills both carry explicit conditional branches inside `<process>`. Reason: A flat sequence with no branches gives the agent no instruction for the common error and empty-data paths, lowering reliability versus the companion MCP skills. Solution: Add explicit conditionals to `<process>`: if no ID/URL then ask; if `get_case` fails then verify and stop with gap; if fields empty then mark gap and continue.
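
The three conditionals proposed in the Decision Branching row can be sketched as follows. `fetch_case` is a stand-in for the `get_case` MCP call; how that call actually signals not-found or access errors is not stated in the audit, so raising an exception or returning an empty result are both assumptions here, as are the required field names.

```python
def collect_test_case(case_id, fetch_case,
                      required=("title", "steps", "expected_results")):
    # Branch 1: no ID or URL supplied, so ask instead of guessing.
    if not case_id:
        return {"status": "ask_user", "reason": "case ID or URL is missing"}

    # Branch 2: the fetch failed, so stop with a recorded gap.
    try:
        case = fetch_case(case_id)
    except Exception as exc:
        return {"status": "stopped", "gap": f"fetch failed for {case_id}: {exc}"}
    if not case:
        return {"status": "stopped", "gap": f"case {case_id} not found"}

    # Branch 3: partial data, so mark the empty fields as gaps and continue.
    gaps = [name for name in required if not case.get(name)]
    return {"status": "ok", "case": case, "gaps": gaps}
```

No branch produces case content that did not come from the fetch, which is the property the Failure Handling finding asks for.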

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Single Responsibility	Problem: This skill bundles five distinct jobs in one `<process>`: retrieve test cases (sec 2), search documentation (sec 3), analyze backend source code with framework detection across Spring/Express/FastAPI/.NET (sec 4), discover existing test patterns (sec 5), and produce the raw-data document (sec 6). Section 4 alone carries deep framework-marker and route-decorator knowledge (`@GetMapping`, `router.get()`, `@app.get()`, `[HttpGet]`) for four stacks. Reason: Five responsibilities in one skill enlarge the cognitive search space and make the file harder to maintain and reuse; the framework knowledge in sec 4 also belongs behind a discovery step rather than baked into the collector. Solution: Keep this skill as the orchestrator/aggregator that delegates via the existing `USE SKILL` calls, and extract the backend-source-analysis and existing-test-pattern-discovery detail (secs 4-5) into a dedicated skill it references, leaving this file to sequence and assemble the raw-data artifact.
🔵 Medium	Self-Validation	Problem: The skill produces a rich raw-data template (Data Collection Summary with counts of test cases, docs, endpoints, test files) but `<process>` never asks the agent to verify the summary counts match what was actually collected, or that endpoints in the table trace to a source (TestCase/Docs/Code). No verification step closes the process. Reason: Unverified summary counts and untraced endpoints give downstream gap-analysis and test generation a misleadingly complete picture. Solution: Add a final verification step before emitting raw-data.md: confirm each summary count equals the items collected, each API-endpoint row cites its source, and skipped sections (e.g. backend analysis) are marked N/A rather than omitted.
🔵 Medium	Cognitive Budget	Problem: The skill is the largest of the six (process spans secs 1-6 with multi-level numbered subtrees plus a full output template ~100 lines). Section 4's backend-source priority logic (3 path-resolution sources, RefSrc docs reading, framework markers for 4 stacks, Repomix XML vs source-dir branching) is a single step block an agent must hold while also doing secs 2,3,5,6. Reason: A long multi-branch process loaded at once raises the risk the agent skips sub-steps (e.g. reading RefSrc docs before grepping source), which the Rosetta reliability goal warns against. Solution: Decompose by moving secs 4-5 behind a referenced skill (as above) or split the process into two loadable parts (collection vs codebase analysis) so each delivered chunk is closer to the ~5-step reliable window.
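
The final verification step suggested in the Self-Validation row amounts to two comparisons. A minimal sketch, assuming the summary and the collected items are available as dictionaries keyed by the same hypothetical names; the allowed source labels come from the finding (TestCase/Docs/Code):

```python
ALLOWED_SOURCES = {"TestCase", "Docs", "Code"}


def summary_problems(summary, collected, endpoints):
    """summary: name -> count claimed in the Data Collection Summary.
    collected: name -> list of items actually gathered.
    endpoints: rows of the API-endpoint table, each with 'path' and 'source'."""
    problems = []
    for name, items in collected.items():
        if summary.get(name) != len(items):
            problems.append(
                f"{name}: summary says {summary.get(name)}, collected {len(items)}"
            )
    for row in endpoints:
        if row.get("source") not in ALLOWED_SOURCES:
            problems.append(f"{row.get('path')}: endpoint has no traceable source")
    return problems
```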

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-gap-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: `<prerequisites>` require raw-data.md and api-analysis.md to exist, but the `<process>` gives no handling if these files are missing, empty, or unreadable. The skill assumes they are present. Reason: If api-analysis.md is missing, the cross-reference table in step 1 produces meaningless results, silently degrading the whole chain. Solution: Add a step-0 check: if a prerequisite artifact is missing or empty, stop and ask the user or escalate to the orchestrator rather than cross-referencing against absent data.
🔵 Medium	Success Criteria	Problem: The skill has no explicit testable done-condition. The `<process>` ends at step 5 (prepare questions) and `<output_format>` defines the document, but nothing states when gap analysis is considered complete (e.g. every test step cross-referenced, every gap categorized, all critical questions resolved or recorded as assumptions). Reason: Without a done-condition the agent may stop early after partial cross-referencing and proceed to test specification with unresolved gaps. Solution: Add a short success-criteria block stating the skill is done when every test step has a cross-reference entry, all gaps/contradictions/ambiguities are documented with IDs, and all Critical questions are either answered or recorded as assumptions in analysis.md.
⚪ Low	Self-Validation	Problem: There is no output verification step. The agent is not told to re-check that every test step produced a cross-reference row, or that question counts in the Executive Summary match the questions actually listed. Reason: Self-check reduces the risk of the documented summary counts diverging from the actual content. Solution: Add a brief validation checklist: counts in Executive Summary match the documented gaps/contradictions/ambiguities/questions, and no test step is left without a cross-reference entry.
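
The step-0 check from the Failure Handling row is easy to sketch. The two file names come from the finding; the folder argument and the function name are hypothetical.

```python
from pathlib import Path


def missing_prerequisites(folder, names=("raw-data.md", "api-analysis.md")):
    """Return one problem string per prerequisite that cannot be used."""
    problems = []
    for name in names:
        path = Path(folder) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            problems.append(f"{name}: unreadable")
            continue
        if not text.strip():
            problems.append(f"{name}: empty")
    return problems
```

A non-empty result is the signal to stop and ask the user or escalate, rather than cross-referencing against absent data.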

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	3	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	3	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	3	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-project-config/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: The config location term is inconsistent and undefined. Step 3 says find 'qa-project-config.md in the repo's agent-specific directory', step 5 says save to '<agent_folder>/qa-project-config.md', and step 6 writes to 'agents/qa/{IDENTIFIER}/'. The terms 'agent-specific directory' and '<agent_folder>' are never resolved to a concrete path, so read (step 3) and write (step 5) may target different locations. Reason: If step 3 looks in one place and step 5 writes to another, the 'config not found' branch fires on every run, re-asking the user even when a valid config exists. Solution: Define one operational term for the config directory once (e.g. resolve to the same concrete path used for state, such as agents/), and use that identical term in both the load step (3) and the save step (5) so the file is read from and written to the same place.
🔵 Medium	Precision & Explicitness	Problem: Step 3's branch keys off 'found and non-empty' vs 'not found' but does not handle a found-but-incomplete config (e.g. an existing file missing the minimum fields validated in step 4). The vague directory term ('agent-specific directory') also leaves the search scope ambiguous. Reason: An existing partial config would currently be accepted as valid, leaving later phases without required information such as Swagger availability. Solution: Make the directory path concrete and add an explicit branch for an existing-but-incomplete config: if found but missing minimum required fields, ask only for the missing fields rather than skipping to step 5.
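
Both findings for this skill come down to one path resolver plus a three-way branch. A sketch under assumptions: the directory is passed in rather than hard-coded because the audit does not fix a concrete path, and the required field names used in any call are hypothetical.

```python
from pathlib import Path

CONFIG_NAME = "qa-project-config.md"


def config_path(agent_dir):
    """Single resolver, meant to be used by both the load and the save step."""
    return Path(agent_dir) / CONFIG_NAME


def next_action(found_fields, required_fields):
    """found_fields: None when no config file exists, else its parsed fields."""
    if found_fields is None:
        return ("ask_all", list(required_fields))
    missing = [
        name for name in required_fields
        if found_fields.get(name) in (None, "")
    ]
    if missing:
        # Found but incomplete: ask only for what is missing.
        return ("ask_missing", missing)
    return ("use_existing", [])
```

Because reads and writes go through the same `config_path`, the 'config not found' branch cannot fire merely because the two steps looked in different places.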

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	3	⬆️ Slightly better
Reference Integrity	3	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-implementation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: The skill repeatedly couples itself to external workflow phase numbers it does not own: prerequisites cite 'User approval from Phase 4 received', step 2 says utilities 'identified in Phase 4', and step 6 says 'All assertions from Phase 4 included'. A skill must not depend on a sibling workflow's phase numbering, and if the workflow is renumbered these references break silently. Reason: Phase-number references make the skill fragile to workflow changes and violate skill/workflow isolation; tying to the named artifact keeps the dependency stable. Solution: Replace 'Phase 4' references with the artifact or input they actually depend on (e.g. 'the approved test-specs.md' / 'the shared-utilities plan in test-specs.md'), since the skill already names test-specs.md as the source of the file mapping and assertions.
🔵 Medium	Output Contract	Problem: Unlike the sibling QA skills, this skill has no <output_format> section. It describes code to write and a validation checklist, but does not state the deliverable artifact or where implementation results/summary are recorded for the next phase (debugging). Reason: A defined handoff artifact lets the downstream debugging skill reliably locate what was implemented; its absence forces re-discovery. Solution: Add a short output contract naming the produced test files/utilities and any summary artifact (e.g. list of created/modified files recorded in the phase artifact) that the test-debugging phase will consume.
🔵 Medium	Failure Handling	Problem: `<prerequisites>` require an approved test-specs.md and identified existing patterns, but the `<process>` has no handling if test-specs.md is missing/unapproved or if no existing test framework/patterns can be found. Step 1 'consolidate from previous phases' assumes all inputs exist. Reason: Without these guards the agent may scaffold tests against an unconfirmed framework or incomplete specs, producing throwaway code. Solution: Add a guard: if test-specs.md is missing or not approved, stop and request it; if no existing test framework/pattern is discoverable, ask the user which framework to use instead of guessing.
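
The Reference Integrity finding for this skill lends itself to a simple lint. A sketch, assuming phase references always take the literal form `Phase <number>`; the function name is hypothetical and this is not an existing check in the repository.

```python
import re

PHASE_REF = re.compile(r"\bPhase\s+\d+\b")


def phase_references(skill_text):
    """Return (line_number, matched_text) for each hard-coded phase reference."""
    hits = []
    for lineno, line in enumerate(skill_text.splitlines(), start=1):
        for match in PHASE_REF.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "User approval from Phase 4 received\n"
    "Use the file mapping in test-specs.md\n"
    "All assertions from Phase 4 included"
)
print(phase_references(sample))
```

Run over a SKILL.md, an empty list would indicate that the skill depends on named artifacts only.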

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	3	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	3	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/repository-implementation-standards/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-synthesis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: The skill assumes multi-source data exists. `<prerequisites>` say 'Collected raw data from at least one source', but the `<process>` step 1 'Load all source data' has no handling for the case where a source file is missing, empty, or only a single thin source is available (which would make traceability and conflict resolution near-empty). Reason: Synthesizing from missing or empty sources would silently produce a hollow requirements document that looks complete but is not grounded in any source. Solution: Add a guard at step 1: if no source data is loadable, stop and request inputs; if only partial sources exist, proceed but record the missing sources as risks/assumptions so the gaps are visible in the output document.
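
The step-1 guard described in the Solution has two outcomes, which a short sketch makes explicit. The source names and the wording of the risk entries are illustrative assumptions.

```python
def plan_synthesis(sources):
    """sources: name -> loaded text, or None/blank when not loadable."""
    usable = {name: text for name, text in sources.items() if text and text.strip()}
    missing = sorted(set(sources) - set(usable))
    if not usable:
        # Nothing to ground requirements in: stop instead of synthesizing.
        return {"action": "stop_and_request_inputs", "risks": []}
    risks = [
        f"source '{name}' missing or empty; requirements may be incomplete"
        for name in missing
    ]
    return {"action": "proceed", "risks": risks}
```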

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	3	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: The process list has two steps labeled 9 (step 9 "If the user requests skipping a phase..." followed immediately by step 9a), then jumps to step 10. The 9 / 9a numbering is inconsistent with the otherwise sequential integer numbering and makes ordering ambiguous. Reason: Ambiguous step numbering in a process that the skill itself enforces as strictly ordered can cause an agent to mis-sequence or skip the sub-step. Solution: Renumber so each process item has a unique sequential identifier (e.g., make 9a into step 10 and shift the subagent-dispatch step to 11), or clearly mark 9a as a labeled sub-step of 9 rather than a sibling integer.
🔵 Medium	Cognitive Budget	Problem: Step 9a ("Verification-failure unilateral start") is a single ~200-word paragraph packing the trigger condition, a one-line announcement format, an embedded non-exhaustive MUST NOT list (AskUserQuestion, menus, confirmation phrasings), the "same turn" requirement, and an exception clause. This is far denser than the other one-line steps and risks the agent dropping a sub-directive when scanning. Reason: The briefing notes that agents reliably handle a bounded number of directives at once; a wall-of-text gate hides sub-rules and reduces reliable compliance. Solution: Decompose step 9a into a labeled sub-block with discrete bullets: (a) trigger, (b) required one-line announcement format, (c) the MUST NOT list as its own bulleted set, (d) the only acceptable user input. Keep wording; only restructure into atomic items.
⚪ Low	Bloat Control	Problem: Step 9a repeats the same idea (no AskUserQuestion / no menu / no confirmation request / no pause) several times with near-synonymous phrasings within one paragraph. Reason: Repetition adds tokens to a permanently-loaded skill without adding new behavioral information. Solution: State the prohibition once as a single bulleted list of forbidden actions; drop the repeated restatements.
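
The numbering problem in the Structural Coherence row can be detected with a few lines. A sketch, assuming step labels have already been extracted from the process list as strings; the extraction itself is out of scope here.

```python
import re


def numbering_problems(labels):
    """Flag labels that break a plain 1, 2, 3, ... sequence."""
    problems = []
    expected = 1
    for label in labels:
        if not re.fullmatch(r"\d+", label):
            problems.append(f"{label}: not a plain integer step")
            continue
        if int(label) != expected:
            problems.append(f"{label}: expected {expected}")
        expected = int(label) + 1
    return problems


# Labels as described in the finding: 1..9, then 9a, then 10.
labels = [str(i) for i in range(1, 10)] + ["9a", "10"]
print(numbering_problems(labels))
```

After renumbering 9a to 10 and shifting the following step to 11, the same check returns an empty list.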

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	3	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	3	⬆️ Slightly better
Cognitive Budget	3	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Self-Validation	Problem: Unlike the other QA skills in this set (testrail-test-case-export, sequential-workflow-execution), this skill has no validation_checklist and no output-verification step. There is no instruction to confirm all target endpoints were covered or that extracted contracts are internally consistent. Reason: An extraction with silent gaps propagates 401/403/404 failures into downstream test design, which the pitfalls themselves warn about but provide no verification gate to catch. Solution: Add a validation_checklist (e.g., every target endpoint has a contract, auth determined per endpoint, data dependencies and creation order captured, code cross-checked against spec where both exist).
🟡 High	Output Contract	Problem: The skill extracts endpoint contracts, auth requirements, and data dependencies but never specifies the shape of its output. when_to_use_skill says "the calling workflow determines ... where to write outputs," yet there is no schema, structure, or canonical example of what an extracted endpoint contract looks like (field set, format, grouping). The detailed bullet lists in step 2 describe what to look for, not the format to emit. Reason: Without a defined output shape, two runs can produce divergent structures, breaking workflows that consume the extracted contract. Solution: Add an output_contract / output_format section giving one canonical example of an extracted endpoint contract (e.g., a markdown or structured block with path, method, params, request/response schema, security, data dependencies) so downstream phases get a deterministic artifact.
🔵 Medium	Example Grounding	Problem: The skill gives concrete framework pattern hints (e.g., router.get(), @GetMapping, [HttpGet]) but provides no example of a completed extraction for any single endpoint, so the abstract instruction set is not grounded in a worked output. Reason: A worked example anchors the expected granularity and format, reducing variance across runs. Solution: Include one short worked example showing an input endpoint definition and the resulting extracted contract.
🔵 Medium	Failure Handling	Problem: Failure handling is limited to step 1, sub-step 4 ("If none found: report back ... request user input"). There is no guidance for partial discovery (spec found but a target endpoint missing from it), spec-vs-code conflicts (only listed as a pitfall), or a malformed/unreachable spec. Reason: These are common real-world conditions; without if/then handling the agent may silently emit incomplete or contradictory contracts. Solution: Add explicit branches: endpoint present in spec but absent in code (and vice versa) -> flag discrepancy; spec unreachable/malformed -> fall back to code analysis and note degraded confidence; target endpoint not found anywhere -> report which endpoints are unresolved.
🔵 Medium	Success Criteria	Problem: There is no explicit "done when X, Y, Z" for the analysis. The process ends after step 4 (data dependencies) with no statement of what constitutes a complete, accepted analysis. Reason: Without testable completion criteria the agent cannot reliably decide when to stop or hand back. Solution: Add explicit success criteria, e.g., done when all target endpoints have contract + auth + data-dependency entries and any unresolved gaps are reported to the calling workflow.
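
The Output Contract, Self-Validation, and Success Criteria findings above could be addressed together by one small, checkable structure. The following is a minimal sketch only: `EndpointContract`, its field names, and `unresolved_gaps` are hypothetical illustrations, not part of the skill or the repository.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EndpointContract:
    """Hypothetical shape of one extracted endpoint contract (illustrative fields)."""
    path: str
    method: str
    auth: Optional[str] = None                     # None means "auth not determined"
    data_dependencies: Optional[List[str]] = None  # None means "not analysed", [] means "none"


def unresolved_gaps(targets: List[Tuple[str, str]],
                    contracts: List[EndpointContract]) -> List[str]:
    """Return one message per gap; an empty list means every target is fully covered."""
    by_key = {(c.method.upper(), c.path): c for c in contracts}
    gaps = []
    for method, path in targets:
        label = f"{method.upper()} {path}"
        contract = by_key.get((method.upper(), path))
        if contract is None:
            gaps.append(f"{label}: no contract extracted")
            continue
        if contract.auth is None:
            gaps.append(f"{label}: auth not determined")
        if contract.data_dependencies is None:
            gaps.append(f"{label}: data dependencies not captured")
    return gaps
```

Under these assumptions, "done" would mean `unresolved_gaps(...)` returns an empty list, or its messages are reported back to the calling workflow.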

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	2	⬇️ Slightly worse
Success Criteria	2	⬇️ Slightly worse
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	2	⬇️ Slightly worse
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: There is no guidance for inputs that don't fit the template: e.g., a requirement with no clear acceptance criterion for Traceability, more than 5 parameter combinations (pitfalls say split but the process doesn't state when/how to decide), or a test that is hard to express as single-action steps. Reason: Without explicit handling the author may silently drop required fields or overload a single case, degrading downstream export. Solution: Add brief failure/edge handling: when >5 parameter sets are needed, state the split rule in the process not just pitfalls; when traceability fields are unknown, mark them explicitly rather than omitting.
🔵 Medium	Self-Validation	Problem: The skill defines strict format_rules (MUST use Steps + Expected Results, MUST NOT use BDD, MUST NOT include Post-conditions/Automation, steps numbered, expected results reference their step) but provides no validation_checklist to self-verify an authored test case against those rules before completion. Reason: A self-check gate catches format violations the author skill itself declares mandatory, before they reach export. Solution: Add a short validation_checklist mirroring the format_rules (e.g., no Given-When-Then; each step is a single action; every expected result names the step it follows; no Post-conditions/Automation fields; <=5 parameter sets).
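
A validation_checklist mirroring the format_rules could be expressed as a mechanical check. The sketch below is a heuristic illustration: the function name, the line-start matching for Given/When/Then, and the `Field:` layout are assumptions, and only the limit of 5 parameter sets and the banned fields come from the findings above.

```python
import re

FORBIDDEN_FIELDS = ("Post-conditions", "Automation")
MAX_PARAMETER_SETS = 5
# Heuristic: a line that starts with Given/When/Then is treated as BDD phrasing.
BDD_LINE = re.compile(r"^\s*(Given|When|Then)\b", re.IGNORECASE | re.MULTILINE)


def format_violations(case_text, parameter_sets=0):
    """Return a list of format-rule violations for one authored test case."""
    violations = []
    if BDD_LINE.search(case_text):
        violations.append("BDD (Given-When-Then) phrasing found")
    for name in FORBIDDEN_FIELDS:
        # Assumes fields are written as "Name:" at the start of a line.
        pattern = r"^\s*" + re.escape(name) + r"\s*:"
        if re.search(pattern, case_text, re.IGNORECASE | re.MULTILINE):
            violations.append(f"forbidden field: {name}")
    if parameter_sets > MAX_PARAMETER_SETS:
        violations.append(
            f"{parameter_sets} parameter sets; split the case (limit {MAX_PARAMETER_SETS})"
        )
    return violations
```

An empty result would correspond to a passed checklist; the single-action-step and step-reference rules are not covered by this sketch.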

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/testrail-test-case-export/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Safety Boundaries	Problem: This skill performs external write actions (mcp_testrail_add_case creates cases in a live TMS). Step 7 exports each case and the pitfalls note "Re-running export creates duplicate test cases in TestRail (by design, preserves history)." There is no pre-export confirmation gate showing the user how many cases will be created into which section_id before the write loop starts, so an accidental re-run silently mass-creates duplicates. Reason: Bulk creation into a shared external system is hard to reverse; without a count/destination confirmation an accidental re-run pollutes the TMS with duplicates. Solution: Add a confirmation gate before step 7: state the case count and target section_id, optionally use mcp_testrail_get_cases to detect likely duplicates, and require user acknowledgement before the write loop (or explicitly state the parent workflow owns this gate).
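
The confirmation gate proposed above can be sketched as a pure function that runs before any write. Everything here is hypothetical: `confirm_export`, the `ask` callable, and the case dictionaries are illustrative, and `existing_titles` is assumed to come from a prior read of the target section (the sketch deliberately calls no TestRail tool).

```python
def confirm_export(cases, section_id, existing_titles, ask):
    """Return the cases to export, or [] when the user does not acknowledge.

    `ask` is a hypothetical callable: it receives the prompt text and
    returns the user's reply as a string.
    """
    duplicates = [c["title"] for c in cases if c["title"] in existing_titles]
    prompt = (
        f"About to create {len(cases)} case(s) in section {section_id}; "
        f"{len(duplicates)} title(s) already exist there. Proceed? (yes/no)"
    )
    reply = ask(prompt).strip().lower()
    # Only an explicit "yes" opens the write loop.
    return list(cases) if reply == "yes" else []
```

Keeping the gate separate from the write loop means an accidental re-run stops at the prompt instead of creating duplicates.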

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/user-approved-code-changes/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: Step 4 requires presenting "each proposed change with before/after snippets and file paths; batch if small, otherwise chunk for review," but no example shows the expected before/after presentation format, and no concrete example of an accepted vs rejected approval phrase is given (it defers entirely to skill hitl). Reason: The before/after format and the approval-token discrimination are the load-bearing parts of an approval gate; an example reduces variance in how the gate is presented and judged. Solution: Add one short illustrative before/after presentation block and a positive/negative approval-phrase example (or an explicit pointer that the exact token set comes from the parent workflow / hitl), so the agent renders the gate consistently.
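
One possible before/after presentation is a unified diff per file. This is a sketch under that assumption, not the format the skill prescribes: `render_change` is a hypothetical helper built on Python's standard `difflib`, and the approval-phrase set is left to the hitl skill.

```python
import difflib


def render_change(path, before, after):
    """Render one proposed change as a unified diff labelled with its file path."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{path} (before)",
        tofile=f"{path} (after)",
        lineterm="",  # inputs carry no line endings, so none are added to headers
    )
    return "\n".join(diff)


if __name__ == "__main__":
    print(render_change("src/app.py", "timeout = 5\n", "timeout = 30\n"))
```

Small changes could be batched by concatenating several such blocks; larger ones would be shown one block at a time.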

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The NEW file deleted the concrete operational examples that BASE had for the keyword-search branch: the example CQL query (`type=page AND space={PROJECT_KEY} AND (text ~ "{term1}" OR text ~ "{term2}")`), the result-ranking guidance, and the worked parent/child example (`Parent: "Job Post" / Children: "Create a Job Post"...`). Step 4 in NEW now only says `Extract search terms` and `Retrieve relevant Confluence pages` with no example of how a query is shaped or how children are traversed. Reason: The abstract instruction `Retrieve relevant Confluence pages` is harder to execute reliably without the concrete query example BASE provided; for an agent that fails to load the referenced skill, no grounding remains. Solution: Keep the search-term extraction and child-page traversal delegated to the skill, but add one short grounded example (a single CQL line and the parent/child illustration) inside `<get_confluence>` step 4, OR add an explicit note that `confluence-source-harvesting` owns the query-shape and child-traversal examples so the reader knows where to find them.
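
The query shape BASE showed can be illustrated with a small builder. This is a sketch: `build_cql` is a hypothetical helper, and rejecting terms that contain a double quote is a simplification chosen here to avoid guessing at CQL escaping rules.

```python
def build_cql(project_key, terms):
    """Build the keyword-search CQL in the shape of the BASE example.

    Produces: type=page AND space=KEY AND (text ~ "t1" OR text ~ "t2")
    """
    if not terms:
        raise ValueError("at least one search term is required")
    for term in terms:
        if '"' in term:
            # Simplification: no escaping is attempted in this sketch.
            raise ValueError(f"term contains a double quote: {term!r}")
    text_clauses = " OR ".join(f'text ~ "{term}"' for term in terms)
    return f"type=page AND space={project_key} AND ({text_clauses})"
```

For example, `build_cql("PROJ", ["job post", "create"])` yields `type=page AND space=PROJ AND (text ~ "job post" OR text ~ "create")`.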

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Output Contract	Problem: The NEW file deleted the complete self-contained `analysis.md` schema that BASE defined (Executive Summary, sections 1 Contradictions, 2 Gaps, 3 Ambiguities, 4 Cross-Reference, 5 Positive Findings, 6 Risk Assessment, plus the per-item C1/G1/A1 record formats). NEW `<create_analysis_document>` now says only `using the output format from the skill, with the following testgen-specific additions` and shows just section 7 (Next Steps) and Analysis Metadata. The phase's own output contract is now incomplete and fully dependent on `gap-and-contradiction-analysis` defining all those sections. Reason: An agent that loads this phase but whose skill output differs from BASE's structure can produce an analysis.md missing contradictions/gaps/risk sections, breaking the downstream Phase 3 question generation that consumed those sections. Solution: Do not re-inline the full schema, but make the dependency explicit and verifiable: in `<create_analysis_document>` state which top-level sections the skill must produce (contradictions, gaps, ambiguities, cross-reference, risk assessment) so the contract is checkable even if the skill output drifts, and confirm the `gap-and-contradiction-analysis` skill actually emits sections 1-6 in that order.
🔵 Medium	Example Grounding	Problem: BASE grounded the analysis with concrete `Be Specific` good/bad examples (e.g. bad `Some details missing` vs good `User authentication method not specified (OAuth, SAML, basic auth?)`) and typical-contradiction/gap/ambiguity examples. NEW deleted all of these; `<run_analysis>` step 3 now just says `Identify contradictions, gaps, ambiguities` with no positive/negative example. Reason: Without a specificity example the agent is more likely to emit vague findings like `Some details missing`, which is exactly the failure BASE's example warned against. Solution: Either confirm the `gap-and-contradiction-analysis` skill carries the specificity good/bad example, or add one short negative+positive example pair to `<run_analysis>` so the agent has a calibration anchor for what counts as a specific finding.
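
Making the section contract "checkable even if the skill output drifts" could look like the sketch below. It assumes `analysis.md` uses markdown `#` headings and that each section name appears in its heading text; `missing_sections` is a hypothetical helper, and the section names are taken from the Solution above.

```python
import re

REQUIRED_SECTIONS = (
    "Contradictions",
    "Gaps",
    "Ambiguities",
    "Cross-Reference",
    "Risk Assessment",
)


def missing_sections(analysis_md):
    """Return the required section names that no heading in the document mentions."""
    headings = [
        match.group(1).strip().lower()
        for match in re.finditer(r"^#+\s*(.+)$", analysis_md, re.MULTILINE)
    ]
    return [
        name for name in REQUIRED_SECTIONS
        if not any(name.lower() in heading for heading in headings)
    ]
```

An empty result would let Phase 3 proceed; a non-empty one names exactly which sections the skill failed to emit. Section order is not checked in this sketch.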

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	3	⬇️ Slightly worse
Decision Branching	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Structural Coherence	Problem: In `<obtain_project_info step="0.4">` the numbered list has two items numbered `2`: the example-format block that follows `1. ACQUIRE questioning/SKILL.md` is numbered `2`, and so is the next step, `2. Ask user to confirm or customize the data retrieval process`. The duplicate ordinal makes the step sequence ambiguous. Reason: Duplicate step numbers can cause the agent to skip or merge a step in a sequential phase, though impact is low because the actions remain individually clear. Solution: Renumber the steps in `<obtain_project_info>` so each has a unique ordinal (ACQUIRE skill = 1, ask question with example = 2, confirm/customize = 3, validate = 4, save = 5).

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Workflow Completeness	Problem: The parent testgen-flow.md declares export-report.md the on-disk evidence that Phase 6 ran and lists it in output_directory; this phase's validation_checklist (line 83) requires it with TMS IDs/URLs, per-case status, and timestamp. But the success-path step update_documents 6.6 only writes test-scenarios.md and testgen-state.md — it never writes export-report.md. The file is written only inside the step 6.2 fallback branches (manual/CSV/defer). On the normal TMS-export path the required deliverable is never created. Reason: A deliverable required by the validation checklist and by the parent flow's skip/evidence logic has no creation step on the main path, so an agent can mark Phase 6 complete (and a later run's skip-gate can pass) without the evidence file the chain depends on. Solution: Add an explicit instruction in step 6.6 to create agents/testgen/{TICKET-KEY}/export-report.md on the success path, populated from the per-case results tracked in step 6.5 (TMS IDs/URLs, status, timestamp), and make the step-6.2 fallbacks write into that same file so every branch has one owning step.
🟡 High	Output Contract	Problem: The base file specified a concrete CSV column order and a canonical TestRail export-summary table example. The new file keeps a CSV column list only inside the fallback branch of step 6.2 (line 45) and gives no canonical example of the `export-report.md` content (the primary deliverable), only a prose list of required fields in the checklist (line 83). Reason: Without a canonical example for the main deliverable, the report layout is left to agent interpretation, reducing determinism of the Phase 6 evidence file. Solution: Add one short canonical `export-report.md` skeleton (target info + a TC-to-TMS-ID result table) near step 6.6, mirroring the field list already required by the checklist, so the output shape is deterministic without depending on a fallback branch.
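
A canonical `export-report.md` skeleton could be produced by a single renderer shared by the success path and the fallbacks. This is a sketch: `render_export_report`, the heading text, the column names, and the result keys (`tc`, `tms_id`, `url`, `status`) are assumptions; only the required fields (TMS IDs/URLs, per-case status, timestamp) come from the checklist cited above.

```python
from datetime import datetime, timezone


def render_export_report(ticket_key, section_id, results, now=None):
    """Render the Phase 6 evidence file as markdown text.

    `results` is a list of dicts with the hypothetical keys
    tc, tms_id, url, and status (one dict per exported case).
    """
    now = now or datetime.now(timezone.utc)
    lines = [
        f"# Export Report: {ticket_key}",
        "",
        f"- Target section_id: {section_id}",
        f"- Timestamp: {now.isoformat()}",
        "",
        "| Test case | TMS ID | URL | Status |",
        "| --- | --- | --- | --- |",
    ]
    for row in results:
        lines.append(
            f"| {row['tc']} | {row['tms_id']} | {row['url']} | {row['status']} |"
        )
    return "\n".join(lines) + "\n"
```

With one renderer, every branch writes the same layout into `agents/testgen/{TICKET-KEY}/export-report.md`, so the skip-gate never depends on which path ran.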

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	3	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Safety Boundaries	Problem: The base validation checklist contained explicit negative format constraints — `NO BDD format (Given-When-Then)`, `NO Post-conditions field`, `NO Automation field` — directly inside this phase. The new file removes all three negative constraints; the inline TC schema (lines 86-120) and checklist (lines 253-262) state only positive fields and rely on the `testrail-test-case-authoring` skill for the format ban. Reason: The deleted negative constraints are enforced by the mandatory skill, but the inline fallback template used when the skill is unavailable no longer restates them, so the format ban is not self-contained in the documented degraded path. Solution: No change required for correctness: verified that `testrail-test-case-authoring/SKILL.md` (lines 19-21, 192) carries the MUST NOT BDD / MUST NOT Post-conditions / MUST NOT Automation constraints, and step 5.3 mandates USE SKILL `testrail-test-case-authoring`. The constraint is preserved via delegation. Optionally add a one-line reminder in the inline-fallback note (line 87) that the self-contained template must also avoid BDD/Post-conditions/Automation, since that template is the explicit fallback when the skill is unavailable.

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

github-actions · 2026-06-02T13:05:14Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🔴 Critical	🟠 Very High	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-selector-management/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-authoring/SKILL.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/SKILL.md`	0	0	1	2	0	❌ Fail
`instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`	0	0	1	1	0	❌ Fail
`instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`	0	0	1	1	0	❌ Fail
`instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/qa-data-collection/SKILL.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-gap-analysis/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-project-config/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-debugging/SKILL.md`	0	0	2	0	0	❌ Fail
`instructions/r3/core/skills/qa-test-implementation/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/repository-implementation-standards/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/skills/testrail-test-case-export/SKILL.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/skills/user-approved-code-changes/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	1	2	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	1	0	1	2	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/adhoc-flow.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	1	1	0	0	0	❌ Fail
`instructions/r3/core/workflows/qa-flow.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	0	0	✅ Pass
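
Every row of the Status column above is consistent with a simple threshold rule. The sketch below states that rule as inferred from the table; it is not taken from the validator's source, and `file_status` is a hypothetical name.

```python
def file_status(critical, very_high, high, medium, low):
    """Derive a per-file status from issue counts (rule inferred from the table)."""
    if critical or very_high or high:
        return "Fail"      # any High-or-above finding fails the file
    if medium or low:
        return "Warning"   # only Medium/Low findings
    return "Pass"          # no findings at all
```

For instance, the counts `0 0 1 2 0` map to Fail and `0 0 0 1 2` map to Warning, matching the corresponding rows.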

📄 `instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-selector-management/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-test-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The done-condition is stated three times in near-identical prose: process step 5 ('The skill is complete after step 5 emits and only after step 4's validation passed'), the entire <success_criteria> block ('Complete when... NOT complete if...'), and again restated in <validation_checklist>. The full five-subsection list (Test File, Implementation Summary, Uncovered Assertions, Conflicts and Precedence, Validation) is spelled out verbatim in step 5, <success_criteria>, and <output_format>. This pushes the file to ~15.3K chars where a leaner version would carry the same contract. Reason: Per pa-hardening DRY and Bloat Control, repeating the same contract three times grows surface area and risks the copies drifting apart on future edits; a single canonical statement with references is more reliable. Solution: Keep the full done-condition only in <success_criteria> (its canonical home) and have process step 5 plus <output_format> reference it by name rather than restating 'Complete when / NOT complete'. State the five required subsections once (in the template/output_format) and reference 'the five required subsections' elsewhere instead of re-listing them. Do not change behavior, only remove the duplicated restatements.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-test-debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The always-loaded SKILL.md is ~16K chars (10K-20K high-severity band per evaluation rules). It carries steps 1-9 (Part A and Part B), the full canonical taxonomy, the <input_contract> table, <safety_boundaries> (including verbatim redaction examples), <success_criteria>, a full <validation_checklist> duplicating success criteria item-by-item, and a further section that restates the same rules a third time. A Part-A-only caller pays for this entire surface even though Part B detail was already deferred to references/part-b-mechanics.md. Reason: Per SHARED_CONTEXT, an oversized always-loaded prompt risks forcing overflow/compaction and pa-hardening targets prompt size; progressive disclosure already exists for Part B mechanics but the Part-B safety/validation text was not deferred with it. Solution: Move the Part-B-specific halves of <safety_boundaries> (approval discipline, test-code-only writes), the Part-B <validation_checklist> block, and the Part-B lines of that further section into references/part-b-mechanics.md alongside the mechanics already there, leaving only a one-line pointer in SKILL.md. This keeps the read-only Part-A surface lean and defers write-path safety detail to the file already loaded when Part B runs.
🔵 Medium	Bloat Control	Problem: <success_criteria>, <validation_checklist>, <safety_boundaries>, and a further section restate the same rules multiple times. Example: the no-inferred-approval rule appears in <safety_boundaries> ('Inferred approval from prose ... is forbidden'), in the Part-B <validation_checklist> ('no inferred approval'), and again in that further section ('inferred approval from looks good / silence is forbidden'). The application-source rule and the silent-skip-page-source rule are similarly triplicated. Reason: pa-hardening flags redundancy and 'compressible without value loss'; the triplication inflates the already-large SKILL.md without adding new behavior. Solution: Keep the canonical statement in <safety_boundaries>/<failure_handling> and reduce <validation_checklist> and the further section to terse pointers (e.g. 'approval explicit per <safety_boundaries>') instead of re-stating the full rule, trimming the always-loaded surface.
🔵 Medium	Single Responsibility	Problem: The skill bundles two responsibilities with different risk profiles: Part A (read-only report analysis, steps 1-6) and Part B (writes test source files + lint + iteration tracking, steps 7-9). The prompt itself acknowledges this in <when_to_use_skill>: 'The skill bundles two responsibilities with materially different risk profiles' and notes the split is preserved 'so future SRP tightening (extracting Part B to a sibling skill) is a one-step refactor.' Reason: pa-hardening enforces SRP (1-2 responsibilities); read-only analysis and repository-mutating correction are two distinct jobs. The explicit boundary and Part-A-only path mitigate the risk, so this is a documented compromise rather than a hidden flaw. Solution: Acceptable as shipped because the boundary is explicit and a Part-A-only caller is told not to run steps 7-9. To fully satisfy SRP, extract Part B into a sibling skill (e.g. aqa-test-correction) that consumes Part A's artifact, as the prompt's own note anticipates.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	3	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	3	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: SKILL.md is ~15.7K chars (10K-20K high-severity band). The <safety_boundaries> redaction policy alone spans a full multi-row table PLUS a separate canonical grep-pattern list PLUS a structural-content rule PLUS a re-scan rule, and this skill has no references/ file to defer detail to (progressive disclosure not used). The whole redaction policy is loaded on every invocation even though it only fires when inputs embed secrets. Reason: Per SHARED_CONTEXT and pa-hardening, an oversized always-loaded skill risks compaction; the redaction detail is reference-grade material that fits progressive disclosure, which this skill does not yet use. Solution: Extract the <safety_boundaries> redaction table + grep-pattern list + structural-content rule into a references/redaction-policy.md loaded on demand, leaving a one-line trigger in SKILL.md. The prompt's own 'DRY note (future)' already anticipates a shared redaction reference; pulling it out also shrinks the always-loaded surface.
🔵 Medium	Bloat Control	Problem: Non-operational provenance note in <safety_boundaries>: the '> DRY note (future): the redaction policy ... is shared verbatim with sibling skills (aqa-test-debugging, qa-test-debugging). A single sensitive-data redaction reference would let all three skills source from one canonical location - tracked in docs/TODO.md for the next family refactor.' This is a future-plan / rationale annotation aimed at maintainers, not an instruction the executing agent acts on, and it also introduces sibling-skill awareness (names aqa-test-debugging and qa-test-debugging). Reason: pa-patterns ai-issues warns against inserting non-operational clarifications (history, rationale, future plans) into target prompts, and pa-hardening requires no lateral/sibling awareness; the note violates both without changing agent behavior. Solution: Remove the 'DRY note (future)' block from the prompt and track the refactor in docs/TODO.md only (where it already says it is tracked). Keep the skill source-agnostic and free of sibling-name coupling.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	3	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: SKILL.md is ~15K chars (10K-20K high-severity band) with no references/ deferral (no progressive disclosure). The step-4 domain-skill GATE rule is stated in full in <core_concepts>, again in the <input_contract> row, again at length in step 4 itself, again in <failure_handling> (three separate bullets), again in <validation_checklist>, and again in one further section: the same canonical rule re-expanded across six sections, all loaded on every invocation. Reason: Per SHARED_CONTEXT, oversized always-loaded skills risk compaction; pa-hardening targets prompt size and pa-patterns flags redundancy. The rule is repeated rather than referenced, inflating cognitive load without new behavior. Solution: State the step-4 GATE rule canonically once (in step 4, as the prompt already designates 'canonical'), and reduce the other five sites to terse pointers ('domain skill required: see step 4 GATE') rather than re-expanding the rule. Optionally move the <recommended_foundational_skills> rationale paragraph to a reference. This shrinks the always-loaded surface below the high-severity band.
🔵 Medium	Bloat Control	Problem: The 'Why this skill doesn't ACQUIRE/USE' paragraph in <recommended_foundational_skills> ('A skill that chains four-plus sibling skills behaves as a workflow phase, not a leaf skill, and couples to sibling names + load order ... By recasting these as recommended foundational skills ...') is design-rationale explaining a past authoring decision, not an instruction the executing agent acts on. Reason: pa-patterns ai-issues warns against injecting rationale/origin/explanatory meta-notes into target prompts (state-only, action-only); the paragraph adds no runtime behavior and enlarges an already high-band file. Solution: Remove the rationale paragraph; keep only the operative rule ('this skill verifies presence and applies discipline; it does NOT ACQUIRE/USE') already stated in <core_concepts>. Move the design reasoning to the change-log or PR description.
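
The repetition of the step-4 GATE rule described above could be located mechanically before deciding which sites to reduce to pointers. The following is a minimal sketch, assuming the prompt's sections are non-nested `<tag>...</tag>` blocks and that a literal phrase is enough to spot a restatement (paraphrases would be missed). The function name `sections_mentioning` and the sample text are hypothetical, not part of the audited skill.

```python
import re

# Matches a non-nested section such as <core_concepts>...</core_concepts>.
SECTION_RE = re.compile(r"<(?P<tag>[a-z_]+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)


def sections_mentioning(prompt_text, phrase):
    """Return, in document order, the tags of sections whose body contains `phrase`.

    Matching is a case-insensitive literal substring test, so this only
    finds near-verbatim restatements of a rule.
    """
    needle = phrase.lower()
    return [
        match.group("tag")
        for match in SECTION_RE.finditer(prompt_text)
        if needle in match.group("body").lower()
    ]


if __name__ == "__main__":
    # Hypothetical miniature prompt, used only to show the shape of the result.
    sample = (
        "<core_concepts>Domain skill required before handoff.</core_concepts>\n"
        "<input_contract>domain skill required</input_contract>\n"
        "<output_format>Emit the report.</output_format>"
    )
    print(sections_mentioning(sample, "domain skill required"))
```

A rule that shows up in more than one section is a candidate for the "state once, point elsewhere" treatment the finding recommends.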

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	3	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Success Criteria	Problem: The skill has no explicit <success_criteria> section stating testable done-when conditions. The sibling skill mcp-confluence-data-collection defines a dedicated <success_criteria> block ('Complete when target pages were retrieved... OR the failure path was followed'), but confluence-source-harvesting only has a <validation_checklist> that lists post-conditions to verify rather than a single completion statement. An agent reading this skill must infer when the harvest is 'done' from step 10 plus the checklist. Reason: Without an explicit completion statement, the agent may treat a partial harvest (e.g. parents fetched, children skipped) as done, which the validation_checklist forbids but does not gate at the right moment. Solution: Add a <success_criteria> section mirroring the sibling skill: state the skill is complete when all user URLs/derived pages were fetched and embedded as page entries, children were checked or waived, truncation/redaction applied, the step 10 summary was written, OR a <failure_handling> stop path was followed; and that it is NOT complete on a silent zero-page emit.
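
The completion statement proposed above can be read as a single predicate. The sketch below is one possible reading, not the skill's actual contract: every field name on `HarvestState` is hypothetical and chosen only to mirror the wording of the finding.

```python
from dataclasses import dataclass


@dataclass
class HarvestState:
    # All field names are hypothetical; they mirror the finding's wording.
    pages_requested: int
    pages_embedded: int
    children_checked_or_waived: bool
    truncation_and_redaction_applied: bool
    summary_written: bool
    failure_path_followed: bool = False


def is_complete(state):
    """Return True only when the harvest meets the proposed done-condition."""
    # Following a documented stop path counts as a valid way to finish.
    if state.failure_path_followed:
        return True
    # A silent zero-page emit is never complete.
    if state.pages_embedded == 0:
        return False
    return (
        state.pages_embedded >= state.pages_requested
        and state.children_checked_or_waived
        and state.truncation_and_redaction_applied
        and state.summary_written
    )
```

Under this reading, the partial harvest mentioned in the finding (parents fetched, children skipped) evaluates to not complete because `children_checked_or_waived` is false.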

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Conflict Resolution	Problem: The SKILL's <safety_boundaries> rule 3 declares the three-tier risk scheme High/Medium/Low the single source of truth and explicitly forbids introducing 'Critical/Urgent/Blocker as a fourth tier'. But the output document skeleton it points to (referenced from <output_format>) defines an Executive Summary field 'Severity: [Critical / High / Medium / Low]', introducing exactly the Critical tier the SKILL prohibits. The SKILL and its own referenced template contradict each other on the allowed tier vocabulary. Reason: An agent following rule 3 will refuse to write 'Critical' while the template instructs it to, producing inconsistent documents across runs and a contradiction the validation_checklist ('single risk tier from <risk_assessment>') cannot resolve. Solution: Align the two: either remove 'Critical' from the Executive Summary Severity field in entry-templates-and-document-skeleton.md so it reads '[High / Medium / Low]', or add an explicit note in <safety_boundaries> rule 3 that the Executive-Summary Severity field is a separate overall-document rating distinct from per-finding risk tiers and may use Critical. Pick one and state it in the SKILL so the vocabulary is unambiguous.
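
A contradiction of this kind could be caught by a small vocabulary check run against the template. The sketch below assumes the Severity field keeps the bracketed, slash-separated shape quoted in the finding; the function names are hypothetical, and the allowed set encodes the first of the two resolutions (strict three tiers).

```python
import re

# The three tiers that rule 3 declares the single source of truth.
ALLOWED_TIERS = {"High", "Medium", "Low"}


def declared_tiers(template_text):
    """Extract tier names from a 'Severity: [A / B / C]' field, if present."""
    match = re.search(r"Severity:\s*\[([^\]]*)\]", template_text)
    if match is None:
        return set()
    return {part.strip() for part in match.group(1).split("/") if part.strip()}


def forbidden_tiers(template_text):
    """Return tiers the template declares that the SKILL does not allow."""
    return declared_tiers(template_text) - ALLOWED_TIERS


if __name__ == "__main__":
    print(forbidden_tiers("Severity: [Critical / High / Medium / Low]"))
```

If the second resolution is chosen instead (Executive-Summary Severity as a separate whole-document rating), the allowed set for that one field would need to include Critical.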

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Conflict Resolution	Problem: The Output Document Skeleton's Executive Summary defines 'Severity: [Critical / High / Medium / Low]', which includes a 'Critical' value the parent SKILL.md's <safety_boundaries> rule 3 explicitly forbids ('Do not introduce Critical/Urgent/Blocker as a fourth tier'). As a detail layer of the parent skill, this template contradicts the parent's authoritative tier rule. Reason: The reference is loaded at document-assembly time; if it instructs the agent to emit 'Critical' while the parent forbids it, the agent gets contradictory write-time guidance and document tier vocabulary becomes non-deterministic across runs. Solution: Remove 'Critical' from the Severity field so it reads '[High / Medium / Low]' to match the parent SKILL's three-tier rule, OR add an inline note in this skeleton clarifying the Executive-Summary Severity is a whole-document rollup distinct from the per-finding High/Medium/Low tiers (matching whichever resolution the parent SKILL adopts).

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Safety Boundaries	Problem: The redaction target list in mcp-testrail-data-collection `<safety_boundaries>` is weaker than its Jira sibling. It names the credential and PII categories but gives no concrete grep patterns (no `Bearer`, `Authorization:`, JWT `eyJ...`, `BEGIN PRIVATE KEY`, email/phone/card regex shapes), and omits database connection strings entirely. The `<validation_checklist>` then says 'grepped ... per `<safety_boundaries>`' but `<safety_boundaries>` provides nothing to grep. Reason: Without concrete patterns the agent decides ad hoc what looks like a secret, so embedded tokens or PII in step text or preconditions can pass the scan and land in a tracked artifact. Solution: Add the same concrete grep-pattern set and the database-connection-string category to the TestRail `<safety_boundaries>` so the redaction scan is operational, matching the depth of the Jira skill that the parallel chain re-emits into version control.
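
To make "concrete grep patterns" tangible, here is a minimal sketch of what an operational scan might look like. The regular expressions are illustrative assumptions covering the categories the finding names; they are not the Jira skill's actual pattern list, which this report does not reproduce, and phone and card shapes are left out for brevity.

```python
import re

# Illustrative patterns only (assumptions, not the Jira skill's real list).
SECRET_PATTERNS = {
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),
    "authorization_header": re.compile(r"Authorization:\s*\S+", re.IGNORECASE),
    "jwt": re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|jdbc:[a-z]+)://\S+"
    ),
}


def scan_for_secrets(text):
    """Return the sorted names of every pattern that matches `text`."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    )


def redact(text, placeholder="[REDACTED]"):
    """Replace every match of every pattern with `placeholder`."""
    for pattern in SECRET_PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text
```

With an explicit pattern table like this, a checklist item such as 'grepped per `<safety_boundaries>`' has something definite to run against step text and preconditions.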

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: At ~16.8K chars this SKILL.md sits in the 10K-20K high-severity band. It is the orchestrator skill yet it carries seven full process steps, a long output_format template, pitfalls, safety_boundaries, success_criteria, failure_handling, and a validation_checklist all always-loaded. Step 5 (Discover Existing Test Patterns) restates many enumerations (frameworks, HTTP clients, directory globs) that overlap with the deferred backend-source-analysis reference, adding to the always-loaded budget. Reason: A large always-loaded orchestrator skill that itself delegates to MCP sub-skills consumes context the parent workflow also needs; the bigger it is, the higher the risk of skipped steps and compaction during the multi-skill collection chain. Solution: Push the detailed step-5 enumerations (framework/import/HTTP-client/test-structure lists) into a deferred reference the same way step 4 already defers to references/backend-source-analysis.md, keeping step 5 as a thin orchestration entry. This trims the always-loaded surface back toward the lean-SKILL target.
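
The size check behind this finding amounts to comparing a character count against a band. The sketch below is an assumption-laden illustration: only the 10K-20K high-severity band appears in the report, so the labels for sizes outside it and the inclusive boundaries are placeholders.

```python
HIGH_BAND_MIN = 10_000  # lower edge of the band named in the report
HIGH_BAND_MAX = 20_000  # upper edge of the band named in the report


def size_band(char_count):
    """Classify an always-loaded prompt by its character count.

    Only the 10K-20K "high" band comes from the report. The other two
    labels and the inclusive edges are placeholders for this sketch.
    """
    if char_count < HIGH_BAND_MIN:
        return "below-high"
    if char_count <= HIGH_BAND_MAX:
        return "high"
    return "above-high"


def report_sizes(sizes):
    """Map each file path to its band, given a path -> character count dict."""
    return {path: size_band(count) for path, count in sizes.items()}


if __name__ == "__main__":
    # ~16.8K chars is the size the finding reports for this SKILL.md.
    print(report_sizes({"qa-data-collection/SKILL.md": 16_800}))
```

Moving the step-5 enumerations into a deferred reference would lower the count that this check sees for the always-loaded file.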

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-gap-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-project-config/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The file is ~18.3K characters, the largest of the QA skills and within the 10K-20K high-severity band flagged in the shared context. The size comes largely from carrying two full responsibilities (Part A analysis + Part B corrections) plus four heavy governance sections (<safety_boundaries>, <failure_handling>, <success_criteria>, <validation_checklist>) where <success_criteria> and <validation_checklist> restate the same Part A/Part B contract (e.g., 'every <output_format> section present', 'no literal credential/PII', 'iteration 3 escalation recorded' appear in both). A single skill load this large competes for context budget against the rest of the QA chain artifacts the agent must also hold. Reason: At 18.3K a single on-demand skill consumes a large share of the working context window; the overlap between success_criteria and validation_checklist is non-functional repetition that can be compressed without losing any check, lowering load cost for every invocation. Solution: Reduce duplication between <success_criteria> and <validation_checklist>: keep <success_criteria> as the high-level done-condition and have it reference <validation_checklist> for item-level checks (the file already declares <validation_checklist> as 'single source of truth' but still restates the items in <success_criteria>). If the Single Responsibility split is adopted, the per-skill size drops naturally below the high band. No behavioral change is required, only de-duplication of the two governance sections.
🟡 High	Single Responsibility	Problem: The skill bundles two responsibilities with materially different risk profiles into one file: Part A (steps 1-5) is read-only report analysis producing execution-report.md, and Part B (steps 6-8) writes test source files and runs lint after user approval. The <when_to_use_skill> section itself states 'The skill bundles two responsibilities with materially different risk profiles' and has to add a 'Part A / Part B usage boundary' note plus a rule that 'a Part-A-only invocation MUST NOT execute steps 6-8' to manage the coupling. A read-only analyzer and a code-mutating fixer are two distinct jobs (the schema target is 1-2 related responsibilities, here the second one mutates source and carries approval/lint/iteration machinery the first does not). Reason: One skill doing both read-only analysis and approval-gated source mutation enlarges the cognitive search space and risks an agent sliding from analysis into applying changes without the explicit approval gate; the prompt already needs guardrail prose to prevent exactly that, which signals the responsibilities are separable. Solution: Consider splitting into two skills: a read-only qa-report-analysis skill (current Part A, steps 1-5, producing execution-report.md) and a separate qa-test-correction skill (current Part B, steps 6-8, consuming execution-report.md as its input contract and owning the approval/lint/3-iteration policy). If the QA family intentionally keeps them together for chaining, keep the current explicit Part A/Part B boundary note but make the split-skill option an explicit recorded decision so the coupling is deliberate rather than incidental.
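
The Part A / Part B boundary described above can be expressed as a small guard. This is an illustrative sketch only: the step ranges (1-5 and 6-8) and the approval requirement come from the finding, while the function `allowed_steps` and its parameters are hypothetical names, not anything defined in the skill.

```python
# Steps 1-5 are the read-only analysis (Part A); steps 6-8 mutate test
# sources and run lint (Part B). Ranges are taken from the finding above.
PART_A_STEPS = list(range(1, 6))
PART_B_STEPS = list(range(6, 9))


def allowed_steps(part_a_only: bool, user_approved: bool) -> list[int]:
    """Return the steps an invocation may execute.

    A Part-A-only invocation never reaches steps 6-8, and Part B is
    unlocked only after explicit user approval.
    """
    steps = list(PART_A_STEPS)
    if not part_a_only and user_approved:
        steps += PART_B_STEPS
    return steps


print(allowed_steps(part_a_only=True, user_approved=True))
print(allowed_steps(part_a_only=False, user_approved=True))
```

If the skill is split as suggested, the guard disappears: each of the two skills would own exactly one of the step lists.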

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	3	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	3	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-implementation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/repository-implementation-standards/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-synthesis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The SKILL.md is ~16KB. Its 5 numbered sections (each with multi-level sub-bullets), <output_format> (full per-endpoint template), <validation_checklist> (8 proof items), <success_criteria>, and <failure_handling> (7 distinct branches including GraphQL adaptation) all stay loaded together. An agent executing a single endpoint extraction carries the GraphQL branch, the reconciliation-conflict branch, and the citation-source-unavailable branch in context even when none apply, exceeding the reliable ~5-decision-at-once budget per pa-patterns.md ai-issues. Reason: Progressive disclosure keeps the always-loaded surface lean and within the cognitive budget; the skill already proves it can defer detail (redaction-catalog.md), so the rarely-hit failure branches are the natural next deferral. Solution: Move the lower-frequency <failure_handling> branches (GraphQL API adaptation, spec-vs-code reconciliation-conflict-beyond-Notes, citation-source-unavailable) into a references/ file loaded on demand, mirroring the redaction-catalog lazy-loading the skill already uses; keep only the common locate/coverage/parse-failure stops inline in the base SKILL.md.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The SKILL.md is ~16KB. Three full worked examples (Happy Path, Negative-with-parameterized, Role-based-merged) plus a complete <test_case_template>, an 8-item <validation_checklist>, a 5-branch <failure_handling>, an inline <safety_boundaries> redaction catalog, and an <epistemic_honesty> per-field gap-marker list are all loaded together to author a single test case. The example bodies and the full safety catalog stay in context for every authoring call, pushing past the reliable ~5-decision budget noted in pa-patterns.md ai-issues (overload causes skipped steps). Reason: Worked examples and the verbatim redaction catalog are detail layers consulted only when a field-shape question or a redaction arises; loading them on every authoring call inflates the always-resident surface area without adding per-call value. Solution: Defer the three full example blocks and the inline redaction target/placeholder catalog in <safety_boundaries> to a references/ file loaded on demand, keeping the template, format_rules, success_criteria, and the gap-marker rules inline. The sibling swagger skill already shows this lazy-loading pattern (references/redaction-catalog.md, references/canonical-example.md).

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/testrail-test-case-export/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The SKILL.md is ~16KB. The full <vendor_replacement> section (~25 lines describing how to fork the skill for Zephyr/Xray/qTest/Polarion, the per-vendor re-binding catalog, and the workflow-side coupling note) stays loaded during every actual TestRail export run even though it governs a future authoring task, not the runtime export. Combined with the other resident sections, including <input_contract>, <safety_boundaries>, and <validation_checklist>, this carries non-execution maintenance guidance in the execution context. Reason: Vendor-porting guidance is consumed by a prompt-maintainer task, not by the export-runtime agent; keeping it inline mixes a maintenance concern into the runtime cognitive budget with no per-export value, per pa-patterns.md work-curiosity-limit and progressive disclosure. Solution: Move <vendor_replacement> to a references/ file (e.g. references/vendor-porting.md) loaded only when someone is forking the skill to a new TMS, leaving the runtime export path (process, input_contract, safety_boundaries, validation_checklist) as the resident surface.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/user-approved-code-changes/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The diff deletes the concrete `Example Questions` block (critical/edge/test-flow sample questions such as 'should we match exact text "Success!" or just verify message contains "Success"') and the inline `Define Explicit Assertions` task with per-step assertion examples. The new file gives typed assertion patterns (Presence/State/Content/Behavioral templates) but no worked sample question and relies on bound skills for question content. Reason: The deleted sample questions showed the agent the desired specificity (exact-match vs contains); without an equivalent example downstream, question quality may regress to vague prompts. Solution: Verify `questioning` and `aqa-requirements-elicitation` carry sample questions and a worked typed-assertion example; if not, add one short example question and one filled assertion bullet to those skills (not the workflow) so the abstract templates have a concrete anchor.
⚪ Low	Conflict Resolution	Problem: The new `<description_and_purpose>` and `<workflow_context>` assert assertion derivation happens 'inside the bound skill at step 2.1' while step 2.4 transcribes; but `<identify_gaps step=2.1>` only says 'USE SKILL aqa-requirements-elicitation' and 'Prepare a list of unknowns', not that it derives typed assertions. The authority-chain claim and the step body are slightly out of sync. Reason: If step 2.1 does not visibly produce derived assertions, step 2.4's 'collect every derived assertion' has no clearly defined source within the phase text. Solution: Add an explicit bullet to `<identify_gaps>` stating the skill also derives the typed `Derived assertion` field per item, so step 2.1 and the authority-chain narrative agree.
⚪ Low	Decision Branching	Problem: The base `Task 4` had explicit `DO NOT PROCEED to Phase 3 until answers received`. The new `<wait_for_user>` keeps STOP AND WAIT but the only else-branch (zero derived assertions) is handled in step 2.4; there is no explicit branch for 'user provides partial answers' or 'user declines to answer'. Reason: Partial-answer handling is a realistic HITL path; leaving it implicit risks the agent silently proceeding or stalling. Solution: Add a one-line else-branch in `<wait_for_user>` or `<update_test_plan>` for partial/declined answers (record gap, proceed with documented unknowns vs re-ask), matching the None-clause pattern already used for assertions.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Example Grounding	Problem: The diff deletes the entire detailed user-facing page-source capture protocol: the step-by-step F12 / right-click Inspect / Copy outerHTML / include 2-3 parent levels instructions, the `{page-name}.html` kebab-case naming convention, and the full `User Interaction Format` message template. The new `<handle_page_source>` reduces all of this to 'Provide clear instructions to user for capturing HTML'. Reason: This is user-facing instruction content; 'provide clear instructions' is non-operational and gives the agent no template, so the user-facing capture message quality regresses and non-technical users may not capture usable HTML. Solution: Confirm `aqa-selector-management` (Part A) owns this user-facing capture protocol; if it does not carry the HTML-capture steps and naming convention, restore a compact version (or an explicit pointer) in `<handle_page_source>` since this is user-facing output that compression rules protect.
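
For reference, the `{page-name}.html` kebab-case naming convention mentioned above could be implemented along these lines. This is a sketch under assumptions: the exact normalization rules of the deleted protocol are not shown in this report, so collapsing every non-alphanumeric run into a single hyphen is a guess, and `page_source_filename` is a hypothetical helper name.

```python
import re


def page_source_filename(page_name: str) -> str:
    """Turn a human-readable page name into a kebab-case .html file name."""
    # Assumed rule: lowercase, then replace each run of non-alphanumeric
    # characters with one hyphen and trim hyphens at both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", page_name.lower()).strip("-")
    return f"{slug}.html"


print(page_source_filename("Login Page"))
```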

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	2	⬇️ Slightly worse
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The diff deletes the Phase 5 test-plan output template (Page Objects Modified/Created with added selectors and methods, Implementation Notes, Files Modified) and the code-pattern examples (TypeScript selector/getter/new-page-object skeletons). The new file documents only state-file fields in `<update_state>` and a checklist; the implementation artifact shape now depends on `aqa-selector-management` Part B. Reason: The concrete getter/accessor code examples grounded the 'follow existing patterns exactly' instruction; the `<skill_precedence>` positive/anti examples partly compensate but the general page-object skeleton is gone, so consistency guidance is thinner if the skill lacks it. Solution: Confirm `aqa-selector-management` Part B carries the page-object code pattern and the modified/created selector reporting shape; if not, add a minimal anchor. Do not re-inline the full TypeScript examples — they belong in the skill.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔴 Critical	Conflict Resolution	Problem: Step 6.1 says 'Do not ACQUIRE or USE coding, testing, repository-implementation-standards, or aqa-test-authoring directly from this phase file - the handoff delegates internally and is the only entry point that loads them', and sub-step 3 says 'the handoff is responsible for ACQUIRing and applying it [aqa-test-authoring]'. But the bound skill automation-test-implementation-handoff/SKILL.md states the opposite: 'This skill does NOT drive skill loading... it does NOT itself ACQUIRE/USE other skills' and its step-4 GATE STOPS with a failure if the domain/foundational skills were not already loaded by the calling workflow. The phase and the skill it delegates to give contradictory loading responsibilities. Simulation confirms this is a hard execution deadlock, not just inconsistent prose: the parent's repository-implementation-standards load does not rescue it because coding, testing, and aqa-test-authoring remain unloaded (parent lists them only as 'Recommended skills'), so the handoff still STOPS at its verify GATE and Phase 6 cannot complete. Reason: The agent following the phase will withhold loading the domain/foundational skills expecting the handoff to load them; the handoff then hits its step-4 GATE, reports 'foundational skill not loaded', and stalls Phase 6. A wrong instruction that contradicts the delegated skill reliably breaks the chain. Escalated to critical because the chain fails (hard stall), not merely degrades. Solution: Align step 6.1 with the handoff contract: instruct this phase (or the parent aqa-flow Phase 6, which already lists coding/testing/aqa-test-authoring as recommended skills) to ACQUIRE+load coding, testing, repository-implementation-standards, and aqa-test-authoring BEFORE USE SKILL of the handoff, and reword sub-step 3 so the handoff 'verifies presence and applies discipline' rather than 'is responsible for ACQUIRing'. Remove the 'only entry point that loads them' claim.
🟡 High	Reference Integrity	Problem: Step 6.1 binds aqa-test-authoring as the 'domain test implementation skill the handoff must apply' and asserts the handoff 'delegates internally and is the only entry point that loads them'. The referenced handoff skill explicitly disclaims internal delegation/loading (recommended_foundational_skills: 'this skill only verifies presence... does NOT itself ACQUIRE/USE other skills'). The reference resolves to a file, but the described behavior of that file is incorrect. Reason: A reference whose described semantics contradict the target file misleads the agent about which component performs loading, producing the same chain stall as the Conflict Resolution issue. Solution: Update the prose so the reference matches the handoff's actual contract: the parent workflow loads the foundational + domain skills, the phase names aqa-test-authoring as the domain skill, and the handoff verifies-and-applies. Cite the handoff's recommended_foundational_skills section instead of claiming it loads skills.
🔵 Medium	Example Grounding	Problem: The rewrite deleted all concrete code examples that the base provided (TypeScript test scaffold, setup, assertions like expect(welcomeMessage).toBeVisible(), cleanup hooks, state-file template). The new phase has no canonical example of the implemented artifact it governs. Reason: Examples grounded the previous implementation step; their removal is a deletion. The loss is partly mitigated by delegation, but the phase itself now offers no concrete anchor, so the comparison is slightly worse. Solution: Since authoring detail is now delegated to aqa-test-authoring, this is acceptable IF the phase notes that concrete authoring examples live in aqa-test-authoring's output_format. Add one short canonical example of the expected state-file update or validation_checklist outcome to keep the phase self-grounding.
🔵 Medium	Decision Branching	Problem: The rewrite collapsed the prior explicit branching (new-file vs existing-file in old Task 2, cleanup-needed vs not in old Task 8) into a single 'Execute test authoring' delegation. Within this phase the only remaining branch is the skill_handoff acceptable/unacceptable check; authoring decision points now live only inside the delegated skill, so the phase no longer states the if/then for the common authoring forks. Reason: Without a pointer, an agent reading only this phase cannot tell whether the missing branches are intentionally delegated or accidentally dropped, risking skipped decisions. Solution: Either add a one-line note that file-location and cleanup branching are owned by aqa-test-authoring (so the reader knows where the decision logic moved), or keep a brief if/then pointer. No need to restore the full task list.
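
The loading deadlock in the Critical finding can be reproduced with a toy model of the handoff gate. This is a minimal sketch, assuming the gate checks exactly the four skills named in the finding; `handoff_gate` and `REQUIRED_SKILLS` are hypothetical names used for illustration and do not mirror the handoff skill's actual text.

```python
# The four skills named in the finding. The real gate's exact list is an
# assumption here.
REQUIRED_SKILLS = (
    "coding",
    "testing",
    "repository-implementation-standards",
    "aqa-test-authoring",
)


def handoff_gate(loaded: set[str]) -> list[str]:
    """Return the skills the gate finds missing.

    The gate only verifies presence; it never loads anything itself.
    """
    return [skill for skill in REQUIRED_SKILLS if skill not in loaded]


# Current wording: the phase withholds loading, so only the parent's
# repository-implementation-standards load is present.
missing_now = handoff_gate({"repository-implementation-standards"})

# Proposed wording: the phase loads all four skills before the handoff.
missing_fixed = handoff_gate(set(REQUIRED_SKILLS))

print(missing_now)    # non-empty, so the gate stops
print(missing_fixed)  # empty, so the gate passes
```

The model shows why rewording alone is not enough: as long as the gate verifies without loading, some caller has to load the skills before the handoff runs.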

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	2	⬇️ Slightly worse
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	3	⬇️ Slightly worse
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Example Grounding	Problem: The rewrite removed the base's concrete failure-record markdown template (Error Type / Error Message / Stack Trace / Page Source Analysis fields) and Phase 7 plan section template. The new file defers the artifact schema to the domain skill's output_format and gives no inline example of a labeled root-cause entry. Reason: The evidence-strength labeling rule is the central new behavior; one concrete labeled example would reduce mislabeling risk. Low severity because the Confirmed/Assumption/Unknown definitions and tie-break are spelled out in prose. Solution: Add one short inline example of a labeled root cause (e.g. a 'Confirmed' line with a one-line rationale) or explicitly point to aqa-test-debugging output_format for the artifact shape. Schema delegation is fine; a single anchor example keeps the evidence-label rule concrete.
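
One possible shape for the single anchor example suggested above is sketched here. The `RootCause` class, its field names, and the rendered line format are hypothetical illustrations. Only the three labels (Confirmed / Assumption / Unknown) come from the report, and the real artifact shape would remain whatever aqa-test-debugging's output_format defines.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    # The three evidence-strength labels named in the report.
    CONFIRMED = "Confirmed"
    ASSUMPTION = "Assumption"
    UNKNOWN = "Unknown"


@dataclass
class RootCause:
    # Hypothetical fields; the real schema is owned by the domain skill.
    summary: str
    label: Evidence
    rationale: str

    def render(self) -> str:
        # Label first, so a reader sees the evidence strength before the claim.
        return f"[{self.label.value}] {self.summary} (rationale: {self.rationale})"


entry = RootCause(
    summary="Login button selector no longer matches the page",
    label=Evidence.CONFIRMED,
    rationale="page source in the failure record lacks the selector",
)
print(entry.render())
```

Leading with the label is a design choice for this sketch: it keeps the one-line rationale attached to the claim it justifies.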

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/adhoc-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The subflow uses heavy cross-referencing indirection: the <execute_documentation_mcp> intro states 'branch triggers live in <output_contract> and are referenced by name', 'Config-key precedence lives in <workflow_context> and is referenced, not relisted', and the early-exit rule plus <verify_remediation> all point back to <output_contract> branch names (SKIPPED_NO_CONFIG, ACQUIRE_FAILED, EMPTY_HARVEST, COMPLETED). This DRY-by-reference style avoids duplication but forces the agent to jump between four sections to resolve a single branch, adding cognitive overhead for what is one optional collection branch. Reason: Per pa-patterns ai-issues, agents skip steps and lose context when forced to resolve directives across multiple distant sections; co-locating the resolved value reduces round-trips and skipped-branch risk. Solution: Inline the one-line outcome string next to each branch trigger in <harvest_and_collect> (e.g. after 'apply SKIPPED_NO_CONFIG', append the literal outcome line), keeping <output_contract> as the canonical table but removing the need to cross-jump on every branch. This reduces lookups without reintroducing the full duplication the author was avoiding.
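
The inlining proposed above can be sketched as a small text transform. The four branch names come from the report; the outcome strings in `OUTCOMES` and the `inline_outcome` helper are hypothetical placeholders, since the actual outcome lines live in <output_contract> and are not quoted here.

```python
import re

# Canonical table: branch name -> one-line outcome.
# The outcome strings are invented placeholders for illustration only.
OUTCOMES = {
    "SKIPPED_NO_CONFIG": "documentation MCP skipped, no config",
    "ACQUIRE_FAILED": "documentation MCP skipped, acquire failed",
    "EMPTY_HARVEST": "documentation MCP ran, nothing harvested",
    "COMPLETED": "documentation MCP completed",
}

_BRANCH = re.compile(r"apply (" + "|".join(map(re.escape, OUTCOMES)) + r")\b")


def inline_outcome(step_text: str) -> str:
    """Append the literal outcome after every 'apply <BRANCH>' mention."""
    def repl(match: re.Match) -> str:
        return f"{match.group(0)} (outcome: {OUTCOMES[match.group(1)]})"

    return _BRANCH.sub(repl, step_text)


print(inline_outcome("If no config key is set, apply SKIPPED_NO_CONFIG."))
```

Because the table stays the single source for the strings, the step text can carry the resolved value without the two drifting apart.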

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	3	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔴 Critical	Conflict Resolution	Problem: Step 5.1 forbids the phase from loading the foundational/domain skills ('do not USE SKILL or ACQUIRE coding, testing, repository-implementation-standards, or qa-test-implementation directly from this phase file - the handoff delegates to them internally and is the only entry point that loads them'), but the bound skill automation-test-implementation-handoff/SKILL.md states the opposite: it 'does NOT drive skill loading' and STOPS if those skills are not already loaded by the calling workflow. Parent qa-flow.md Phase 5 lists them only as non-binding 'Recommended skills'. So nothing actually loads them and the handoff stops at its first verify GATE - Phase 5 deadlocks every run. Reason: Tracing the chain shows the phase forbids exactly the skills the bound skill gates on, so the documented happy path cannot complete. Reliability is the primary goal; a broken execution chain is critical. Solution: Make the parent qa-flow.md Phase 5 (or this phase file) ACQUIRE+load coding, testing, repository-implementation-standards, and qa-test-implementation as a BINDING load before USE SKILL of the handoff; reword step 5.1 so the handoff 'verifies presence and applies discipline' rather than being 'responsible for ACQUIRing' or 'the only entry point that loads them'. Mirror the consistent pattern used by automation-test-execution-analysis (which correctly drives loading).
🟠 Very High	Reference Integrity	Problem: Step 5.1 prose describes the handoff's loading behavior incorrectly ('delegates to them internally and is the only entry point that loads them'), contradicting the bound automation-test-implementation-handoff contract which verifies-but-does-not-load. The reference to the handoff's responsibility does not resolve to the skill's actual behavior. Reason: A phase that misstates what a bound skill does makes the agent rely on behavior that never happens, breaking the chain. Solution: Correct the prose to match the handoff's verify-only contract, and point Phase 5 to the binding skill-load step that must precede it.
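
The deadlock described in the two findings above can be modeled in a few lines. This is a toy model under stated assumptions, not the workflow runtime: `handoff_verify` and `run_phase_5` are hypothetical names, and `loaded.update(...)` merely stands in for a binding ACQUIRE + load step. The four skill names are the ones listed in the report.

```python
REQUIRED_SKILLS = (
    "coding",
    "testing",
    "repository-implementation-standards",
    "qa-test-implementation",
)


def handoff_verify(loaded: set) -> list:
    """Verify-only gate: report missing skills, never load them."""
    return [skill for skill in REQUIRED_SKILLS if skill not in loaded]


def run_phase_5(loaded: set, binding_load: bool) -> str:
    loaded = set(loaded)  # copy so the caller's set is untouched
    if binding_load:
        # Stand-in for the parent workflow (or phase) loading the skills
        # BEFORE invoking the handoff.
        loaded.update(REQUIRED_SKILLS)
    missing = handoff_verify(loaded)
    if missing:
        return "STOP: missing " + ", ".join(missing)
    return "handoff applied"


print(run_phase_5(set(), binding_load=False))  # current wording: gate stops
print(run_phase_5(set(), binding_load=True))   # proposed fix: gate passes
```

With `binding_load=False` nothing ever populates `loaded`, so the verify gate stops on every run, which mirrors the chain the report traces.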

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	2	⬇️ Slightly worse
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The base file included a full verbatim `testgen-state.md` template (Phase Completion Status checklist, Metrics block, Phase Details block). The new `<update_state step="1.4">` replaced it with the single instruction 'Update testgen-state.md with Phase 1 complete and metrics' and no field list. Reason: Without an explicit state schema the agent may write inconsistent state files across phases, weakening the cross-phase self-check the parent flow relies on. Solution: List the minimum required state-file fields inline (current phase, completion checkbox, Jira fields count, Confluence pages count) or add a TERM reference to one shared state-file template defined once in the parent flow, so each phase produces a deterministic state shape.
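
A minimal sketch of the "minimum required state-file fields" option follows. The four fields are the ones named in the solution above; the `Phase1State` class, the attribute names, and the rendered markdown layout are assumptions for illustration and do not reproduce the base file's template.

```python
from dataclasses import dataclass


@dataclass
class Phase1State:
    # Minimum fields named in the report; attribute names are hypothetical.
    current_phase: str
    phase_complete: bool
    jira_fields_count: int
    confluence_pages_count: int

    def to_markdown(self) -> str:
        # Render a deterministic shape so every phase writes the same layout.
        box = "x" if self.phase_complete else " "
        return "\n".join(
            [
                f"Current phase: {self.current_phase}",
                f"- [{box}] Phase 1: data collection",
                f"Jira fields: {self.jira_fields_count}",
                f"Confluence pages: {self.confluence_pages_count}",
            ]
        )


state = Phase1State("Phase 1", True, jira_fields_count=12, confluence_pages_count=3)
print(state.to_markdown())
```

Making the fields required constructor arguments means a phase cannot write the state file while silently omitting a metric.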

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The new `<create_requirements_document step="4.3">` defers the main document structure to the `requirements-synthesis` skill ('using the output format from the skill') and only specifies the testgen-specific Executive Summary and Traceability Matrix additions. The base file carried a full inline requirements document template; that full template is now gone from this phase. Reason: If the skill's output_format does not enumerate the same sections the phase expects, the agent has no in-phase contract to verify the document is complete, risking an under-structured requirements.md. Solution: Confirm `requirements-synthesis` actually defines the full requirements section structure (US/FR/NFR/C/D/A/R bodies); if it does, keep the deferral but add one line naming the canonical section list expected so the agent can self-validate. If the skill does not define it, restore a minimal inline section list.
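
The self-validation step suggested above could look roughly like this. The prefixes US/FR/NFR/C/D/A/R are taken from the report as-is; the assumption that each section contains at least one ID of the form `PREFIX-number` (for example `FR-1`) is mine, as is the `missing_sections` helper.

```python
import re

# Canonical section prefixes, copied from the report without expansion.
EXPECTED_PREFIXES = ("US", "FR", "NFR", "C", "D", "A", "R")


def missing_sections(document: str) -> list:
    """Return prefixes that have no ID such as 'FR-1' in the document.

    Assumes (hypothetically) that every section carries at least one
    numbered ID using its prefix.
    """
    missing = []
    for prefix in EXPECTED_PREFIXES:
        # \b keeps 'FR-1' from matching inside 'NFR-1', and 'R-1' inside 'FR-1'.
        if not re.search(rf"\b{prefix}-\d+\b", document):
            missing.append(prefix)
    return missing


sample = "US-1 login\nFR-1 reset password\nNFR-1 latency\nC-1\nD-1\nA-1\n"
print(missing_sections(sample))
```

An empty result would let the phase treat the deferred structure as complete; a non-empty one names exactly which section bodies to go back for.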

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

github-actions · 2026-06-02T14:37:00Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`	0	1	1	⚠️ Warning
`instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`	1	1	0	❌ Fail
`instructions/r3/core/skills/aqa-selector-management/SKILL.md`	1	1	0	❌ Fail
`instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-authoring/SKILL.md`	1	1	0	❌ Fail
`instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/SKILL.md`	1	2	0	❌ Fail
`instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`	1	1	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/qa-data-collection/SKILL.md`	0	2	0	⚠️ Warning
`instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-gap-analysis/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/qa-project-config/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/qa-test-debugging/SKILL.md`	1	2	0	❌ Fail
`instructions/r3/core/skills/qa-test-implementation/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/repository-implementation-standards/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/user-approved-code-changes/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow.md`	1	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	1	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	1	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	1	2	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`	1	0	0	❌ Fail
`instructions/r3/core/skills/testrail-test-case-export/SKILL.md`	0	2	0	⚠️ Warning
`instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/adhoc-flow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow.md`	2	2	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	1	1	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	✅ Pass
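
Every row of the summary table above is consistent with a simple rule: any High finding fails the file, any other finding yields a warning, and a clean file passes. The sketch below encodes that inferred rule. It is an assumption read off the table, not the validator's documented logic, and only one row has a non-zero Low count, so treating Low alone as a warning is a guess. The `status` function is hypothetical and omits the status emoji.

```python
def status(high: int, medium: int, low: int) -> str:
    """Derive the Status column from the per-severity finding counts."""
    if high > 0:
        return "Fail"
    if medium > 0 or low > 0:
        return "Warning"
    return "Pass"


# Three rows copied from the table: (high, medium, low) -> reported status.
rows = {
    "api-test-spec-authoring/SKILL.md": ((0, 1, 1), "Warning"),
    "testrail-test-case-authoring/references/examples-and-redaction.md": ((1, 0, 0), "Fail"),
    "adhoc-flow.md": ((0, 0, 0), "Pass"),
}
for name, (counts, reported) in rows.items():
    print(name, status(*counts) == reported)
```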

📄 `instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The per-value honesty / `[ASSUMED: ...]` discipline and the redaction discipline are each stated three+ times. The honesty rule appears in step 3 ('Per-value honesty rule', 'Confident fabrication is forbidden'), in `<pitfalls>` ('Confidently emitting an invented field value...'), in `<failure_handling>` (the empty-schema branch), and again in `<validation_checklist>` ('Exact-value rule' + 'Assumptions section populated'). The same redundancy occurs for the GATE conditions, which are restated almost verbatim across step 1 GATE, `<failure_handling>`, and `<validation_checklist>`. Reason: The same rule written four ways inflates the resident SKILL.md (~10.9KB) and forces the agent to reconcile near-duplicate phrasings, which is where drift and contradictions creep in. Solution: Keep the canonical statement in step 1 GATE / step 3 and have `<failure_handling>` and `<validation_checklist>` cross-reference it by name (e.g. 'per step 1 GATE' / 'per the per-value honesty rule') instead of re-expanding the full condition text. The file already uses this pattern in places — apply it consistently to the honesty and GATE conditions.
⚪ Low	Input Contract	Problem: The `<prerequisites>` block lists 'Raw test case data available', 'API endpoint contracts available', 'Gap analysis and user clarifications completed' but never states the concrete input paths/format; step 1 says only 'Read all input documents provided by the calling workflow' with no path defaults or shape, unlike the sibling aqa-codebase-analysis skill which has an explicit input table with default paths. Reason: Without an explicit input shape the agent can misread which document is the contract vs the test cases, weakening the otherwise strong GATE that depends on telling them apart. Solution: Add a short input table (input name, expected format e.g. markdown/JSON, who supplies it) mirroring the aqa-codebase-analysis `<input_contract>` table, even if all paths are workflow-supplied with no defaults.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The 'Coverage epistemic-honesty rule' and the 9-required-sections requirement are each restated across multiple blocks. The Coverage rule is defined in step 8 ('canonical — referenced from steps 3, 4, `<failure_handling>`') yet its full effect is re-expanded in steps 3, 4, 7-path note, `<failure_handling>` (two branches), `<validation_checklist>` ('Coverage section enumerates every optional input...'), and `<pitfalls>` ('Silently omitting absent optional inputs...'). 'All 9 sections required / no section blank' likewise appears in step 8, `<output_format>`, and `<validation_checklist>`. The SKILL is ~12.3KB resident. Reason: Multiple full restatements of the same two rules bloat the always-resident SKILL.md and create maintenance drift risk when one copy is edited and the others are not. Solution: Since step 8 already declares the Coverage rule canonical, replace the re-expansions in steps 3/4/7 and the pitfall with a short pointer ('apply the Coverage rule, step 8'). Steps 3 and 4 already do this for the trigger, but the rule's full text then reappears elsewhere. Collapse the duplicate 9-section assertions to one canonical statement plus pointers.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: The skill hard-codes specific step numbers of its parent workflow phase: `<when_to_use_skill>` says "those are the parent phase's responsibility (`<ask_questions step="2.2">` uses the `questioning` skill)"; step 6 repeats `<ask_questions step="2.2">`; the worked example references "step 2.2 of `aqa-flow-requirements-clarification`" and "step 2.4 of the clarification phase". A skill is not supposed to know which workflow/phase runs it or that phase's internal step numbering (sibling/reverse awareness). If the clarification phase renumbers 2.2/2.4, this skill silently goes stale. Reason: Embedding a sibling phase's internal step numbers couples the skill to that phase's structure; the numbers will drift out of sync and mislead the agent about where the handoff actually goes. Solution: Refer to the handoff by role/keyword only — e.g. 'the parent clarification phase's questioning step (uses the `questioning` skill)' — and drop the literal `step="2.2"` / `step 2.4` numbers. Keep the `questioning`-skill keyword as a semantic contract cue (allowed), but remove the phase-internal step identifiers.
🔵 Medium	Reference Integrity	Problem: References to `aqa-flow-code-analysis.md` `<naming_convention>` appear in `<prerequisites>` and `<failure_handling>` for resolving `<test-name>`. The naming convention is owned by a workflow phase the skill does not execute; the skill points into that phase's internal anchor for an operational term it depends on. Reason: Depending on another phase's internal anchor for a core operational term means the skill breaks if that anchor is renamed, and it assumes knowledge of a sibling the skill should not have. Solution: Either define the `<test-name>` slug rule inline in the `<input_contract>` (it is a simple filename-parse rule), or describe it as 'the `<test-name>` slug supplied/resolved by the calling workflow' without deep-linking the phase file's internal section.
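
The step-number coupling flagged above could be caught mechanically before review. The following is a minimal sketch, not part of the audit tooling: the `STEP_REF` pattern and the `find_step_refs` helper are hypothetical names, and the sketch assumes that dotted identifiers such as `2.2` belong to a parent phase while plain ones such as `step 8` are the skill's own steps.

```python
import re

# Hypothetical pattern: matches step="2.2" attributes and prose such as "step 2.4".
# Assumption: dotted ids (N.M) are phase-internal; plain "step 8" is the skill's own.
STEP_REF = re.compile(r'step(?:\s*=\s*"|\s+)(\d+\.\d+)"?', re.IGNORECASE)


def find_step_refs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, step_id) for each dotted step reference in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STEP_REF.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Illustrative sample text assembled from the phrases quoted in the finding.
sample = '''<when_to_use_skill>
Handoff: <ask_questions step="2.2"> uses the questioning skill.
See step 2.4 of the clarification phase.
Apply the Coverage rule, step 8.
</when_to_use_skill>'''

for lineno, step_id in find_step_refs(sample):
    print(f"line {lineno}: hard-coded step {step_id}")
```

On the sample, lines 2 and 3 are reported and the skill-local `step 8` pointer is left alone.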

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-selector-management/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Single Responsibility	Problem: The skill bundles two responsibilities that the file itself says are invoked by two different phases: Part A (read-only identification, invoked by `aqa-flow-selector-identification`) and Part B (writes page-object files, invoked by `aqa-flow-selector-implementation`). The healthy SRP target is 1-2 related responsibilities; here identification and implementation have different safety profiles (read-only vs file-writing) and different invoking phases, which is why every block (`<safety_boundaries>`, `<failure_handling>`, `<validation_checklist>`, `<pitfalls>`) has to fork into Part-A-inline vs Part-B-deferred halves. Reason: Two phases with opposite write profiles in one skill raises the risk that a read-only Part A run accidentally follows a Part B write instruction; tight scope binding is the only thing preventing a safety-boundary crossover. Solution: This is acceptable IF the design rationale holds, but it should be challenged: the 'why one file' rationale (shared 4-tier taxonomy + inventory shape + handoff semantics) is real, so the merge is defensible — keep it, but make the resident SKILL.md carry ONLY Part A inline and move ALL Part B mechanics/checklist/pitfalls to the reference (already mostly done), so a Part A invocation never pays Part B cognitive cost. Verify no Part B write-path detail leaks into the always-resident SKILL.md.
🔵 Medium	Bloat Control	Problem: The Part-A-vs-Part-B scope rule and the fragile-selector discipline are each restated several times. The scope rule is declared 'canonical' in `<when_to_use_skill>` but re-expanded in `<input_contract>` ('Existence + scope validation'), `<safety_boundaries>` ('Part A / Part B scope is governed by the canonical rule... Enforcement:'), and implicitly in every forked block. The fragile-selector rule appears in step 7 (gate), `<safety_boundaries>` ('Fragile-selector discipline'), `<failure_handling>`, and `<pitfalls>`. SKILL.md is ~13.3KB — the largest of the set. Reason: At 13.3KB resident, repeated full restatements push the always-loaded portion up and invite drift between the copies; the file already adopts a 'canonical + pointer' pattern, so the duplicates are avoidable. Solution: State the Part-A/Part-B scope rule and the fragile-selector rule once as canonical (already labeled so in `<when_to_use_skill>` / step 7) and have the other blocks cross-reference them by name rather than re-expanding the enforcement text.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-test-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: SKILL.md is 15KB. Much of the volume is meta-commentary about where the single source of truth lives rather than instructions: e.g. step 5 'Template load point (canonical): the verbatim template at [...] is loaded once at step 5 (the emit step) — <output_format> references this load point, not its own.' and <output_format> 'The verbatim template is loaded at the canonical load point declared in step 5 (see process step 5; not repeated here).' These two blocks restate the same DRY-bookkeeping fact in both directions. Reason: Per-call cost is paid on every invocation. The agent does not need prose explaining why a rule is stated once; it needs the rule. Trimming meta-commentary lowers cognitive load and token cost without losing any behavior. Solution: Collapse the bidirectional 'canonical load point' notes to a single one-line pointer at step 5 and drop the mirrored disclaimer in <output_format>. Remove parenthetical self-justifications ('not its own', 'not repeated here', 'mirrors the sibling pattern') that explain the DRY mechanism rather than instruct.
🔵 Medium	Cognitive Budget	Problem: The 10K-20K size band combined with dense cross-reference bookkeeping (repeated 'canonical', 'single source of truth', 'referenced here, not restated' phrases across <success_criteria>, <output_format>, <validation_checklist>) raises resident cognitive load for what is fundamentally a write-the-test skill. Reason: Repeated DRY-pointer phrasing competes for attention with the actual procedure and risks the agent over-weighting bookkeeping over the authoring task. Solution: Deduplicate the repo-docs-win / silent-drop pointers: state each once and let other blocks name it in <=5 words instead of re-explaining the SoT relationship each time.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-test-debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Single Responsibility	Problem: The skill explicitly bundles two responsibilities with different risk profiles: '<when_to_use_skill>' states 'The skill bundles two responsibilities with materially different risk profiles: Part A — Report Analysis (read-only)... Part B — Corrections (writes test source files...)'. It even pre-announces the eventual split: 'The split is preserved so future SRP tightening (extracting Part B to a sibling skill) is a one-step refactor.' A read-only analyzer and an approval-gated code mutator are two skills sharing one file. Reason: Combining read-only triage with an approval-gated write path in one skill widens the blast radius: a Part-A-only caller still loads the write-path framing, and the safety boundary for Part B must be re-asserted defensively throughout. SRP separation would make the read-only contract unambiguous. Solution: Either split Part B into a sibling skill now (the file already documents this as a one-step refactor), or, if the two parts stay together for v1, drop the speculative future-refactor sentence — it is non-operational meta-commentary that does not change agent behavior.
🔵 Medium	Cognitive Budget	Problem: The SKILL.md is large (~12-15KB, in the 10K-20K band) and is an always-loaded entry file; the repeated DRY-bookkeeping meta-commentary about where rules live adds to the resident token cost on every invocation. Reason: Smaller always-loaded entry files leave more room for task context and reduce the risk of context compaction that makes the agent unreliable. Solution: Move the heavy Part A/Part B procedural detail into the existing references/ files and keep only the dispatch logic and canonical rules in SKILL.md; declare each rule once.
🔵 Medium	Bloat Control	Problem: SKILL.md is ~13KB. A recurring meta-pattern adds volume without instruction: phrases like 'Canonical taxonomy (single source of truth)', 'Downstream sections reference this list by name', 'canonical', 'always-loaded', 'Loaded only when Part B runs' recur across nearly every block. The Part B half is a bare list of 'see references/part-b-mechanics.md#...' pointers that duplicates the reference's own pitfalls header. Reason: The repeated load-split bookkeeping is paid on every invocation and competes with the procedural steps the agent must actually execute. Solution: State the taxonomy-is-canonical and Part-A/Part-B load-split facts once near the top; let later blocks omit the repeated 'canonical/always-loaded' qualifiers. Replace the Part B stub list with a single line pointing to part-b-mechanics.md pitfalls.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: SKILL.md is ~12.6KB. The domain-skill-contract concept is restated three times in near-identical wording: <core_concepts> ('This skill orchestrates around the domain skill's read-only output contract... without knowledge of the domain skill's internal structure'), <input_contract> row ('invoked under its analysis-only / read-only output contract — its job here is to emit the categorized analysis artifact, not to mutate source'), and process step 6 ('USE the resolved domain analysis skill under its analysis-only / read-only output contract — it MUST emit the categorized analysis artifact and MUST NOT mutate source files'). The same MUST/MUST-NOT pair appears each time. Reason: Triple restatement of the same contract inflates the always-loaded body and dilutes the single authoritative statement the agent should anchor to. Solution: State the read-only domain-skill contract once (it belongs in <core_concepts>) and reference it tersely at step 6 and in the input_contract row rather than re-spelling the MUST/MUST-NOT each time.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: The file is ~14.2K chars (in the 10K-20K high-severity bloat band per the audit spec). The canonical 'domain skill required + no silent fallback' rule is restated at least five times: core_concepts ('Canonical ... rule lives in step 4 GATE'), the `<recommended_foundational_skills>` table row, process step 4 GATE ('Silent fallback to coding + testing alone is forbidden'), `<failure_handling>` ('Domain skill name not supplied AND no conventional fallback discoverable'), and `<pitfalls>` ('Silently proceeding when the parent did not name a domain skill'). The same applies to the 'verify foundational skill is loaded, this skill does NOT load it' statement, which appears in core_concepts, the table preamble, every process step 1-4, and `<failure_handling>`. Reason: The same constraint repeated five ways inflates the prompt without adding meaning and increases the chance the agent skips a step under load; one authoritative statement plus pointers is more reliable and far smaller. Solution: State the 'no silent fallback' rule once at step 4 GATE and replace the other four occurrences with a bare cross-reference (e.g. 'see step 4 GATE'). State the 'verify-don't-load' contract once (already in core_concepts) and drop the repeated 'this skill does NOT load it; the calling workflow recommends + loads it' clause from steps 1-4 and the table.
🔵 Medium	Cognitive Budget	Problem: Five overlapping list sections carry near-identical content: `<recommended_foundational_skills>` table, `<process>` steps 1-4, `<failure_handling>`, `<validation_checklist>`, and `<pitfalls>` all enumerate the same foundational-skill and domain-skill verification logic. An agent must hold all five in working memory to act on step 4. Reason: Fewer non-overlapping sections reduce the cognitive search space and the risk the agent reconciles contradictory-looking duplicates. Solution: Collapse `<pitfalls>` into `<failure_handling>` (they restate the same failure modes) and trim `<validation_checklist>` to outcomes not already implied by `<process>` GATEs.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Conflict Resolution	Problem: The `<orchestration_and_escalation>` 'Verification-failure unilateral-start override' instructs the agent to start the earliest incomplete phase in the same turn and explicitly 'do NOT call AskUserQuestion, present options, or ask how do you want to proceed'. This is an auto-proceed decision embedded in a workflow, which sits in tension with the file's own NO-ASSUMPTIONS rule and the principle that HITL/user-involvement defaults live in the `hitl` skill / `bootstrap-hitl-questioning`. The override is well-guarded (3-part precondition, ambiguity-defaults-to-ASK fallback, scoped to one gate, cites the `hitl` skill as authority), so the risk is contained, but the workflow still hardcodes a no-ask branch. Reason: Embedding a 'do not ask the user' branch in a workflow can be over-applied by an agent under load; an explicit one-line scope-lock keeps the narrow exception from leaking into other decisions. Solution: Keep the gate but add one explicit deference line stating that this single override is the only sanctioned deviation from the `hitl` skill defaults and applies only when all three preconditions hold, so a reader cannot generalize the no-ask behavior to other branches.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: The `<workflow_context>` block mixes operational inputs/outputs with non-operational KB-taxonomy meta-notes that belong to documentation, not phase execution. The bullet 'KB catalog / ACQUIRE success: Tags above resolve to Rosetta markdown in this repository (instructions/r3/core/skills/confluence-source-harvesting/SKILL.md, instructions/r3/core/rules/bootstrap-guardrails.md). Broader taxonomy: docs/definitions/skills.md, docs/definitions/rules.md' spells out internal repository file paths and taxonomy pointers that an executing agent does not need to run Phase 1, and per pa-rosetta a phase should reference prompts by logical name (ACQUIRE tag) only, not by deep file path. Reason: Deep internal file paths and taxonomy notes are documentation, not runtime instructions; they add length and risk drift if files move, while the logical ACQUIRE tag is the agent-agnostic contract that actually matters. Solution: Keep the operational part ('Successful ACQUIRE means Rosetta returns >=1 non-empty document for the tag') and drop the explicit `instructions/...` file paths and `docs/definitions/*` taxonomy pointers from the bullet; reference the skills by their ACQUIRE tags only.
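
A check for the deep file paths described above might look like the sketch below. It is illustrative only: `DEEP_PATH` and `find_deep_paths` are hypothetical names, and the two path prefixes are assumed from the paths quoted in the finding rather than taken from any audit specification.

```python
import re

# Assumed prefixes, based only on the paths quoted in the finding above.
DEEP_PATH = re.compile(r"\b(?:instructions|docs)/[\w./*-]+\.md\b")


def find_deep_paths(text: str) -> list[str]:
    """Return every repository-internal markdown path mentioned in text."""
    return DEEP_PATH.findall(text)


# Illustrative sample, shortened from the bullet quoted in the finding.
sample = (
    "KB catalog / ACQUIRE success: Tags above resolve to Rosetta markdown "
    "(instructions/r3/core/rules/bootstrap-guardrails.md). "
    "Broader taxonomy: docs/definitions/skills.md, docs/definitions/rules.md"
)

for path in find_deep_paths(sample):
    print("deep path in phase file:", path)
```

A phase file that references skills by ACQUIRE tag only would produce an empty list.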

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The mandatory `### Explicit Assertions` transcription rule and its 'typed Presence/State/Content/Behavioral + per-assertion granularity + None-clause' requirement are stated four times: in `<description_and_purpose>`, in `<workflow_context>` ('Assertion authority chain'), in step 2.4, and twice in `<validation_checklist>` ('Explicit Assertions subsection present...' and 'Per-assertion granularity'). The None-clause text 'None — no observable behavior derivable from current clarifications; Phase 6 will surface this as Uncovered' is reproduced verbatim three times. Reason: Repeating the same multi-clause rule and its exact fallback string four times bloats the phase and makes future edits error-prone (one copy can drift); a single authoritative copy with pointers is smaller and safer. Solution: Define the typed-assertion format and the None-clause once in step 2.4 (the operational owner) and reduce the `<description_and_purpose>`, `<workflow_context>`, and checklist mentions to a one-line pointer to step 2.4 instead of restating the full rule and verbatim None string.
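
Verbatim restatements like the None-clause above are easy to locate with a per-block count. This is a rough sketch under stated assumptions: the `BLOCK` pattern and the `phrase_locations` helper are hypothetical, the sketch assumes blocks are flat `<name>...</name>` pairs, and the sample text is invented for illustration.

```python
import re

# Assumption: blocks are flat, non-nested <name>...</name> pairs.
BLOCK = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def phrase_locations(text: str, phrase: str) -> dict[str, int]:
    """Map each block name to the number of times phrase occurs inside it."""
    locations: dict[str, int] = {}
    for name, body in BLOCK.findall(text):
        count = body.count(phrase)
        if count:
            locations[name] = locations.get(name, 0) + count
    return locations


# Invented sample: two blocks restate the fallback string, one only points to it.
sample = """
<description_and_purpose>
If nothing is derivable, Phase 6 will surface this as Uncovered.
</description_and_purpose>
<process>
Step 2.4: write the None clause; Phase 6 will surface this as Uncovered.
</process>
<validation_checklist>
- Explicit Assertions subsection present (see step 2.4).
</validation_checklist>
"""

hits = phrase_locations(sample, "Phase 6 will surface this as Uncovered")
print(hits)
```

More than one key in the result means the string is restated outside its owning block. After the proposed fix, only the block containing step 2.4 should remain.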

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The handoff loading-responsibility contract is restated nearly verbatim across three blocks. `<workflow_context>` says the handoff "does NOT drive skill loading ... The handoff itself only verifies presence at its step-4 GATE"; `<skill_handoff>` repeats "The handoff verifies presence ... it does NOT ACQUIRE/USE them" and "its step-4 GATE STOPS ..."; then `<execute_authoring>` step 1 and the closing "User-instruction-override refusal" paragraph repeat "missing-load causes the handoff's step-4 GATE to STOP, halting Phase 6" again. The same single fact (workflow loads, handoff verifies) is asserted at least four times. Reason: Repeating the same contract four times inflates the phase and competes for attention with the actual ordered steps, making the genuinely load-bearing instructions harder to spot. Solution: Keep the contract statement once in `<skill_handoff>` and reduce the `<workflow_context>` "Loading responsibility" bullet and the step-1/refusal repetitions to a one-line back-reference (e.g. "per `<skill_handoff>`"). Do not delete the refusal rule itself, only the re-explanation of why.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	3	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The added evidence-label machinery uses heavy cross-reference scaffolding for a single concept. Task 5 bullet 3 opens with a meta-sentence ("This is the single source of truth for evidence labels — Task 3, Task 6, Completion Criteria, Update State, and Important Notes all reference this block by name ..."), and the same single-source-of-truth pointer is then echoed in Task 3, Task 6, Completion Criteria, Update State, and the `Evidence Labels` Important Note — a non-operational provenance/cross-link layer on top of the actual rule. Reason: The rule and its labels are valuable, but the repeated bookkeeping about which sections reference it is non-operational noise that grows the phase without changing agent behavior. Solution: Keep the definitions, tie-break, output rule, and undecidable fallback in Task 5. Trim the repeated "per Task 5 / single source of truth" annotations elsewhere to a bare reference (e.g. "label per Task 5") and drop the self-describing meta-sentence listing every referencing section.

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The SKILL.md packs an 8-step process, a full output template (~100 lines), pitfalls, safety, success_criteria, failure_handling, and validation_checklist into one always-loaded file (16.6K). Two reference files already use progressive disclosure for steps 4 and 5; the output template and the safety/validation triplet remain inline. Reason: The output template is only needed at the final write step, so deferring it keeps earlier steps lighter and reduces the chance of context overflow on the entry file. Solution: Consider moving the verbatim `<output_format>` markdown template to a referenced asset loaded on demand at step 7 (the same lazy-load pattern steps 4/5 already use), leaving a thin pointer in the SKILL body.
🔵 Medium	Bloat Control	Problem: SKILL.md is 16.6K chars. The safety/validation layer is partly duplicated across `<safety_boundaries>`, `<pitfalls>`, `<success_criteria>` step 6.1, `<failure_handling>`, and `<validation_checklist>` — e.g. the secret-scan requirement is restated in step 6.1, success_criteria, the validation_checklist 'Safety re-check' item, and the pitfalls 'Copying literal .env values' bullet. The skill itself acknowledges this ('single source of truth' is invoked repeatedly), but the cross-referencing prose adds length on every load. Reason: This is the entry-point file loaded on every invocation; trimming repeated restatements lowers per-call context cost without losing the single-source guarantee. Solution: Keep `<safety_boundaries>` as the single source and shorten the re-statements in success_criteria / validation_checklist / pitfalls to one-line pointers (e.g. 'secret-scan per <safety_boundaries>') rather than re-describing the credential list each time.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-gap-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: SKILL.md is 14.7K chars with notable overlap between `<success_criteria>` and `<validation_checklist>`: the Cross-Reference-per-step requirement, Executive-Summary-counts-match-body, redaction re-scan, and assumption-fields checks each appear in both sections phrased differently (e.g. success_criteria 'Every test step ... has been cross-referenced' vs validation_checklist 'Cross-Reference entry-per-step grep'). The file flags this overlap itself ('section-presence ... enforced by <success_criteria>; this checklist verifies things <success_criteria> cannot directly assert') yet still restates the shared items. Reason: This entry file loads on every invocation; collapsing the duplicated done-conditions lowers per-call cost while keeping a single authoritative statement of each rule. Solution: Reduce the restated items to a one-line pointer so each contract is stated once, keeping only the genuinely proof-oriented grep checks unique to `<validation_checklist>`.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-project-config/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: SKILL.md is 15.6K chars. The canonical-path constraint (`agents/qa/qa-project-config.md` project-wide, not per-IDENTIFIER) and the redaction-at-intake rule are each restated 3-4 times — appearing in `<process>` step 3/5, `<safety_boundaries>`, `<failure_handling>`, `<pitfalls>` ('Writing the project config under .../{IDENTIFIER}/...'), and `<validation_checklist>` ('Canonical paths only' + 'No literal credentials persisted'). The IDENTIFIER-consistency rule is similarly spread across step 2, failure_handling, pitfalls, and validation_checklist. Reason: Repeating the same constraint four times on an always-loaded entry file inflates per-call context without strengthening the contract. Solution: State the canonical-path rule and the redaction-at-intake rule once each in their authoritative section and replace the duplicate restatements in pitfalls/success_criteria/validation_checklist with short pointers.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Single Responsibility	Problem: The skill explicitly bundles two responsibilities with 'materially different risk profiles': Part A (read-only report analysis, steps 1-5) and Part B (writes test source files and runs lint, steps 6-8). `<when_to_use_skill>` itself states 'The skill bundles two responsibilities'. A read-only analysis capability and a code-mutating correction capability are gated together in one 17.1K skill, and the prompt has to add a 'Part A / Part B usage boundary' guard plus a 'must not be conflated' warning to keep them separate. Reason: Coupling a safe read-only path with a destructive write path in one skill forces extra guard prose and raises the risk that a caller accidentally authorizes mutation when only analysis was wanted. Solution: Consider splitting into two skills (e.g. qa-test-report-analysis = Part A, qa-test-correction = Part B) so the read-only and code-mutating mandates are independently invocable; if kept together, the explicit Part-A-only guard is the right mitigation but the SRP cost remains.
🔵 Medium	Cognitive Budget	Problem: This 17.1K SKILL.md carries an 8-step two-part process, a 7-entry failure-category catalog, two embedded markdown templates (per-failure entry + `<output_format>`), pitfalls, safety, failure_handling, success_criteria, and validation_checklist all inline with no progressive disclosure, unlike its sibling qa-data-collection which offloads detail to references. Reason: Keeping all detail inline on the entry file increases the chance of context pressure and makes the >5-step process harder to execute reliably. Solution: Apply the same reference-file split the qa-data-collection skill uses (failure catalog and/or per-failure template as on-demand assets), so the entry file stays light and the heavy detail loads only when the relevant step runs.
🔵 Medium	Bloat Control	Problem: At 17.1K chars this is the largest assigned file. The failure-category catalog (7 categories, each with Symptoms/Root Cause/Action) in step 3 plus the full per-failure markdown template, the `<output_format>` template, and the `<validation_checklist>` create overlap — e.g. the safety re-scan target list appears in `<safety_boundaries>`, in step 3's inline note, in `<pitfalls>`, and in the validation_checklist 'Safety re-scan ran' item. Reason: 17K on an always-loaded entry file is a real per-invocation cost; the detailed category catalog is only needed when failures are actually being classified. Solution: Move the 7-category failure catalog (step 3) to an on-demand reference file (same progressive-disclosure pattern qa-data-collection uses), leaving a thin category list in the SKILL body; reduce the duplicated safety-scan restatements to pointers.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-implementation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/repository-implementation-standards/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-synthesis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The skill is ~12.9KB. The `<gate_priority>` block restates the same step-8-vs-step-10 precedence three times: once in the table (`step 8 wins`), once in the `Precedence rule` paragraph, and again in the `Reconciliation with hitl skill` paragraph, and it is restated a fourth time in `<pitfalls>` ('when in doubt, gate_priority says step 8 wins'). The same fact is also embedded in step 8 and step 10 of `<process>`. Reason: The precedence rule is correct but repeated 4-5 times, inflating context cost for every agent that loads this MUST-apply skill on every multi-phase workflow without adding new information. Solution: Collapse `<gate_priority>` to the table plus a single one-line precedence rule; drop the `Reconciliation with hitl skill` paragraph (its content is already implied by the table's 'User input role' column) and remove the redundant restatement in `<pitfalls>`.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/user-approved-code-changes/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The Spec-vs-code branch trigger references the calling workflow's internal step numbers verbatim: "the routine 'spec vs code cross-check' step (step 1.5 / step 2.4 equivalent in the calling workflow's process)". This skill reference asset hard-codes another artifact's (the workflow phase's) internal numbering, which crosses the skill/workflow isolation boundary — the skill should not know the calling workflow's step IDs. Those exact numbers do not even appear in SKILL.md (its own cross-check is step 5.1). Reason: Hard-coding a sibling artifact's step numbers breaks if the workflow renumbers, and leaks workflow internals into the skill, which is a boundary violation that makes the reference brittle. Solution: Refer to the skill's own reconciliation step by name (e.g. SKILL.md `<process>` step 5 "Reconcile and Validate") instead of citing the calling workflow's `step 1.5 / step 2.4`. Drop the cross-workflow step IDs.
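
Hard-coded step IDs of the kind flagged above are easy to list with a regular expression. This is an illustrative sketch only: `find_step_references` is a hypothetical helper, and it reports every `step N` or `step N.M` mention, including legitimate references to the file's own steps, so a reviewer still has to decide which hits cross the skill/workflow boundary.

```python
import re

# Matches references such as "step 4", "step 1.5", or "Step 2.4".
STEP_REF = re.compile(r"\bstep\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


def find_step_references(text: str) -> list:
    """Return (line_number, step_id) pairs for every step reference in `text`."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in STEP_REF.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        "the routine step (step 1.5 / step 2.4 equivalent)\n"
        "no references here\n"
        "see step 4 for details"
    )
    for line_number, step_id in find_step_references(sample):
        print(f"line {line_number}: step {step_id}")
```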

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Safety Boundaries	Problem: The `<skip_rules>` block (line ~29) states that a user instruction to bypass a gate without supplying artifacts "must be refused ... and Phase 0 still begins in the same turn", while line ~28 forbids calling `AskUserQuestion`. This overrides an explicit user instruction and auto-starts Phase 0 unilaterally, with no "ambiguity defaults to ASK" carve-out and no scope-lock deferring to the hitl skill; it is the strongest unilateral-start form in the PR. Reason: Auto-overriding an explicit user instruction is a stronger HITL deviation than the contained verification-failure case; without an ambiguity fallback, borderline phrasings get force-restarted and the agent appears to ignore the user. Solution: Add an ambiguity-defaults-to-ASK carve-out and an aqa-flow-style scope-lock sentence: when the user gives an explicit skip instruction with missing artifacts, announce-and-proceed in one line (state artifacts are missing) rather than framing it as "refusing a user instruction"; route any partial/uncertain state to the normal HITL ask path.
🔵 Medium	Bloat Control	Problem: The `<skip_rules>` block (lines 24-37) is disproportionately large for a workflow entry file (~12K chars total, in the 10K-20K signal range). The single skip-gate concept is restated many times: "MUST NOT ... call `AskUserQuestion`; present a list / menu / options block; ask the user 'how do you want to proceed', 'should I start at X', 'do you want me to', or any equivalent confirmation request; pause for input" plus a near-duplicate restatement "User instruction to bypass the gate without supplying the artifacts must be refused with the same one-line announcement and Phase 0 still begins in the same turn." The anti-confirmation rule is asserted three separate ways. Reason: A workflow entry file is loaded into context on every run; redundant restatement of one gate inflates the always-loaded budget and buries the numbered phase list that is the file's real job. Solution: Collapse the skip-gate prohibition into one MoSCoW line plus the example. Keep the (a)/(b)/(c) verification and the one-line refusal announcement; remove the duplicated 'in the same turn' / 'must be refused' restatements that repeat the same behavior.
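
The branching proposed in the Safety Boundaries solution above can be written down as a three-way decision. The sketch below is one reading of that proposal, not the workflow's actual logic: the names `SkipDecision` and `route_skip_request` are hypothetical, and honoring the skip when artifact verification passes is an assumption inferred from the gate description.

```python
from enum import Enum
from typing import Optional


class SkipDecision(Enum):
    HONOR_SKIP = "honor_skip"                      # assumption: verified artifacts allow the skip
    ANNOUNCE_AND_PROCEED = "announce_and_proceed"  # one line: artifacts missing, Phase 0 begins
    ASK_USER = "ask_user"                          # normal HITL ask path


def route_skip_request(artifacts_verified: Optional[bool]) -> SkipDecision:
    """Route an explicit user request to skip a gate.

    `artifacts_verified` is True when verification passes, False when the
    artifacts are clearly missing, and None for partial or uncertain state.
    """
    if artifacts_verified is None:
        # Ambiguity defaults to ASK rather than a unilateral restart.
        return SkipDecision.ASK_USER
    if artifacts_verified:
        return SkipDecision.HONOR_SKIP
    return SkipDecision.ANNOUNCE_AND_PROCEED


if __name__ == "__main__":
    for state in (True, False, None):
        print(state, "->", route_skip_request(state).value)
```

Putting the uncertain case first keeps borderline phrasings out of the force-restart path, which is the deviation the finding is concerned with.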

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	3	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Step 2.1 cites `RefSrc/{project-name}/docs/` twice ("from Rosetta docs at `RefSrc/{project-name}/docs/`" and "read `ARCHITECTURE.md` and `CODEMAP.md` from `RefSrc/{project-name}/docs/`"). The canonical Rosetta target-project folder is `refsrc/` (lower-case, per the standard structure). The wrong casing will not resolve on case-sensitive filesystems and is inconsistent with the rest of the Rosetta folder vocabulary. Reason: A mis-cased path silently fails to resolve on Linux, so the agent skips the architecture/codemap pre-read it was told to do, degrading the analysis. Solution: Change `RefSrc/{project-name}/docs/` to the canonical `refsrc/{project-name}/docs/` (both occurrences) to match the standard target-project folder name.
🔵 Medium	Reference Integrity	Problem: Step 2.1 deep-links into another skill's internal step numbering: "(see `qa-data-collection` skill, step 4 for full discovery logic)". A workflow phase should not depend on a sibling skill's private step numbers; if `qa-data-collection` renumbers, this pointer breaks, and it crosses the phase/skill boundary. Reason: Citing a sibling skill's internal step number couples the two artifacts and breaks silently when the skill is edited. Solution: Reference the `qa-data-collection` skill by name and the logical activity (backend-source discovery) rather than its internal `step 4`.
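
The casing problem in the first issue above is easy to miss by eye, so a small lint can help. This is a sketch only: `find_miscased` is a hypothetical helper, and treating `refsrc/` as the canonical spelling follows the review text rather than any verified project tooling.

```python
import re


def find_miscased(text: str, canonical: str = "refsrc/") -> list[tuple[int, str]]:
    """Return (line_number, match) for spellings that differ from canonical only by case."""
    pattern = re.compile(re.escape(canonical), re.IGNORECASE)
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            # A case-insensitive hit that is not the exact canonical spelling
            # would fail to resolve on a case-sensitive filesystem.
            if match.group(0) != canonical:
                hits.append((line_no, match.group(0)))
    return hits
```

Running it over the phase file would report both `RefSrc/` occurrences with their line numbers and leave correctly cased paths alone.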

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: At 9835 chars this phase file is the second largest in the flow and directly violates the phase schema's explicit directive (`docs/schemas/phase.md`): 'the file must be small and short, skills already define how things work! Be concise! Save tokens!'. The `<config_contract>` block carries a 9-row field table AND a full illustrative markdown snippet AND a separate `<initial_data_contract>` template, while the file simultaneously admits the authoritative template 'lives in the `qa-project-config` skill'. The config field semantics are thus maintained in two places. Reason: Duplicated config templates drift apart over time; the larger file also eats cognitive budget that the schema deliberately reserves for the skill. Solution: Keep the bound-field table (downstream phases need exact key names) but drop the full illustrative `# QA Project Config` snippet, since the skill owns the canonical template and this duplicates it. Reference the skill template by name instead of reproducing a representative shape.
🔵 Medium	Cognitive Budget	Problem: The same fact (project-wide config lives at `agents/qa/qa-project-config.md` and is NOT copied per-session) is restated in `<workflow_context>` Output bullet, `<execute_config>` steps 3 and 4, `<config_contract>` intro, `<failure_handling>`, and `<validation_checklist>` — five repetitions of one invariant. Reason: Repeating one invariant five times inflates the prompt without adding new instruction, increasing the chance the agent skips a real step buried among restatements. Solution: State the canonical-path / not-per-session invariant once in `<workflow_context>` and reference it; remove the re-explanations in the contract intro and failure-handling prose.
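
To see where an invariant is restated, one option is to list the tagged blocks that mention it. The sketch below assumes the phase file uses flat, non-nested `<tag>...</tag>` blocks; `blocks_mentioning` is a hypothetical helper, not part of the repository.

```python
import re


def blocks_mentioning(text: str, needle: str) -> list[str]:
    """Return the names of <tag>...</tag> blocks whose body contains needle."""
    # Assumption: blocks are not nested, so a lazy match up to the
    # matching closing tag captures exactly one block body.
    block_re = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
    return [m.group(1) for m in block_re.finditer(text) if needle in m.group(2)]
```

A result with more than one block name for `agents/qa/qa-project-config.md` points at the restatements the Cognitive Budget issue describes.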

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Precision & Explicitness	Problem: The step 4.4 full-approve branch says approval requires 'an exact approval token (per the strict-token rule shared with step 7.2)'. The 'strict-token rule' is defined in a different phase file (`qa-flow-test-correction.md` step 7.2), and no token list is stated here. Per the phase-isolation model, phases do not read sibling phases, so an agent running Phase 4 standalone has no concrete token list against which to enforce 'exact'. Reason: A cross-phase pointer to a token list the agent cannot see at Phase-4 time makes the 'exact token' requirement unenforceable, weakening the approval gate it is meant to harden. Solution: Inline the closed token list (e.g. `approved` / `approve` / `yes`, case-insensitive) directly in step 4.4, or define it once in the parent `qa-flow.md` and reference the parent. Do not point laterally to step 7.2 as the source of the rule.
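
A closed token list makes the 'exact' requirement mechanically checkable. This is a minimal sketch: the three tokens are the illustrative ones from the suggestion above, not a confirmed list from `qa-flow.md`, and trimming surrounding whitespace is an added assumption. `APPROVAL_TOKENS` and `is_exact_approval` are hypothetical names.

```python
# Illustrative closed list taken from the review's example; the real list
# would be whatever the workflow defines.
APPROVAL_TOKENS = frozenset({"approved", "approve", "yes"})


def is_exact_approval(reply: str) -> bool:
    """True only if the whole reply is one approval token, ignoring case."""
    # Assumption: leading and trailing whitespace is not significant.
    return reply.strip().casefold() in APPROVAL_TOKENS
```

Under this sketch a reply such as "yes, but rename case 3" is not an approval, which is the behavior a strict-token gate is meant to guarantee.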

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: The handoff contract (calling workflow loads the 4 skills; handoff only verifies presence at its step-4 GATE; does NOT itself ACQUIRE/USE) is stated four separate times: in `<workflow_context>` ('Loading responsibility' bullet), in `<skill_handoff>` (full restatement plus acceptance criteria), in `<execute_implementation>` 'Routing' preamble, and again inside step 5.1 item 1 and item 4. This is the core redundancy in a 9724-char file that the phase schema asks to keep 'small and short'. Reason: Restating one contract four ways quadruples reading cost and risks the agent treating the variants as distinct rules; the schema explicitly reserves this detail for the skill, not the phase. Solution: Keep `<skill_handoff>` as the single authoritative statement of the handoff contract; reduce `<workflow_context>` and the `<execute_implementation>` preamble to one-line pointers ('handoff contract: see `<skill_handoff>`'). Remove the duplicated GATE explanation from step 5.1 items 1 and 4.
🔵 Medium	Cognitive Budget	Problem: The phase file (~9.7KB) restates the handoff contract four times and carries two overlapping validation lists, inflating the per-turn context for a phase that should be small per the phase schema. Reason: Leaner phase files keep the orchestration loop within the reliable step budget and lower token cost on every call. Solution: State the handoff contract once and merge the two validation lists into one MECE checklist, deferring mechanics to the referenced skill.
🔵 Medium	Structural Coherence	Problem: Phase-exit criteria are split across three overlapping blocks — `<validate>` step 5.2 ('in-progress validation items'), `<validation_checklist>` ('authoritative exit gate'), and the contract prose — with explicit cross-annotations like 'covers `<validate>` item 1'. The file itself acknowledges the divergence ('Supersedes any divergence with `<validate>` step 5.2'), signaling the two lists are not MECE. Reason: Two overlapping validation lists with cross-reference annotations force the agent to reconcile which list governs, inviting missed or double-counted checks. Solution: Collapse `<validate>` 5.2 and `<validation_checklist>` into one authoritative checklist; if an in-progress vs exit distinction is truly needed, keep 5.2 to only the items NOT in the exit checklist instead of restating overlapping items with 'covers item N' annotations.
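
Whether two checklists are mutually exclusive can be checked roughly by looking for items that appear in both. The sketch below is an assumption-laden illustration: `overlapping_items` is a hypothetical helper, and it only catches near-literal duplicates (same words, different case or spacing), not paraphrases.

```python
def _normalize(item: str) -> str:
    """Lower-case and collapse whitespace so trivial variations compare equal."""
    return " ".join(item.casefold().split())


def overlapping_items(in_progress: list[str], exit_gate: list[str]) -> list[str]:
    """Return in-progress items that also appear in the exit checklist."""
    exit_keys = {_normalize(item) for item in exit_gate}
    return [item for item in in_progress if _normalize(item) in exit_keys]
```

An empty result is the target state if step 5.2 is kept only for items that are not in the exit checklist.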

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	2	⬇️ Slightly worse
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: Step 6 "Custom-field discovery fallback: see step 3 custom-fields branch (canonical) — no separate procedure." is a pure pointer step that carries no instruction; the custom-field logic is already fully stated in step 3. It exists only to be cross-referenced from `<failure_handling>` and `<validation_checklist>`, padding the numbered process. The same redaction content also appears across `<safety_boundaries>`, step 5, `<validation_checklist>`, and `<pitfalls>` (grep patterns and placeholder vocabulary partially restated), and at ~13.3KB the file sits in the 10K-20K bloat-signal band. Reason: An instruction-free numbered step and repeated pattern lists inflate the read cost of an always-loaded SKILL.md without adding behavior, raising the chance an agent skips or mis-sequences steps. Solution: Delete the empty step 6 and renumber, or fold its cross-reference into step 3's heading; keep grep patterns/placeholders only in `<safety_boundaries>` as the single source of truth and have other sections reference it by name rather than partially restating pattern lists.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The full vendor-replacement porting guide is inlined in `<vendor_replacement>` (lines 139-156: per-vendor rebind list for Zephyr/Xray/qTest/Polarion, identifier formats, field semantics, swap pattern). The two sibling skills (`mcp-jira-data-collection`, `testrail-test-case-export`) instead push this maintainer-only material to a `references/vendor-swap.md` / `vendor-porting.md` loaded on demand, keeping the always-loaded SKILL.md lean. Here the porting guide is always loaded, even though elsewhere in the family the same material is marked "load only when forking, not at runtime". The `<safety_boundaries>` redaction targets/patterns are also restated in `<validation_checklist>` and `<pitfalls>`. Reason: Maintainer-only fork instructions are loaded into every runtime extraction, wasting context the family deliberately reserves via progressive disclosure; inconsistency with the two sibling skills also confuses future maintainers. Solution: Move the `<vendor_replacement>` body to `references/vendor-swap.md` and leave only a one-line on-demand pointer in SKILL.md, matching the sibling Jira/export skills; reference the redaction targets from `<safety_boundaries>` by name in the checklist/pitfalls instead of re-listing patterns.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The redaction discipline is stated three times across the always-loaded SKILL.md: `<success_criteria>` ("No literal credentials / tokens / real PII appear"), `<pitfalls>` ("Pasting literal real-account passwords ... apply `<safety_boundaries>` placeholders"), and the full `<safety_boundaries>` operational block, which then also points to the references catalog. The shape-preserving-placeholder sentence ("If a real production value would be the natural example, replace it with a clearly-fake placeholder of the same shape") appears in `<safety_boundaries>` and again verbatim in the references file. Reason: Repeating the same safety rule in three SKILL.md blocks plus the references file inflates an already 13KB always-loaded skill and risks the copies drifting out of sync on future edits. Solution: Keep the operational redaction rule once in `<safety_boundaries>`; in `<success_criteria>` and `<pitfalls>` reference it by tag name rather than restating the rule text. Remove the duplicated shape-placeholder sentence from one location.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Line 6 deep-links a sibling skill's private internals: "Mirrors the same lazy-loading pattern the sibling `swagger-contracts-analysis` skill uses (`references/redaction-catalog.md` + `references/canonical-example.md`)." This names a sibling skill and points into its private `references/` files. Per the skill-isolation boundary (no lateral/sibling awareness, no cross-skill deep linking), a skill's own reference file must not know about or link into another skill's private content. The paths happen to exist today but the coupling is a boundary violation regardless. Reason: Cross-skill awareness couples two independently-evolving skills: if `swagger-contracts-analysis` renames or removes those reference files, this note silently rots, and it teaches the maintainer that deep-linking sibling internals is acceptable, eroding the isolation guarantee. Solution: Delete the parenthetical sibling reference on line 6. If a rationale for the lazy-loading split is wanted, state it generically ("split per progressive-disclosure best practice") without naming `swagger-contracts-analysis` or its private file paths.
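
A boundary check of this kind could be automated by scanning a skill's files for the names of other skills. The following is a sketch under assumptions: `sibling_skill_mentions` is a hypothetical helper, and the list of known skill names would have to come from the repository's skill directory, which is not shown here.

```python
import re


def sibling_skill_mentions(text: str, own_skill: str, known_skills: list[str]) -> list[str]:
    """Return the names of other skills that are mentioned in text."""
    hits = []
    for name in known_skills:
        if name == own_skill:
            continue
        # Match the name only as a whole token, not as part of a longer slug.
        if re.search(rf"(?<![\w-]){re.escape(name)}(?![\w-])", text):
            hits.append(name)
    return sorted(hits)
```

Applied to the reference file's line 6 with `testrail-test-case-authoring` as the owning skill, this would flag `swagger-contracts-analysis`.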

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/testrail-test-case-export/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: Numeric vendor mappings are baked directly into the runtime `<process>`: priority `priority_id: 4/3/2/1` (step 3) and type `type_id: 1/7/6/8/9/10` (step 4). These are TestRail-instance-specific magic numbers. The skill does note `<input_contract>` allows optional per-case `priority_id`/`type_id` overrides and `<pitfalls>` warns they "may differ per TestRail instance", but the defaults remain hardcoded constants rather than retrieved from project config. Reason: Hardcoded priority/type IDs silently mis-map cases when a TestRail instance uses a customized priority/type table, producing wrong-priority cases in an irreversible external write; this is the portability concern the gate targets. Solution: Keep the numeric defaults but explicitly state they are the documented TestRail-default fallback and instruct the agent to prefer the parent workflow's TMS-config mapping when supplied (the override path already exists in `<input_contract>`); reference the config source by name so the baked-in numbers are clearly the last resort.
🔵 Medium	Bloat Control	Problem: The step-7 confirmation-gate / dedup / sensitive-scan rules are stated in full in `<process>` step 7, then restated almost in full again in `<safety_boundaries>` (no-write-without-confirmation, dedup-pre-scan-every-run, redaction targets) and a third time in `<validation_checklist>` and a fourth time in `<pitfalls>`. The placeholder examples diverge across blocks — `<safety_boundaries>` uses `{valid_token}`, `{admin_token}`, `<bearer-token-for-test-user>` while the sibling authoring skill standardizes on `<valid bearer token>` shape — risking inconsistent placeholders in exported cases. Reason: Four near-duplicate copies of the destructive-write gate enlarge an always-loaded skill and can drift apart on edits; mismatched placeholder styles between author and export steps can let an inconsistent or unredacted value slip into an irreversible external write. Solution: State the gate procedure once in step 7; have `<safety_boundaries>`/`<validation_checklist>`/`<pitfalls>` reference it by step number instead of re-describing it. Align the placeholder vocabulary with the authoring skill's catalog so the same token shapes are used end-to-end.
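
The Dependency Management suggestion amounts to a lookup order: per-case override first, then the parent workflow's TMS-config mapping, then the baked-in default. The sketch below illustrates that order only. The label names (`critical`, `high`, `medium`, `low`) are assumptions, since the review quotes only the numbers `4/3/2/1`; `DEFAULT_PRIORITY_IDS` and `resolve_priority_id` are hypothetical names.

```python
from typing import Mapping, Optional

# Last-resort fallback. The numeric IDs come from the review text; the
# label-to-ID pairing is an assumption and may differ per TestRail instance.
DEFAULT_PRIORITY_IDS = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def resolve_priority_id(
    label: str,
    case_override: Optional[int] = None,
    config_mapping: Optional[Mapping[str, int]] = None,
) -> int:
    """Pick a priority_id: per-case override, then config mapping, then default."""
    if case_override is not None:
        return case_override
    key = label.strip().casefold()
    if config_mapping and key in config_mapping:
        return config_mapping[key]
    # Raises KeyError for an unknown label instead of guessing an ID.
    return DEFAULT_PRIORITY_IDS[key]
```

Failing loudly on an unknown label is a deliberate choice in this sketch, because the export is an irreversible external write.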

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/adhoc-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Instruction Ordering	Problem: The verification-failure no-ask override is placed as two bare top-level bullets (lines ~22-23) in the shared `<workflow_phases>` preamble, directly adjacent to "USER CONFIRMATION: Wait for approval before next phase" (line ~28), with no precedence marker, no scope-lock, and no "ambiguity defaults to ASK" fallback. `aqa-flow.md` isolates the identical rule in a dedicated `<orchestration_and_escalation>` block with both guards. Reason: The same rule is safe in `aqa-flow.md` and leak-prone here purely due to placement and a missing ambiguity fallback; under context compaction an agent can carry the absolute "MUST NOT call `AskUserQuestion`" framing into the happy path. Solution: Mirror `aqa-flow.md`: move the two override bullets into a dedicated fenced block, add the "if any precondition is uncertain or only partially true → fall back to the normal HITL ask path; ambiguity defaults to ASK" sentence, and add one precedence line stating the per-phase `USER CONFIRMATION` still governs the happy path.
🟡 High	Conflict Resolution	Problem: The `<workflow_phases>` preamble contains two directly competing instructions with no stated precedence. The added anti-skip gate says: "the only correct next action is a one-line announcement ... followed by beginning the earliest incomplete phase in the same turn, without yielding to user input" and "the agent MUST NOT ... call `AskUserQuestion`; ... pause for input before starting the earliest incomplete phase." Two bullets later the same block still says "USER CONFIRMATION: Wait for approval before next phase." A reader cannot tell whether to pause for approval or to proceed same-turn, and the gate's scope (verification-failure only) is not fenced off from the general per-phase confirmation rule. Reason: Without an explicit hierarchy the agent may either skip a legitimate HITL confirmation or stall when it should resume, producing inconsistent behavior across runs. Solution: Explicitly scope the no-pause/no-AskUserQuestion gate to the verification-failure resume case only (e.g., prefix it "On verification failure ONLY:") and add one precedence line stating that normal per-phase `USER CONFIRMATION` still applies for the happy path. Keep both rules but mark which wins in which situation.
🔵 Medium	Dependency Management	Problem: New phase headers hardcode dated model identifiers, e.g. `subagent_recommended_model="claude-opus-4-6, gpt-5.4-high"` on phases 2, 3, and 4. These version pins are not parameterized and will rot; a retired model id can fail or silently downgrade subagent dispatch. Reason: Baked-in model version strings become wrong within one model cycle; tier-based hints stay correct across vendors and releases, matching Rosetta's agent-agnostic principle. Solution: Replace concrete dated model ids with capability tiers (e.g. `tier: complex` / `tier: workhorse`) defined in the bootstrap, or centralize the model map in one referenced config instead of per-phase pins.
🔵 Medium	Safety Boundaries	Problem: The new gate forbids `AskUserQuestion` and any confirmation request "before starting the earliest incomplete phase" and asserts "there is nothing for the user to confirm." This is a broad suppression of HITL that, if read out of its intended narrow scope, overrides the session-wide HITL questioning policy that normally governs approval gates. Reason: Over-broad suppression of user confirmation can cause the agent to bypass required human approval, which is a safety regression in an enterprise workflow. Solution: Constrain the prohibition to the exact verification-failure branch and add an explicit carve-out that genuine HITL approval gates (Phase 3, Phase 6) and any safety/destructive confirmations are unaffected.
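
The precedence that the Conflict Resolution, Instruction Ordering, and Safety Boundaries findings ask for can be sketched as a single decision function. This is an illustrative model only: `PhaseState`, its field names, and the `"ask"`/`"resume"` labels are hypothetical and do not appear in `testgen-flow.md`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhaseState:
    # None means "uncertain or only partially true".
    verification_failed: Optional[bool]
    earliest_incomplete_phase_known: Optional[bool]
    # True for genuine HITL approval gates (the findings name Phase 3 and Phase 6).
    is_approval_gate: bool
    # True when the next step needs a safety/destructive confirmation.
    destructive: bool


def next_action(state: PhaseState) -> str:
    """Return "resume" (no pause) or "ask" (normal HITL path)."""
    # Carve-out: approval gates and destructive confirmations are never suppressed.
    if state.is_approval_gate or state.destructive:
        return "ask"
    # The no-ask override applies to the verification-failure resume case ONLY,
    # and only when every precondition is known to be true.
    if state.verification_failed is True and state.earliest_incomplete_phase_known is True:
        return "resume"
    # Happy path and any ambiguity: per-phase USER CONFIRMATION governs.
    return "ask"
```

The ordering of the checks is the point: the narrow override is evaluated after the carve-outs, and anything uncertain falls through to asking.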

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	3	⬇️ Slightly worse
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	3	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	3	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The rewrite generally replaced hardcoded `mcp_Jira_MCP_` calls with skill-mediated operations, but the `<common_issues>` block still contains a residual concrete tool name: "Always check for child pages using `confluence_get_page_children()` for each found page," and `<pitfalls>` references `get_page_children`. These bypass the new skill-abstraction (`confluence-source-harvesting` / `mcp-confluence-data-collection`) that the rest of the file deliberately routes through. Reason: Mixed abstraction levels make the dependency portability inconsistent; a target project whose MCP exposes a differently named operation gets contradictory guidance. Solution: Replace the literal `confluence_get_page_children()` / `get_page_children` references in `<pitfalls>` and `<common_issues>` with the abstracted phrasing already used elsewhere (e.g., "the child-page traversal operation per `confluence-source-harvesting`"), matching the `jira_search_fields` treatment.
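
One way to picture the abstraction the finding above recommends is a lookup from an abstract operation name to whatever tool the target project's MCP server exposes. This is a hedged sketch: the `OPERATION_TOOLS` table, the `child_page_traversal` key, and `resolve_tool` are hypothetical names introduced for illustration; only `confluence_get_page_children` is taken from the finding.

```python
from typing import Dict, Optional

# Hypothetical default binding. The real tool name depends on the MCP server
# used by the target project, which is why prompts should not hardcode it.
OPERATION_TOOLS: Dict[str, str] = {
    "child_page_traversal": "confluence_get_page_children",
}


def resolve_tool(operation: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Map an abstract operation to a concrete tool name, honoring project overrides."""
    table = {**OPERATION_TOOLS, **(overrides or {})}
    if operation not in table:
        raise LookupError(f"no tool bound for operation {operation!r}")
    return table[operation]
```

Workflow text that names only the abstract operation stays valid when a project rebinds it, for example `resolve_tool("child_page_traversal", {"child_page_traversal": "get_children"})`.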

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The `<create_analysis_document>` section is dominated by meta-instructions about its own structure rather than the document structure. The same point is restated three times: the intro says "The fence is the complete append-only target", then "End of Pass 2 append-only block. Nothing else is appended in Pass 2. If you find yourself adding a section here, it belongs to Pass 1's skill-owned set and is misplaced.", then "### Modifiers ... These are not additional sections to append". The Pass-1/Pass-2/Modifier framing wraps a fairly small concrete output (two appended sections + a vague/specific table) in heavy self-referential scaffolding. Reason: Repeated self-referential meta-commentary inflates cognitive load and obscures the small concrete action the phase actually performs, making the instruction harder to follow reliably. Solution: Collapse the repeated "do not duplicate sections 1-6 / nothing else is appended / these are not sections" warnings into a single sentence after the fenced block, and drop the "If you find yourself adding a section here..." introspective aside.
🔵 Medium	Bloat Control	Problem: The section spends roughly half its length re-explaining the Pass 1 / Pass 2 / Modifier ownership split ("This phase does NOT duplicate that template", "The fence is a delta on top of the skill's output, NOT the whole document template", "One positive / one negative pair kept inline so the rule survives even when the skill is not loaded"). These are provenance/rationale notes about why content lives where it does, which target prompts should avoid. Reason: Non-operational rationale and redundant boundary reminders are compressible without value loss and dilute the actionable steps. Solution: Remove the rationale clauses explaining why sections are split between skill and phase; keep only the operative instruction (run the skill for sections 1-6, then append the two-section delta verbatim).

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

…dditional_context length limit

… of code-analysis-flow + requirements-authoring-flow; regenerate plugins

github-actions · 2026-06-03T07:37:47Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/skills/api-test-spec-authoring/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-codebase-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-codebase-analysis/references/report-template.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-requirements-elicitation/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-selector-management/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-selector-management/references/strategy-and-template.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-authoring/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-authoring/references/test-implementation-template.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-debugging/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-debugging/references/escalation-template.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-debugging/references/part-b-mechanics.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/automation-test-execution-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/automation-test-execution-analysis/references/redaction-policy.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/automation-test-implementation-handoff/SKILL.md`	1	1	0	❌ Fail
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-best-practices.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-extract.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/coding/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/confluence-source-harvesting/SKILL.md`	2	0	0	❌ Fail
`instructions/r2/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/gap-and-contradiction-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/gitnexus-cli/SKILL.md`	0	3	0	⚠️ Warning
`instructions/r2/core/skills/gitnexus-setup/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/gitnexus-tools/SKILL.md`	1	2	0	❌ Fail
`instructions/r2/core/skills/gitnexus-tools/assets/gn-examples.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/init-workspace-documentation/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/init-workspace-rules/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/init-workspace-verification/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/load-context-instructions/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/load-context/SKILL.md`	1	0	0	❌ Fail
`instructions/r2/core/skills/load-workflow/SKILL.md`	1	5	0	❌ Fail
`instructions/r2/core/skills/mcp-confluence-data-collection/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-jira-data-collection/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/mcp-jira-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-testrail-data-collection/SKILL.md`	1	0	0	❌ Fail
`instructions/r2/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/operation-manager/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/operation-manager/assets/om-schema.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-data-collection/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-data-collection/references/backend-source-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-data-collection/references/existing-test-patterns.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-data-collection/references/output-template.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-gap-analysis/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/qa-project-config/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-test-debugging/SKILL.md`	0	2	0	⚠️ Warning
`instructions/r2/core/skills/qa-test-debugging/references/failure-catalog.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-test-debugging/references/part-b-mechanics.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-test-implementation/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/qa-test-implementation/references/multi-language-examples.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/repository-implementation-standards/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-synthesis/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/requirements-synthesis/references/output-schemas.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/sequential-workflow-execution/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/references/canonical-example.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/testrail-test-case-authoring/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/testrail-test-case-export/SKILL.md`	1	1	0	❌ Fail
`instructions/r2/core/skills/testrail-test-case-export/references/vendor-porting.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/user-approved-code-changes/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/adhoc-flow.md`	1	2	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	1	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/coding-agents-prompting-flow.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/coding-flow.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/external-lib-flow.md`	0	1	1	⚠️ Warning
`instructions/r2/core/workflows/init-workspace-flow.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/init-workspace-flow-context.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-discovery.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-questions.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-rules.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-shells.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/modernization-flow.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow.md`	1	1	1	❌ Fail
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-test-correction.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/research-flow.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/self-help-flow.md`	1	1	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow.md`	0	2	0	⚠️ Warning
`instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-selector-management/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-authoring/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`	1	0	0	❌ Fail
`instructions/r3/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/references/output-template.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-gap-analysis/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/qa-project-config/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-debugging/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-debugging/references/failure-catalog.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-debugging/references/part-b-mechanics.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-implementation/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/repository-implementation-standards/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/testrail-test-case-export/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/user-approved-code-changes/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/adhoc-flow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow.md`	0	1	1	⚠️ Warning
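
The Status column in the summary table above is consistent with a simple rule over the three severity counts. The sketch below is inferred from the rows shown, not taken from the validator's source, so the function names and the exact rule are assumptions.

```python
from typing import Iterable, Tuple


def file_status(high: int, medium: int, low: int) -> str:
    """Derive a per-file status from issue counts (rule inferred from the table)."""
    if high > 0:
        return "Fail"  # every Fail row above has at least one High issue
    if medium > 0 or low > 0:
        return "Warning"  # Medium or Low issues alone yield a Warning
    return "Pass"


def report_failed(rows: Iterable[Tuple[int, int, int]]) -> bool:
    """Assumed overall rule: the report fails if any single file fails."""
    return any(file_status(h, m, l) == "Fail" for h, m, l in rows)
```

For example, the `load-workflow/SKILL.md` row (1 High, 5 Medium, 0 Low) maps to Fail, while the `coding-flow.md` row (0, 0, 1) maps to Warning.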

📄 `instructions/r2/core/skills/api-test-spec-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-codebase-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-codebase-analysis/references/report-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-requirements-elicitation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-selector-management/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-selector-management/references/strategy-and-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-test-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-test-authoring/references/test-implementation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-test-debugging/references/escalation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/automation-test-execution-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/automation-test-execution-analysis/references/redaction-policy.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/automation-test-implementation-handoff/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: The file is 13302 chars (10K-20K band). The verify-don't-load contract is restated three times: in <core_concepts>, in the <recommended_foundational_skills> table prose, and again in each numbered process step (steps 1-4 each repeat 'Verify X is loaded ... If absent, stop per <failure_handling>'). The same 'no silent fallback to coding+testing' rule appears in step 4 GATE and in <failure_handling>. Reason: Instructions are not user-facing, so compression is expected; the triple restatement of the same contract adds tokens to every resend without adding behavior. Solution: Collapse the per-step 'Verify ... If absent stop per <failure_handling>' repetition by stating the verify-then-apply contract once and letting the table's 'Verified at' / 'If not loaded' columns carry the per-skill detail; keep only the domain-skill GATE inline since it is the high-signal one.
🔵 Medium	Conflict Resolution	Problem: Step 5 says validate that 'tests compile or parse', while <core_concepts> says 'parsing failures belongs to a later analysis phase'. A reader could read these as competing (does this phase handle parse outcomes or not?). The intended distinction (compile/parse of authored code here vs. parsing test-run reports later) is implied but not stated explicitly. Reason: Without the explicit boundary the agent may either skip the step-5 parse check or attempt report parsing it should defer. Solution: Add one clause distinguishing 'static compile/parse of the authored test code' (in scope at step 5) from 'parsing of execution reports' (later analysis phase) so the two statements are not read as contradictory.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-best-practices.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-extract.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Epistemic Honesty	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/confluence-source-harvesting/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: At 14112 chars this single SKILL.md falls in the 10K-20K range that the rubric flags as a reliability concern. Decision-time content (GATEs, failure branches, safety rules) competes for attention with the redundant success/validation restatements, raising the chance an agent skips a step in a 10-step process. Reason: Smaller decision surface improves step-following reliability, the rubric's stated primary goal; the file is large enough to trip the 10K-20K threshold. Solution: Move the redundant 'NOT complete' enumeration out (see Bloat Control fix) so the remaining decision content fits well under 10K; the per-pattern detail is already correctly offloaded to references/redaction-and-normalization.md.
🟡 High	Bloat Control	Problem: The same completion rules are stated three times: as positive done-conditions in `<success_criteria>` 'Complete when', again as the negative 'NOT complete' list (silent zero-page emit, children skipped, permission errors hidden, missing required input, redaction skipped), and a third time as line items in `<validation_checklist>`. Each of the five NOT-complete bullets restates a `<failure_handling>` branch or a `<validation_checklist>` line almost verbatim. Reason: Instructions are token-billed on every turn; triple-stating the same five conditions inflates the file to 14112 chars without adding behavior an agent does not already get from the checklist and failure-handling blocks. Solution: Drop the bulleted 'NOT complete' list in `<success_criteria>` and rely on `<validation_checklist>` plus `<failure_handling>` (which already own those checks); keep `<success_criteria>` to the single 'Complete when' sentence pointing at the checklist.
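
The size band cited in these findings can be checked mechanically. A minimal sketch, assuming the rubric distinguishes three bands by character count; only the 10K-20K band is named in this report, so the two outer labels and the function name `size_band` are illustrative:

```python
def size_band(char_count: int) -> str:
    """Classify an instruction file by character count.

    Only the 10K-20K band appears in the audit; the outer bands are assumed.
    """
    if char_count < 10_000:
        return "under-10K"
    if char_count <= 20_000:
        return "10K-20K"
    return "over-20K"


print(size_band(14112))  # size reported for this SKILL.md; prints 10K-20K
```

Running a check like this after the proposed trim would show whether the file actually drops below the 10K threshold.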

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/gap-and-contradiction-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/gitnexus-cli/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Safety Boundaries	Problem: `clean` deletes the `.gitnexus/` directory and unregisters the repo, and `wiki --gist` publishes a PUBLIC GitHub Gist of generated documentation, yet the skill states no caution or confirmation requirement for either. `--force` is documented as 'skip confirmation prompt' with no warning that data loss follows. Reason: An agent could run `clean --force` or `wiki --gist` without realizing the action is destructive or publishes content externally. Solution: Add a brief safety note that `clean` is destructive (recommend `status` first) and that `wiki --gist` makes content public (warn about leaking private-repo documentation); do not bake in a tool-specific gate, just flag the irreversible/public effects.
🔵 Medium	Success Criteria	Problem: No explicit testable done-condition for the skill. `<commands>` documents what each command does and `<when_to_use_skill>` says when, but there is no 'done when X' so an agent invoking this skill to index a repo has no completion check. Reason: Missing completion criteria means the agent may run a command and stop without verifying the intended state was reached. Solution: Add a short success line per primary action, e.g. 'analyze done when status reports index present and not stale'; reuse the existing freshness signal already named in the analyze 'When to run' note.
🔵 Medium	Output Contract	Problem: This is a CLI reference card with no `<output_contract>` or stated expected-result of running each command. After `analyze` an agent has no canonical signal of success (e.g. what `status` should then show), so it cannot confirm the index built. Acceptable for a reference card but weaker than sibling skills which all define output expectations. Reason: Without an expected result the agent cannot self-confirm a command achieved its purpose, only that it ran. Solution: Add a one-line expected-result per command (e.g. 'success: status reports a fresh index with symbol/relationship counts'); no full schema needed for a reference card.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/gitnexus-setup/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/gitnexus-tools/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: The block points to `gitnexus-usage/assets/gn-examples.md`, but the asset actually lives at `gitnexus-tools/assets/gn-examples.md`. There is no `gitnexus-usage` skill in the release (only gitnexus-cli, gitnexus-setup, gitnexus-tools). Reason: A wrong asset path means the agent ACQUIRE fails or fetches nothing, so the worked examples never load when an agent needs them to pick the right tool. Solution: Change the ACQUIRE path in the block from `gitnexus-usage/assets/gn-examples.md` to `gitnexus-tools/assets/gn-examples.md` so the reference resolves to the bundled asset.
🔵 Medium	Failure Handling	Problem: No guidance for the ambiguous case beyond the `context` tool's name-collision note, and none for when `query` returns no processes or when no repo is indexed. Reason: Missing empty-result handling can leave the agent stuck or silently producing no tool call when GitNexus has no match. Solution: Add a brief fallback line: if `query` returns nothing or no repo is indexed, READ `gitnexus://repos` first, then broaden the query or fall back to standard Rosetta code search.
🔵 Medium	Success Criteria	Problem: The skill states its purpose (pick the right GitNexus tool with the right params) but gives no explicit 'done when' test for a correct selection, so there is no self-check that the chosen tool/params actually match intent. Reason: Without a testable completion marker the agent cannot verify it selected correctly, lowering reliability on the skill's only job. Solution: Add one short success line in <core_concepts>, e.g. selection is complete when the chosen tool/resource and its required parameters match the user intent and the schema was read before any cypher query.
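
The Reference Integrity finding is the kind of defect a small path check could catch before release. A minimal sketch, assuming references are plain relative paths under a skills root; the helper name `unresolved_references` and the temporary directory layout are hypothetical and only mirror the paths named in the finding:

```python
from pathlib import Path
import tempfile


def unresolved_references(skills_root: Path, references: list[str]) -> list[str]:
    """Return the referenced paths that do not exist under skills_root."""
    return [ref for ref in references if not (skills_root / ref).is_file()]


# Demo on a throwaway tree that mirrors the layout described in the finding.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    asset = root / "gitnexus-tools" / "assets" / "gn-examples.md"
    asset.parent.mkdir(parents=True)
    asset.write_text("# examples\n")

    missing = unresolved_references(
        root,
        [
            "gitnexus-usage/assets/gn-examples.md",  # path currently referenced
            "gitnexus-tools/assets/gn-examples.md",  # path proposed in the fix
        ],
    )
    print(missing)
```

Only the `gitnexus-usage/...` entry is reported as unresolved, which matches the fix proposed in the finding.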

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	2	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/gitnexus-tools/assets/gn-examples.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Precision & Explicitness	Problem: Examples call tools as `gitnexus_query(...)`, `gitnexus_context(...)`, `gitnexus_impact(...)`, `gitnexus_rename(...)`, `gitnexus_detect_changes(...)`, but the parent SKILL.md defines them as `query`, `context`, `impact`, `rename`, `detect_changes`. The same concept uses two different names across the skill and its asset. Reason: Divergent tool names can make an agent guess the actual MCP tool identifier, risking a wrong or invalid tool call. Solution: Make the tool names consistent: either prefix the tool definitions in SKILL.md with `gitnexus_` or drop the prefix in the examples, so one term maps to one concept.
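
The naming mismatch can be made concrete with a set comparison. A minimal sketch using the tool names quoted in the finding; the helper names are hypothetical, and which spelling is the real MCP identifier is not settled here:

```python
PREFIX = "gitnexus_"  # prefix used by the examples asset, per the finding

defined_in_skill = {"query", "context", "impact", "rename", "detect_changes"}
used_in_examples = {
    "gitnexus_query",
    "gitnexus_context",
    "gitnexus_impact",
    "gitnexus_rename",
    "gitnexus_detect_changes",
}


def mismatched_names(defined: set[str], used: set[str]) -> set[str]:
    """Names used in examples that have no identical definition in the skill."""
    return used - defined


def strip_prefix(name: str, prefix: str = PREFIX) -> str:
    """Drop the prefix if present, leaving other names untouched."""
    return name[len(prefix):] if name.startswith(prefix) else name


normalized = {strip_prefix(name) for name in used_in_examples}
print(sorted(mismatched_names(defined_in_skill, used_in_examples)))
print(normalized == defined_in_skill)
```

Every example name is flagged as written, and all of them line up with the SKILL.md definitions once the prefix is dropped, so either direction of the proposed fix restores a one-to-one mapping.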

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/init-workspace-documentation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/init-workspace-rules/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/init-workspace-verification/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: The NEW file deletes the 'DEPRECATED ARTIFACTS (notify user, do NOT auto-delete)' block that flagged the r1 state file `agents/init-rosetta-shells-flow-state.md` and the local `init-rosetta-shells-flow.md`. Verification no longer instructs the agent to notify the user about these stale r1 artifacts during an upgrade. Reason: Removing the notice means an N-1 upgrade can silently leave obsolete r1 files behind, so the workspace is left in an inconsistent state with no user notification. Solution: Restore a short deprecated-artifacts notice in the verification process (notify user, do NOT auto-delete) covering leftover r1 shell-flow state/files, or confirm equivalent cleanup guidance exists in another init-workspace skill so the upgrade path still surfaces stale artifacts.
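
The "notify user, do NOT auto-delete" behaviour requested here could look roughly like the following. A sketch only: the two file names come from the finding, while searching the whole workspace by name, and the helper names `find_deprecated_artifacts` and `notice`, are assumptions:

```python
from pathlib import Path
import tempfile

# File names taken from the finding; searching the whole workspace by name
# is an assumption, since the report gives a full path for only one of them.
DEPRECATED_NAMES = (
    "init-rosetta-shells-flow-state.md",
    "init-rosetta-shells-flow.md",
)


def find_deprecated_artifacts(workspace: Path) -> list[Path]:
    """List leftover r1 artifacts. Read-only: nothing is deleted."""
    found = []
    for name in DEPRECATED_NAMES:
        found.extend(workspace.rglob(name))
    return sorted(found)


def notice(paths: list[Path]) -> str:
    """Build the user-facing message; removal stays a manual decision."""
    if not paths:
        return "No deprecated r1 artifacts found."
    lines = ["Deprecated r1 artifacts found (not deleted automatically):"]
    lines += [f"  - {p}" for p in paths]
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "agents").mkdir()
    (ws / "agents" / "init-rosetta-shells-flow-state.md").write_text("state\n")
    leftovers = find_deprecated_artifacts(ws)
    message = notice(leftovers)
    still_there = all(p.exists() for p in leftovers)
    print(message)
```

The check only reports; the stale file is still present afterwards, which is the property the removed block guaranteed.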

📊 Gates Comparison

Gate	Score	Comparison
Workflow Completeness	4	⬇️ Slightly worse
Reference Integrity	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/load-context-instructions/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: Fallback mode reads bootstrap files from the repo and lists docs, but there is no guidance for when none of the bootstrap files exist or `get_context_instructions` fails partway. The blocking-gate language ('do not proceed until complete') has no paired branch for an unrecoverable load failure. Reason: Without an explicit failure path the agent may pass the gate with an empty/partial bootstrap and run without guardrails, which is the exact unreliable state this skill exists to prevent. Solution: Add a short failure branch: if no bootstrap files are found in fallback mode, or the MCP call fails after retry, stop and tell the user that Rosetta context could not be loaded rather than silently proceeding.
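
The missing failure branch for fallback mode amounts to refusing to pass the gate with an empty bootstrap. A minimal sketch, assuming fallback mode reads a list of candidate files from the repo; `load_bootstrap_fallback` and `ContextLoadError` are hypothetical names, and the MCP retry path is not modelled:

```python
from pathlib import Path
import tempfile


class ContextLoadError(RuntimeError):
    """Raised when no bootstrap context could be loaded."""


def load_bootstrap_fallback(candidates: list[Path]) -> dict[str, str]:
    """Read every bootstrap file that exists; fail loudly if none do."""
    loaded = {str(p): p.read_text() for p in candidates if p.is_file()}
    if not loaded:
        raise ContextLoadError(
            "Rosetta context could not be loaded: no bootstrap files found."
        )
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    missing_only = [Path(tmp) / "bootstrap-guardrails.md"]  # never created
    try:
        load_bootstrap_fallback(missing_only)
        outcome = "proceeded"
    except ContextLoadError as err:
        outcome = f"stopped: {err}"
    print(outcome)
```

Raising instead of returning an empty mapping is the design point: the caller cannot proceed silently without guardrails.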

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/load-context/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Precision & Explicitness	Problem: The new step 2 grep uses `^#{1,3}` both in prose and in the inline bash command `grep -n "^#{1,3}" ...`. In plain grep (basic regular expressions) this matches a literal `#` followed by literal `{1,3}`, not 1-3 leading hashes, so markdown headers will not be matched. The base file had no such command. Reason: A non-functional header grep returns nothing, so IMPLEMENTATION/MEMORY/PATTERNS/REQUIREMENTS headers are never surfaced and the agent loads incomplete project context. Solution: Use a pattern that actually matches 1-3 leading hashes, e.g. `grep -nE "^#{1,3} "` (extended regex) or `rg "^#{1,3} "` (ripgrep treats `{1,3}` as a quantifier by default), and align the prose accordingly.
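
The difference between the two readings of `{1,3}` can be reproduced without a shell. A minimal sketch using Python's `re`, which applies the quantifier reading like `grep -E`; the literal reading of basic-regex grep is emulated by escaping the braces, and the sample lines are made up for illustration:

```python
import re

lines = [
    "# Title",
    "## Section",
    "### Sub",
    "#### Too deep",
    "plain text",
    "#{1,3} literal",
]

# Quantifier reading (what `grep -E` or Python's `re` would apply).
as_quantifier = re.compile(r"^#{1,3} ")
# Literal reading (what basic-regex grep applies to an unescaped `{1,3}`),
# emulated here by escaping the braces.
as_literal = re.compile(r"^#\{1,3\}")

headers = [s for s in lines if as_quantifier.match(s)]
literal_hits = [s for s in lines if as_literal.match(s)]
print(headers)
print(literal_hits)
```

The quantifier pattern picks up the level 1-3 headers and nothing else, while the literal pattern matches only the artificial `#{1,3} literal` line and no real markdown header.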

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	3	⬇️ Slightly worse
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/load-workflow/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Failure Handling	Problem: `<process>` step 1 is `ACQUIRE <workflow TAG from available workflows> FROM KB` but the skill has no branch for when no workflow tag matches the request, when ACQUIRE returns nothing, or when the request is too ambiguous to map to one workflow. Sibling loader skills handle their failure case: `load-context` has a `<troubleshooting>` block for missing files, `load-context-instructions` has per-mode handling. This router has none, yet it gates the whole session. Reason: Without a no-match branch the agent may silently pick a wrong workflow, sending the whole session down an incorrect execution path. Solution: Add a no-match branch in `<process>`: if no workflow tag matches or ACQUIRE returns empty, stop and ask the user to confirm intent or fall back to the ad-hoc/lightweight workflow; state which fallback applies.
🔵 Medium	Output Contract	Problem: The skill produces a side effect (plan phases injected) but defines no output marker or confirmation the orchestrator can check. Reason: A router with no observable output makes it hard for the parent agent to confirm the correct workflow was activated. Solution: Specify the expected post-state as the contract: name the workflow selected and confirm phases were upserted, so the orchestrator has a deterministic signal the router ran.
🔵 Medium	Success Criteria	Problem: There is no explicit testable done-when. `<next-steps>` describes what happens next, not the completion condition of this skill (workflow selected, phases injected, state restored when resuming). The sibling `load-context` base carried an explicit completion gate (its deletion was itself flagged). Reason: Without a testable completion condition the agent cannot reliably tell when routing is finished, risking premature handoff to execution. Solution: Add a one-line success condition: complete when the best-matching workflow is loaded, its phases are upserted into the plan via OPERATION_MANAGER, and resume-state is restored if the user asked to continue.
🔵 Medium	Self-Validation	Problem: `<next-steps>` says only `Execute all accumulated plan phases and steps`. There is no verification that the workflow was actually selected/loaded, that its phases were injected into the plan, or that resume-state (step 2) was restored before execution begins. Reason: If phase injection or state restore silently fails, the agent proceeds with an empty or stale plan and skips workflow steps — the primary reliability failure this skill exists to prevent. Solution: Add a verification step after step 4: confirm the chosen workflow's phases are present in OPERATION_MANAGER and (when resuming) that completed steps and current phase were restored, before declaring the skill done.
🔵 Medium	Decision Branching	Problem: Step 3 ``Handle planning and auto mode correctly — distinguish auto vs `No HITL` `` states a decision but gives no explicit if/then/else. Only step 2 (resume) carries a real branch. Sibling `load-context-instructions` scores well here because it spells out explicit mode branches. Reason: Left implicit, the agent may treat auto-approval mode as `No HITL` and skip required human gates, which is the exact failure the bootstrap warns against. Solution: Convert step 3 into explicit branches: if `No HITL` requested → proceed without approval gates; else (including auto/auto-approval) → keep HITL approval gates active per the `hitl` skill.
🔵 Medium	Input Contract	Problem: Step 1 consumes an `<available workflows>` list but the skill never states where that list comes from (bootstrap prep step, KB listing, or context). The sibling `load-context` names its exact input files; this skill leaves its primary input source implicit. Reason: If the workflow list is not reliably present the router cannot match, and the agent cannot tell whether an empty match means no workflow or a missing input. Solution: Name the source of the available-workflows list in `<prerequisites>` or step 1 (e.g. the workflow catalog listed during bootstrap prep steps), so the router has a defined input rather than an implicit one.
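
The branches these findings ask for (a no-match fallback, explicit HITL handling, and an observable post-state) can be sketched roughly as below. This is only an illustration under stated assumptions: `route`, `RouteResult`, and the keyword-based matching are hypothetical names and logic, not part of the skill or of OPERATION_MANAGER.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RouteResult:
    workflow: Optional[str]   # selected workflow tag, None when the user must be asked
    hitl_gates: bool          # whether approval gates stay active
    needs_user_input: bool    # True when routing stopped to confirm intent


def route(request: str, available: List[str], no_hitl: bool = False) -> RouteResult:
    """Pick exactly one workflow tag, or stop instead of guessing (hypothetical logic)."""
    if not available:
        # A missing catalog is a different failure from "no workflow matched".
        raise ValueError("available-workflows list is missing")
    words = request.lower().split()
    matches = [tag for tag in available if tag.lower() in words]
    # Gates stay on unless "No HITL" was explicitly requested;
    # auto / auto-approval mode alone does not disable them.
    gates = not no_hitl
    if len(matches) != 1:
        # No match or an ambiguous match: ask the user rather than pick silently.
        return RouteResult(None, gates, True)
    return RouteResult(matches[0], gates, False)
```

The returned `RouteResult` doubles as the deterministic signal the Output Contract finding asks for: the caller can check which workflow was selected before executing any phase.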

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/mcp-confluence-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/mcp-jira-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: The redaction catalog (grep patterns, placeholder vocabulary, the 5 redaction target categories) is fully baked into `<safety_boundaries>` inline in the SKILL.md, while the sibling confluence skill moved the identical catalog into `references/cql-and-redaction.md` and loads it on demand. The jira skill keeps it always-loaded inline, duplicating domain knowledge that could be retrieved. Reason: Always-loading the full pattern catalog inflates the jira skill's context cost on every invocation and creates two copies of the same redaction knowledge that can drift apart. Solution: Move the inline redaction pattern/placeholder catalog out of jira's `<safety_boundaries>` into a lazy-loaded reference (mirroring the confluence skill), keeping only the operational decision-time rules inline.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/mcp-jira-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/mcp-testrail-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: SKILL.md is 10,603 chars (10K-20K band). The <safety_boundaries> block repeats the redaction policy at length with regex patterns, and <failure_handling> plus <validation_checklist> restate the same failure cases and read-only contract already covered in , , and <safety_boundaries>. Reason: Per the rubric, the 10K-20K size band warrants a high-severity flag; the redaction regex detail is maintainer-grade and not needed in every runtime extraction, so it inflates re-sent history tokens without changing runtime behavior. Solution: Move the detailed regex redaction pattern catalog into the existing references/vendor-swap.md sibling or a new references/redaction.md and leave a one-line pointer in <safety_boundaries>, mirroring the on-demand <vendor_replacement> split already used in this file. Collapse the duplicate read-only / case-not-found statements so each appears once.
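
The size-band check behind this finding could look something like the sketch below. Only the 10K-20K band comes from the audit text; the function name `size_band` and the two other labels are assumptions made for illustration.

```python
def size_band(text: str) -> str:
    """Classify a skill file by character count (labels other than 10K-20K are assumed)."""
    n = len(text)
    if n < 10_000:
        return "under-10K"   # assumption: not described in the finding
    if n <= 20_000:
        return "10K-20K"     # the band the finding cites
    return "over-20K"        # assumption: not described in the finding
```

A file of 10,603 characters, as reported here, falls into the `10K-20K` band.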

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/operation-manager/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The <core_concepts> fallback bullet has an unbalanced/misplaced backtick in the CLI template: `npx rosettify@latest <command> <subcommand> <plan_file`> — the closing backtick sits after plan_file's angle bracket, so the inline-code span and the <plan_file> placeholder render incorrectly. Reason: A garbled command template can be copied verbatim by the agent, producing a malformed CLI call; it is a precision defect on an operational reference, not a style nitpick. Solution: Fix the backtick placement to `npx rosettify@latest <command> <subcommand> <plan_file>` so the code span closes after the placeholder, matching the correctly-formatted invocations used elsewhere in .
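
A defect of this shape can be caught mechanically. The following is a minimal sketch, assuming a hypothetical helper `find_misplaced_backticks`; it only detects a placeholder whose closing `>` landed after the backtick and is not a general markdown linter.

```python
import re
from typing import List

# Matches a placeholder whose closing ">" sits after the backtick, e.g. "<plan_file`>".
MISPLACED = re.compile(r"<[A-Za-z_][\w-]*`>")


def find_misplaced_backticks(line: str) -> List[str]:
    """Return every placeholder in the line that is split by a backtick."""
    return MISPLACED.findall(line)
```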

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/operation-manager/assets/om-schema.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/orchestrator-contract/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/qa-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-data-collection/references/backend-source-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-data-collection/references/existing-test-patterns.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-data-collection/references/output-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-gap-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Self-Validation	Problem: The validation_checklist item 'Question count <= 20 per batch (pitfall 2)' counts only Critical+Important questions, but the success_criteria 'Questions Asked = Critical+Important+Optional combined' and the Executive Summary 'Questions Asked' count include Optional too. The batching cap and the reported count use different denominators, so an artifact with many Optional questions could pass the <=20 grep while the Executive Summary reports a much higher 'Questions Asked'. Reason: Two adjacent rules use the same word 'questions' with different scopes, which can make the self-validation grep and the reported count disagree without an actual error. Solution: In qa-gap-analysis/SKILL.md make the batch-cap basis explicit and consistent: state that the <=20 cap applies to Critical+Important only (already implied) and that the Executive Summary 'Questions Asked' total is the combined Critical+Important+Optional, so the two numbers are expected to differ; or align both on the same basis.
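
The two bases described in this finding can be made concrete with a small sketch. It assumes a single-batch artifact and a hypothetical helper `question_counts`; the priority labels and the cap of 20 come from the finding, everything else is illustrative.

```python
from collections import Counter
from typing import List, Tuple

BATCH_CAP = 20  # cap quoted from the validation checklist


def question_counts(priorities: List[str]) -> Tuple[int, int]:
    """Return (batch_count, questions_asked) for one batch of question priorities."""
    c = Counter(priorities)
    batch_count = c["Critical"] + c["Important"]    # basis for the <=20 cap
    questions_asked = batch_count + c["Optional"]   # basis for the Executive Summary
    return batch_count, questions_asked
```

With 8 Critical, 10 Important, and 15 Optional questions, the cap basis is 18 (within `BATCH_CAP`) while the reported total is 33, which is the divergence the finding describes.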

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-project-config/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Input Contract	Problem: Unlike the sibling qa-test-implementation skill (which has a dedicated <input_contract> table with default paths and required content), qa-test-debugging specifies inputs only via and step 1 prose. The expected report format (JUnit XML / JSON / plain log) is named only implicitly in <failure_handling> ('malformed JSON/XML/JUnit'), not declared as an accepted-input contract up front. Reason: Inputs are recoverable from prerequisites + failure handling, but the format contract is implicit, slightly weaker than the sibling skill's explicit table. Solution: Add an explicit accepted-report-formats line near or step 2 listing the parseable formats (JUnit XML, JSON, plain text log) so the agent validates format against a stated contract rather than inferring it from the failure branch.
🔵 Medium	Single Responsibility	Problem: The skill explicitly bundles two responsibilities with different risk profiles: Part A (read-only report analysis) and Part B (writes test source + runs lint). The <when_to_use_skill> section acknowledges this and defends it ('A caller may invoke Part A only'), so it is well-managed, but it remains two jobs in one prompt rather than the healthy 1-2 single-purpose ideal. Reason: Read-only analysis and write-path correction are coupled, but the boundary statement and progressive disclosure mitigate the coupling, so this is a minor note not a regression. Solution: Acceptable as-is given the explicit Part-A-only invocation boundary and the lazy-loaded part-b-mechanics.md keeping Part B material out of context for analysis-only calls. If future drift adds a third job, split Part B into a sibling skill. No change required now.
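
Validating a report against a stated format contract, as the Input Contract finding suggests, might be sketched as follows. The three accepted formats are taken from the finding; `detect_report_format`, its return labels, and the detection heuristics are assumptions. Top-level JSON arrays are deliberately left out of the sketch so that log lines starting with `[` are not misread as broken JSON.

```python
import json
import xml.etree.ElementTree as ET


def detect_report_format(text: str) -> str:
    """Classify a test report as junit-xml, json, plain-text, or malformed (heuristic)."""
    stripped = text.lstrip()
    if stripped.startswith("<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError:
            return "malformed"
        # JUnit reports use <testsuite> or <testsuites> as the root element.
        return "junit-xml" if root.tag in ("testsuite", "testsuites") else "unknown-xml"
    if stripped.startswith("{"):
        try:
            json.loads(stripped)
        except json.JSONDecodeError:
            return "malformed"
        return "json"
    return "plain-text"
```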

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-debugging/references/failure-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/qa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/qa-test-implementation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Safety Boundaries	Problem: The skill writes test + utility source files and runs lint, yet has no dedicated <safety_boundaries> section. Write-scope protection is distributed: 'No hardcoded URLs / credentials / production data' and 'Synthetic test data only' live in the <validation_checklist> and , and the step 1 GATE handles approval. There is no single statement bounding which files the skill may write (test/helper only) versus app source, unlike the sibling qa-test-debugging which states a 'Test-code-only writes' rule. Reason: Approval gate and no-hardcoded-secrets checks exist, but the affirmative write-scope boundary (no app-source writes) is implicit, a subtle gap for a write-path skill. Solution: Add a short <safety_boundaries> section (or a write-scope line in step 1/step 4) stating the skill writes only test files, shared helper/utility files, and never application/product source, mirroring qa-test-debugging's Test-code-only-writes rule.
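
An affirmative write-scope boundary of the kind proposed here could be enforced along these lines. This is a sketch under assumptions: the directory names in `ALLOWED_ROOTS` and the helper `is_write_allowed` are hypothetical, and a real project would substitute its own test and helper locations.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list: top-level directories the skill may write into.
ALLOWED_ROOTS = ("tests", "test_utils")


def is_write_allowed(path: str) -> bool:
    """Allow writes only to relative paths under an allowed test/helper root."""
    p = PurePosixPath(path)
    parts = p.parts
    # Reject empty paths, absolute paths, and any attempt to climb out with "..".
    if not parts or p.is_absolute() or ".." in parts:
        return False
    return parts[0] in ALLOWED_ROOTS
```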

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-implementation/references/multi-language-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/repository-implementation-standards/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/requirements-synthesis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Input Contract	Problem: The skill consumes multiple input artifacts (raw-data files, analysis output, user answers / answers.md) but never names their expected paths or formats. lists them only as informal bullets ('Collected raw data from at least one source'), and step 1 says 'Load all source data' without a path contract. The sibling g10 skill sequential-workflow-execution has an explicit input-contract table with Source/Required columns; this skill lacks an equivalent, so an agent must guess where inputs live. Reason: Without an explicit input contract the agent may read the wrong files or mislocate answers.md, weakening the otherwise strong source-priority and failure-handling logic that depend on those inputs. Solution: Add a short input table (or extend ) naming the expected inputs explicitly: raw-data file location, analysis-output location, and the answers.md path the <failure_handling> 'No user answers' branch already references — with required/optional flags and the supplying source (parent workflow phase).
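
An explicit input contract with required/optional flags and a supplying source could be modelled roughly like this. Apart from the `answers.md` file name, every value below (paths, required flags, phase names, and the helper `missing_required`) is a hypothetical placeholder chosen for illustration, not the skill's actual contract.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class InputSpec:
    name: str
    path: str       # hypothetical location
    required: bool  # hypothetical flag
    source: str     # hypothetical parent workflow phase that supplies the input


CONTRACT = [
    InputSpec("raw data", "raw-data/", True, "data collection phase"),
    InputSpec("analysis output", "analysis.md", True, "analysis phase"),
    InputSpec("user answers", "answers.md", False, "clarification phase"),
]


def missing_required(present: Set[str]) -> List[str]:
    """Return the names of required inputs whose path is not in the present set."""
    return [s.name for s in CONTRACT if s.required and s.path not in present]
```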

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	3	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/requirements-synthesis/references/output-schemas.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/sequential-workflow-execution/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/references/canonical-example.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Instruction Ordering	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/testrail-test-case-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The redaction/safety discipline is restated three times: inline in <safety_boundaries>, again in (literal-credential line), and again as a re-scan item in <validation_checklist>, plus the catalog in the reference file. The SKILL.md body is ~12.9K chars (10K-20K range), partly driven by this repetition of the same MUST-not-leak-credentials rule. Reason: Same instruction repeated in four places is re-sent every call with no added behavioral value and pushes the file into the 10K-20K size band the rubric flags. Solution: Keep the operational redaction rule canonical in <safety_boundaries> only; have and <validation_checklist> reference it by a short pointer (e.g. 'safety re-scan per <safety_boundaries>') rather than re-listing the same grep targets and placeholder logic.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/testrail-test-case-export/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: <safety_boundaries> deep-links into a SIBLING skill's private reference: `../testrail-test-case-authoring/references/examples-and-redaction.md#targets-to-placeholder-never-literal`. Cross-skill deep linking into another skill's internal references violates the skill-folder isolation boundary (skills must not deep-link private content of another skill). Reason: A consumer reaching into another skill's private references couples the two skills' internal layouts; if the authoring skill renames its reference or anchor, the export skill's safety catalog link silently breaks at the exact moment redaction matters. Solution: Replace the cross-skill deep link with an inline copy of the small placeholder vocabulary this skill actually needs (the 4-5 placeholder tokens already partly listed inline), or move the shared placeholder vocabulary to a neutral shared location both skills ACQUIRE; do not reach into the authoring skill's private references folder.
🔵 Medium	Bloat Control	Problem: The confirmation-gate and dedup-pre-scan rules are stated three times: full detail in step 7, restated in <safety_boundaries>, and again line-by-line in <validation_checklist>. Same with the redaction targets list. SKILL.md body is ~14K chars (10K-20K band). Reason: Triplicated procedure text is re-sent every call without behavioral value and is the main driver pushing the file into the 10K-20K size band the rubric flags. Solution: Keep step 7 as the canonical gate description; have <safety_boundaries> and <validation_checklist> point to 'step 7 (canonical)' for the procedure rather than re-listing the dedup/scan/confirm sequence and the redaction target list a second and third time.
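
The Reference Integrity finding above could be checked mechanically. The following is a minimal sketch, assuming skills reference each other's files with relative `../<skill>/references/...` paths as in the quoted link; the function name `cross_skill_deep_links` is hypothetical and not part of the repository.

```python
import re

# Hypothetical helper: finds relative paths that climb out of the current
# skill folder and into ANOTHER skill's references/ directory.
DEEP_LINK_RE = re.compile(r"\.\./([A-Za-z0-9_-]+)/references/[^\s`)]+")

def cross_skill_deep_links(skill_name: str, text: str) -> list[str]:
    """Return every deep link in `text` that targets a sibling skill's references."""
    hits = []
    for match in DEEP_LINK_RE.finditer(text):
        # A link back into the skill's own folder is not a violation.
        if match.group(1) != skill_name:
            hits.append(match.group(0))
    return hits
```

Run against the export skill's body, this would report the `testrail-test-case-authoring` link quoted in the finding, anchor included.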

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/testrail-test-case-export/references/vendor-porting.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/user-approved-code-changes/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/adhoc-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Conflict Resolution	Problem: The rename of the <plan_manager> block to <OPERATION_MANAGER> DELETED the orchestrator/subagent coordination directives that the base carried: 'todo tasks/built-in planners are for tracking INSIDE step execution only', 'Orchestrator MUST tell subagents all above MUST as MUST (within their scope)', and 'MUST tell subagents: "tell orchestrator to modify plan if work is outside your scope"'. The new generic command-catalog block (copied from the bootstrap) does not carry the subagent-scope-escalation rule. Reason: adhoc-flow delegates to subagents (see <building_blocks> subagent-delegation and phase 4); losing the explicit 'subagent reports out-of-scope work to orchestrator' rule lets subagents silently mutate the plan or drift, which the deleted lines were specifically there to prevent. Solution: Re-add the deleted orchestrator->subagent coordination lines (subagent escalates out-of-scope work back to orchestrator; built-in/todo planners are intra-step tracking only) to the new <OPERATION_MANAGER> or block; the generic command list does not replace this scope-coordination contract.
🔵 Medium	Reference Integrity	Problem: The base block ended with 'ACQUIRE `plan-manager/assets/pm-schema.md` FROM KB for data structure reference.' The rename dropped any equivalent schema-acquire pointer; the new block references command shapes inline but gives no path to the plan/data-structure schema for upsert authoring. Reason: upsert with RFC-7396 merge needs the data-structure schema; the base gave an explicit acquire path and the new version removed it, so a plan author building upserts has no in-workflow pointer to the schema. Solution: Add an 'ACQUIRE operation-manager/assets/.md FROM KB' pointer (matching the renamed skill's actual asset) so plan authors retain the structured-schema reference the base provided for upsert payloads.
🔵 Medium	Bloat Control	Problem: The base <plan_manager> block was a compact ~13-line workflow-scoped contract. It was replaced with the full ~20-line OPERATION_MANAGER command catalog (help plan, next, create-with-template, upsert-with-template, update_status, query, show_status, RFC 7396 note, loop note) which is verbatim identical to the OPERATION_MANAGER block already always present in the bootstrap/CLAUDE.md that is resent every turn. Reason: Re-stating the entire bootstrap command catalog inside a workflow that loads only after the bootstrap duplicates always-in-context content, adding tokens every call with no new information. Solution: Since the full command catalog is already guaranteed in the always-loaded bootstrap, the workflow only needs the workflow-specific deltas (which building blocks call operation-manager, the loop/upsert obligations, and the subagent-coordination rules). Replace the duplicated catalog with a short pointer plus those deltas.
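
For context on why the schema pointer matters for upserts: under RFC 7396 a `null` in the patch deletes the key and nested objects are merged recursively, so a plan author needs the exact field names. The sketch below is the generic merge-patch algorithm from the RFC, not the operation-manager implementation (which is not shown in this PR); the sample keys come from the RFC's own example.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the merged value."""
    # A non-object patch replaces the target wholesale.
    if not isinstance(patch, dict):
        return patch
    # Patching a non-object target starts from an empty object.
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            # null means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```

A mistyped key in an upsert payload is therefore added silently as a new member rather than rejected, which is the practical reason to keep a schema reference within reach.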

📊 Gates Comparison

Gate	Score	Comparison
Conflict Resolution	3	⬇️ Slightly worse
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Bloat Control	3	⬇️ Slightly worse
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-code-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: BASE listed six explicit numbered tasks (read project description, read user-instructions, frontend analysis, page-object inventory, similar-test search, reusable-utility identification, plan update) inline. NEW collapses all detail into a single `<execute_analysis>` step (`USE SKILL aqa-codebase-analysis`) plus a 4-item `<validate_findings>` checklist. The ordered sub-task detail is no longer in the phase file. Reason: Content was relocated to a well-formed bound skill via progressive disclosure, not lost; instructions are not user-facing so this compression is acceptable and reduces context cost. Minor severity because phase no longer self-documents the step sequence, relying on skill being loaded. Solution: No fix required for correctness — verified the relocated detail (project description, user-instructions Must/Should/Nice categorization, frontend analysis, page-object inventory, similar-test search, reusable utilities, 9-section report template) is fully present in the `aqa-codebase-analysis` SKILL and its report-template reference. If any phase-level traceability is desired, keep the `<validate_findings>` checklist as the anchor (already present).

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-identification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: BASE had 7 explicit tasks including Task 1 (map every test step to required UI interactions), Task 2 (check existing page objects with an Available/Missing/Uncertain categorization), and the selector-strategy preference order (data-testid > id > class > XPath). NEW reduces selector work to one line: `USE SKILL aqa-selector-management` `Execute Part A only`. The interaction-mapping and existing-selector-check steps are no longer enumerated in the phase. Reason: The page-source capture protocol (the genuinely user-facing part) was correctly kept verbatim with a documented rationale; the deleted detail is mechanical selector analysis suited to the skill. Low severity assuming the skill covers it; flagged because the phase no longer makes the existing-selector-check an explicit gate. Solution: Confirm `aqa-selector-management` Part A owns interaction-mapping, the existing-page-object availability check, and the selector-strategy preference order; if any of those are not covered by the skill, restore a one-line pointer in `<execute_identification>` naming them as Part A deliverables. Do not re-inline the full BASE tables.
⚪ Low	Output Contract	Problem: BASE prescribed a full `## Phase 4: Selector Identification` test-plan section (interaction map, existing-vs-missing, identified selectors, selector strategy, notes). NEW only updates `agents/aqa-state.md` fields and a 5-item validation checklist; the structured selector documentation written into the test plan is no longer specified here. Reason: Likely relocated to the skill output contract; minor because the state-file echo still captures counts and strategy. Cosmetic-level traceability concern only. Solution: Verify the identified-selector documentation output is owned by `aqa-selector-management` Part A's output template; if so this is fine. Otherwise add one line stating where the selector map is recorded (test plan vs report).
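
The selector-strategy preference order named in the first finding (data-testid > id > class > XPath) can be expressed as a short lookup. This is a sketch only: the `candidates` mapping and the `pick_selector` name are assumptions for illustration, and whether `aqa-selector-management` Part A applies the order this way is exactly what the finding asks to confirm.

```python
from typing import Optional

# Preference order quoted from the BASE phase file, strongest first.
PREFERENCE = ("data-testid", "id", "class", "xpath")

def pick_selector(candidates: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (kind, value) for the most preferred selector that is available."""
    for kind in PREFERENCE:
        value = candidates.get(kind)
        if value:  # skip missing or empty entries
            return kind, value
    return None  # nothing usable: the element stays Missing/Uncertain
```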

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: BASE Task 4 (`Add Documentation (If Project Uses It)` — JSDoc/TSDoc for new selectors and methods) is not represented anywhere in NEW. NEW's steps are: ACQUIRE+USE the two skills, run Part B, lint, update state. The conditional documentation-of-selectors step was dropped and is not echoed in the validation checklist. Reason: BASE explicitly made selector documentation conditional on project convention; if no skill owns it the convention-matching behavior is silently lost. Severity 2 because it is conditional and low-blast-radius, but it is a genuine deleted behavioral step, so comparison on Workflow Completeness is below neutral. Solution: Confirm `aqa-selector-management` Part B (or `repository-implementation-standards`) covers conditional selector/method documentation matching existing project doc style. If neither does, add a one-line item to `<execute_implementation>` or the `<validation_checklist>`: 'document new selectors/methods only if the project already uses JSDoc/TSDoc, matching existing style.'

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: BASE enumerated fix categories explicitly (Selector / Timing / Assertion / Setup / Test-code issues) in Task 2 and a fix-prioritization scheme (Critical/High/Medium-Low). NEW delegates all correction preparation to the `user-approved-code-changes` skill (with a `debugging`->`coding`->`aqa-test-debugging Part B` fallback) and does not enumerate the categories or prioritization in the phase. Reason: The critical HITL approval gate is preserved and hardened (explicit approval tokens, preparation-only guardrail forbidding writes before step 8.3, disambiguation rule, fallback chain), a net safety improvement. The dropped item is the fix taxonomy/prioritization, mechanical detail suited to the skill. Severity 2: only a regression if no bound skill carries the taxonomy. Solution: Verify `user-approved-code-changes` and/or `aqa-test-debugging` Part B own the fix-type taxonomy and prioritization. If not, add a single pointer line in `<execute_corrections>` referencing where the Selector/Timing/Assertion/Setup/Test-code taxonomy and Critical/High/Medium-Low prioritization live. No need to re-inline the BASE tables.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The NEW state-file template was reduced to a bare phase checklist. BASE included per-phase 'Test Details' subsections capturing concrete fields (TestRail Case, Confluence Pages, Existing Page Objects, Test File path, Tests Failed count, Root Causes, etc.) plus completion dates per row. Those structured capture fields are gone from the NEW template. Reason: The state file is the cross-phase memory; dropping the structured fields makes later spot checks rely on free-form text and weakens deterministic resume after compaction. Solution: Re-add a minimal set of per-phase capture fields to the state-file template (at least Phase 3 page-object list, Phase 6 test file path, Phase 7 root-causes list) or point the template at the per-phase docs that own those fields, so downstream phases and the success-criteria spot checks have a defined place to read prior outputs.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/coding-agents-prompting-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: BASE prerequisite 3 ('Orchestrator and subagents MUST USE SKILL `coding-agents-prompt-authoring`') was deleted from the rewritten prerequisites block. Behavioral tracing shows phases 2-7 still delegate to the `prompt-engineer` subagent, which independently binds that skill, so the skill is not unbound system-wide; but the orchestrator no longer has an explicit mandate to load it for the coordination/blueprint work it performs directly. Reason: The subagent binding mitigates the deletion, but removing the explicit orchestrator-level mandate weakens the guarantee that the authoring skill governs orchestrator-side decisions. Solution: Restore an explicit one-line binding in the workflow prerequisites that the orchestrator MUST USE SKILL `coding-agents-prompt-authoring`, or confirm in the workflow that all skill-dependent work is delegated to the prompt-engineer subagent so the orchestrator-level binding is intentionally unnecessary.

📊 Gates Comparison

Gate	Score	Comparison
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Reference Integrity	3	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/coding-flow.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Structural Coherence	Problem: The newly added `user_review_design phase=3` block lists all four items with the marker `1.` (four `1.` lines instead of 1-4). Every other phase in the file uses sequential 1-N numbering. Reason: Repeated `1.` markers in an ordered HITL gate are inconsistent with the rest of the file and can blur step ordering, though the steps still read sequentially so impact is cosmetic. Solution: Renumber the four items in `user_review_design` to 1, 2, 3, 4.

📊 Gates Comparison

Gate	Score	Comparison
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/external-lib-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: The added `Phase 0: Prerequsites` block numbers its two items as `1.` then `3.` (skips 2). In a workflow that explicitly stresses 'Do not skip steps!' and 'Make sure to have todo tasks for each step', a gap in the numbered list can cause an agent to expect/look for a missing step 2. Reason: Sequence integrity matters in a step-driven onboarding flow; a 1-then-3 list is a small but concrete ordering defect introduced by this change. Solution: Renumber the Phase 0 items to 1 and 2.
⚪ Low	Structural Coherence	Problem: The added Phase 0 header is misspelled 'Prerequsites'. Sibling workflows label this section 'Prerequisites'/'prerequisites'. Reason: Cosmetic typo in a section header; does not change behavior but reduces consistency with other workflow files. Solution: Fix the spelling to 'Prerequisites'.
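
Numbering defects like this one (1 then 3) and the repeated `1.` markers flagged in `coding-flow.md` could be caught by a small lint. The sketch below assumes plain `N. ` ordered-list markers and treats any non-item line as the end of a list, which is a simplification; `numbering_defects` is a hypothetical name.

```python
import re

ITEM_RE = re.compile(r"\s*(\d+)\.\s")

def numbering_defects(lines: list[str]) -> list[tuple[int, int, int]]:
    """Return (line_index, found, expected) for each out-of-sequence list item."""
    defects = []
    expected = 1
    for idx, line in enumerate(lines):
        match = ITEM_RE.match(line)
        if not match:
            expected = 1  # a non-item line ends the current list
            continue
        found = int(match.group(1))
        if found != expected:
            defects.append((idx, found, expected))
        expected += 1
    return defects
```

A 1-then-3 list yields one defect, while four `1.` lines yield three, one for each item after the first.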

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Workflow Completeness	Problem: The verification phase (now Phase 9) deleted base step 4 'Notify user: delete `init-rosetta-shells-flow.md`.' The cleanup notification to remove the obsolete bootstrap shells flow file no longer fires at the end of the workflow. Reason: Removing the cleanup notice is safe only if the stale file is never produced; otherwise leftover bootstrap files could re-trigger an old flow. Solution: Confirm the `init-rosetta-shells-flow.md` artifact is no longer generated by Phase 2; if it can still appear in upgrade-from-R2 workspaces, restore a step instructing the user to delete it. Grep of instructions/r2 shows no remaining references, so the deletion is consistent cleanup; only restore if upgrade paths can leave the stale file.

📊 Gates Comparison

Gate	Score	Comparison
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/init-workspace-flow-context.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow-discovery.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow-questions.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow-rules.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow-shells.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/modernization-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The `<execute_documentation_mcp>` preamble paragraph and the early-exit rule add a layer of meta-narration ('Branch triggers reference <output_contract> by name; the literal outcome line is inlined parenthetically at each trigger site...') that explains the file's own cross-referencing design rather than giving directives. An agent must parse the indirection between the inlined parenthetical outcome lines, the `<output_contract>` table, and the `<verify_remediation>` block to execute a single branch. Reason: The narration is design commentary, not an instruction; removing it reduces parsing load without losing any branch behavior. Solution: Drop the meta-explanation paragraph (lines describing why outcome lines are inlined) and keep only the operative early-exit rule. The branch directives already inline the literal outcome line, so the narration restating that design is non-functional.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The `<validation_checklist>` requires `agents/qa-state.md` to be 'created with Phase 0 marked complete and `IDENTIFIER:` field matching the `agents/qa/{IDENTIFIER}/` directory name' and requires the `{IDENTIFIER}` value to be identical across the qa-state.md IDENTIFIER field. But the parent `qa-flow.md` state-file template (its `<state_file>` block) has no `IDENTIFIER:` row — it lists Last Updated / Current Phase / Test Case Source / Feature / API Base URL only. The checklist binds to a state field the canonical template does not define. Reason: The validation step checks a field that the producing template never writes, so the check can never pass against a state file built only from the parent template. Solution: Either add an `IDENTIFIER:` line to the `qa-flow.md` `<state_file>` template, or change this phase's `<update_state>` step 0.2 to explicitly write the `IDENTIFIER:` line into qa-state.md so the validation target exists. Reference the qa-flow.md template field name exactly.
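
The mismatch described above amounts to a set difference between the fields the checklist requires and the fields the template defines. A minimal sketch, assuming the `<state_file>` template writes one `Field Name: value` pair per line (the actual template layout is not shown here, and `missing_state_fields` is a hypothetical helper):

```python
import re

# Matches "Field Name:" at the start of a line, with an optional list bullet.
FIELD_RE = re.compile(r"^[ \t]*[-*]?[ \t]*([A-Za-z][A-Za-z ]*?)[ \t]*:", re.MULTILINE)

def missing_state_fields(template: str, required: list[str]) -> list[str]:
    """Return the required field names that the template never defines."""
    defined = {m.group(1).lower() for m in FIELD_RE.finditer(template)}
    return [name for name in required if name.lower() not in defined]
```

With the five rows listed in the finding (Last Updated, Current Phase, Test Case Source, Feature, API Base URL), a required `IDENTIFIER` field is reported as missing, which matches the audit's conclusion.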

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: The `<skip_rules>` block (lines 24-47) is a single dense decision-tree for ONE override (skip Phases 0-2). It restates the same scope-lock idea many times: 'only sanctioned no-ask deviation', 'Scope: applies ONLY at this skip-verification gate', 'Authority on ask-before-action elsewhere', plus a separate Rationale line, plus an explicit carve-outs list that duplicates the per-phase HITL gate `type="HITL"` markers already present on each phase block. The redundancy makes the single most safety-relevant branch harder to parse. Reason: A safety override that is restated four ways increases the chance the agent skips or misreads a step; compressing it improves reliability of the one branch most likely to bypass HITL. Solution: Collapse the repeated scope-lock statements (the 'Deference', 'Scope', 'Authority', and 'Rationale' clauses) into one short precondition + one action + one fallback. Drop the carve-out re-listing of HITL gates since each phase block already carries `type="HITL"`; reference them by pointer instead of re-enumerating.
🔵 Medium	Cognitive Budget	Problem: The router mixes high-density prose (the skip-rules override at lines 27-37) with the otherwise clean phase table. The override paragraph packs ~7 nested conditions (a/b/c preconditions, uncertain-partial branch, unambiguous-instruction branch, carve-outs) into prose rather than decomposed if/then lines, exceeding the ~5-step reliable handling guidance for a single block. Reason: Prose with many embedded conditions is processed less reliably than enumerated branches; decomposition reduces skipped-condition risk at the only no-ask gate. Solution: Decompose the override into a short numbered if/then/else list (precondition check -> hold -> skip; fail+unambiguous -> announce+start Phase 0; uncertain -> ASK). Keep one line per branch so the agent can execute it as discrete steps.
⚪ Low	Output Contract	Problem: The router itself defines the `agents/qa-state.md` template (lines 159-178) and per-phase output paths, but does not give a canonical example of the skip-gate refusal one-liner beyond an inline parenthetical at line 34; the announced format ('skip-gate refused: ...') is described in words only. Reason: The format is already specified inline and understandable; a fenced example would marginally improve determinism but is not a behavioral gap. Solution: No change required for correctness; optionally add the refusal line as a fenced example next to the existing state-file fence so the emitted format is deterministic. Low priority.
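
The decomposition suggested in the Cognitive Budget finding could look roughly like the sketch below. It only illustrates the one-branch-per-line shape; the function name, the enum, and the treatment of "failed but ambiguous" as an ASK case are assumptions, since the actual `<skip_rules>` wording is not reproduced in this report.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SKIP_PHASES_0_TO_2 = "skip Phases 0-2"
    ANNOUNCE_AND_START_PHASE_0 = "announce refusal, start Phase 0"
    ASK_USER = "ask the user"


def skip_gate(preconditions_hold: Optional[bool], instruction_unambiguous: bool) -> Action:
    """Resolve the skip-verification gate as discrete if/then branches.

    preconditions_hold is True or False when verification is conclusive,
    and None when the result is uncertain or partial.
    """
    # 1. Preconditions hold: the only sanctioned no-ask skip.
    if preconditions_hold is True:
        return Action.SKIP_PHASES_0_TO_2
    # 2. Preconditions fail and the user instruction is unambiguous.
    if preconditions_hold is False and instruction_unambiguous:
        return Action.ANNOUNCE_AND_START_PHASE_0
    # 3. Anything uncertain falls back to asking (assumed default).
    return Action.ASK_USER


print(skip_gate(None, True).value)
```

Each branch maps to one line of the proposed numbered list, so a condition cannot be buried inside a longer sentence.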

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The phase asserts (lines 25-34) the exact internal contract of the `automation-test-implementation-handoff` skill — that it declares a `<recommended_foundational_skills>` block and a 'step-4 GATE' that emits `foundational skill <name> not loaded by calling workflow`. This couples the phase to the named internal anchors of a sibling skill. If the handoff skill's internal section names drift, the acceptance-criteria check at step 5.1 sub-step 4 (lines 31-34) becomes a false-negative and blocks the phase even when the skill is correct. Reason: Asserting a sibling skill's private section names violates skill isolation and creates a brittle gate that can deadlock the phase on a cosmetic rename of the handoff skill. Solution: Keep the behavioral contract (handoff must verify-presence and must not ACQUIRE foundational skills) but soften the dependency on the skill's exact internal section name; check the observable behavior (does it verify presence / does it claim to ACQUIRE) rather than the literal `<recommended_foundational_skills>` tag name.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/research-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: Phase 1 directs reading CONTEXT.md, ARCHITECTURE.md, and IMPLEMENTATION.md but gives no handling if those files are missing or empty, and no fallback when the research subagent returns no grounded references. This was unchanged from base, but the diff restructured the adjacent prerequisites without adding any missing-input handling. Reason: Without a missing-file fallback the researcher subagent may stall or fail silently on workspaces lacking those standard files. Solution: Add a prerequisite or phase-1 note: if any of CONTEXT/ARCHITECTURE/IMPLEMENTATION is absent, proceed with available context and record the gap in research-flow-state.md rather than aborting.
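
The proposed fallback (proceed with whatever context exists and record the gap) can be sketched as follows. This is an illustration under assumptions: `load_context` is a hypothetical helper, and treating an empty file the same as a missing one is a choice made here, not something the workflow states.

```python
import tempfile
from pathlib import Path

CONTEXT_FILES = ("CONTEXT.md", "ARCHITECTURE.md", "IMPLEMENTATION.md")


def load_context(workspace: Path) -> tuple[dict[str, str], list[str]]:
    """Read the standard context files; return (available contents, gaps)."""
    available: dict[str, str] = {}
    gaps: list[str] = []
    for name in CONTEXT_FILES:
        path = workspace / name
        text = path.read_text(encoding="utf-8").strip() if path.is_file() else ""
        if text:
            available[name] = text
        else:
            gaps.append(name)  # missing or empty: record it, do not abort
    return available, gaps


# Demo workspace: one usable file, one empty file, one missing file.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "CONTEXT.md").write_text("project notes", encoding="utf-8")
    (workspace / "ARCHITECTURE.md").write_text("", encoding="utf-8")
    available, gaps = load_context(workspace)

print(sorted(available))
print(gaps)
```

The `gaps` list is what would be written into research-flow-state.md so that the missing inputs stay visible to later phases.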

📊 Gates Comparison

Gate	Score	Comparison
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/self-help-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The diff wrapped phase-0 prerequisites in a `<prerequisites phase="0", applies="ALL">` block, but the opening tag is duplicated (lines 20 and 27) with NO closing `</prerequisites>`, so the block leaks into phase 1. The opener also uses malformed XML attribute syntax (comma-separated `phase="0", applies="ALL"`). Sibling research-flow uses the correct single, closed block. Reason: The unclosed/duplicated tag breaks the XML section boundaries the agent relies on to delimit phases, risking prerequisites and phase-1 content being read as one block. Solution: Replace the duplicate line-27 opener with a closing `</prerequisites>` tag and remove the comma between attributes, matching research-flow's correct pattern.
🔵 Medium	Reference Integrity	Problem: Same change as above: the two identical `<prerequisites phase="0", applies="ALL">` opening tags with no matching close make the section structure self-inconsistent. Sibling workflow research-flow.md uses the same wrapper correctly (open + close), confirming the intended pattern was a properly closed block. Reason: Consistent, resolvable section tags keep the workflow parseable and aligned with the sibling flow's convention. Solution: Mirror research-flow.md: a single `<prerequisites phase="0" applies="ALL">` opener (no comma between attributes) and a single `</prerequisites>` closer around the four numbered prerequisite items.
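
A duplicated opener with no closer is easy to catch with a small stack-based check. The sketch below is one possible approach, not an existing repository tool; it assumes section tags look like `<name ...>` and `</name>`, and the `<phase_1>` tag in the sample text is invented for the demo.

```python
import re

# Matches <name ...> and </name>; attribute text is captured but not parsed.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)([^<>]*)>")


def unbalanced_tags(text: str) -> list[str]:
    """Report openers left unclosed and closers with no matching opener."""
    stack: list[str] = []
    problems: list[str] = []
    for match in TAG.finditer(text):
        closing, name, attrs = match.groups()
        if attrs.rstrip().endswith("/"):
            continue  # self-closing tag, nothing to balance
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected </{name}>")
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems


BROKEN = (
    '<prerequisites phase="0", applies="ALL">\n'
    "1. first prerequisite\n"
    '<prerequisites phase="0", applies="ALL">\n'
    "<phase_1>\nbody\n</phase_1>\n"
)
FIXED = (
    '<prerequisites phase="0" applies="ALL">\n'
    "1. first prerequisite\n"
    "</prerequisites>\n"
    "<phase_1>\nbody\n</phase_1>\n"
)

print(unbalanced_tags(BROKEN))
print(unbalanced_tags(FIXED))
```

Prose that contains literal `<` characters could produce false positives, so a check like this is best treated as a lint hint rather than a hard gate.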

📊 Gates Comparison

Gate	Score	Comparison
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	3	⬇️ Slightly worse
Structural Coherence	2	⬇️ Slightly worse

📄 `instructions/r2/core/workflows/testgen-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The detailed per-entry document formats for contradictions (C1), gaps (G1), and ambiguities (A1) — including their exact field structure (Type, Source quotes, Impact, Needs Clarification) — were deleted from this phase. The NEW file delegates sections 1-6 entirely to the `gap-and-contradiction-analysis` skill and only keeps the appended sections 7 + Metadata. Reason: Phase 2's primary output (categorized findings with source quotes) must have its shape defined somewhere; the phase moved ownership to the skill rather than dropping it, so the regression risk is contained to skill-coverage verification, not a hard contract loss. Solution: This is acceptable since the skill now owns the per-entry schema, but confirm `gap-and-contradiction-analysis/SKILL.md` (or its references) actually defines C/G/A entry shapes with source quotes. If the skill defines them, no action needed; if not, the contract for the core analysis output is now undefined in either place.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-question-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The user-facing `questions.md` template lost the detail of its in-document `How to Answer` numbered procedure (open file, fill answer, save, notify) and dropped both the `## Additional Questions or Comments` free-text section and the `## Completion Checklist`. The NEW `How to Answer` paragraph is terser. Since `questions.md` is filled out by a human, this is a user-facing artifact where clarity matters per the rubric. Reason: The dropped sections were partly redundant with the agent-side validation, but the free-text `Additional Questions or Comments` slot was the only channel for user input outside the generated questions, so its loss slightly narrows the HITL capture surface. Solution: The NEW `How to Answer` paragraph still tells the user to replace `[Leave blank for user]` and how to mark UNKNOWN, so the core instruction survives. Optionally restore a one-line `## Additional Comments` slot so users can add context not covered by generated questions; the completion checklist removal is low-impact since validate_answers (step 3.3) re-checks completeness agent-side.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The BASE router contained concrete 'Common Patterns' examples (initial-prompt formats, Jira/Confluence input formats, Confluence search CQL `type=page AND space=PROJ AND text ~ 'feature'`). The NEW router drops all of these; the only remaining inline example is in <phase_5_6_standards_gate> (cypress path). Reason: Input-format and CQL examples are execution detail for the data-collection phase, not the router. Moving them down keeps the router thin; grounding for the agent is preserved at the phase level. Low impact, hence severity 2. Solution: Verify the Confluence search/CQL examples now live in testgen-flow-data-collection.md (the phase that actually performs collection) so the guidance is available where used. The router is a router; examples belong in phase files. Keep at least the one standards-gate example present.
🔵 Medium	Failure Handling	Problem: The BASE router had a dedicated 'Error Handling' section (Jira ticket not found, no Confluence results, user doesn't answer questions, incomplete requirements) plus per-phase 'Validation Rules' (e.g. raw-data.md must contain both sections; >=80% exported). The NEW router removes both sections; the router itself now states no failure-path behavior inline. Reason: Removed-from-router error/validation content was relocated to phase files, not deleted; router stays thin. Minor because failure handling for the orchestrator-level concerns is still covered via the verification-failure override and the per-phase files. Solution: Confirmed these cases migrated into child phase files (testgen-flow-data-collection.md handles 'Jira ticket not found'; testgen-flow-test-case-export.md handles the 80% threshold with PARTIAL/HALT logic). No content was truly lost, so this is a relocation appropriate for a thin router. No action required beyond ensuring the router's <validation_checklist> keeps pointing at the same artifacts.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-selector-management/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The SKILL.md is ~13.3K chars (10K-20K band). As a SKILL.md shell it lands in agent context automatically, so its full weight is paid on every call that loads this skill. The verify-don't-load contract, input contract table, 10-step process, output templates, and failure-handling are all detailed inline. Reason: SKILL.md shells are always in context; trimming the always-loaded portion lowers per-call token cost and compaction risk while leaving decision-time rules where the agent needs them. Solution: Move the two verbatim output templates (the user-facing handoff message block and the state-update template) and the per-stack command examples into a references/ file loaded on demand at step 7/step 10, keeping the GATEs and contract tables inline. This trims the auto-context shell without losing decision-time content.
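
The size observation can be checked with a character count. This is a rough sketch: only the 10K-20K band is named in this report, so the band edges (inclusive lower, exclusive upper) and the `shell_size_report` helper are assumptions.

```python
BAND_LOW, BAND_HIGH = 10_000, 20_000  # the only band named in the report


def shell_size_report(text: str) -> tuple[int, bool]:
    """Return (character count, whether it falls in the 10K-20K band)."""
    chars = len(text)
    return chars, BAND_LOW <= chars < BAND_HIGH


# Stand-in for a SKILL.md of roughly the size quoted in the finding.
sample = "x" * 13_300
print(shell_size_report(sample))
```

Re-running the same count after moving the output templates into a references/ file would show whether the always-loaded shell dropped below the band.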

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The SKILL.md is ~14.1K chars (10K-20K band) and is the largest of the three g23 SKILL shells. It restates the success-criteria done-condition, NOT-complete list, and validation-checklist with substantial overlap (the five NOT-complete items mirror the five validation-checklist items), all loaded into context automatically as a SKILL shell. Reason: The NOT-complete list and validation checklist duplicate the same five failure conditions; one canonical list cuts shell tokens paid on every load without losing any gate coverage. Solution: Collapse the redundancy between <success_criteria> 'NOT complete' bullets and <validation_checklist> — both enumerate the same five regressions (silent zero-page, children skipped, permission hidden, missing input, redaction skipped). Keep one canonical list and have the other reference it, reducing the always-loaded shell size.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: The redaction catalog (grep patterns, placeholder vocabulary, the 5 redaction target categories) is fully baked into `<safety_boundaries>` inline in the SKILL.md, while the sibling confluence skill moved the identical catalog into `references/cql-and-redaction.md` and loads it on demand. The jira skill keeps it always-loaded inline, duplicating domain knowledge that could be retrieved. Reason: Always-loading the full pattern catalog inflates the jira skill's context cost on every invocation and creates two copies of the same redaction knowledge that can drift apart. Solution: Move the inline redaction pattern/placeholder catalog out of jira's `<safety_boundaries>` into a lazy-loaded reference (mirroring the confluence skill), keeping only the operational decision-time rules inline.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: SKILL.md is 10,603 chars (10K-20K band). The `<safety_boundaries>` block repeats the redaction policy at high length with regex patterns, and `<failure_handling>` plus `<validation_checklist>` restate the same failure cases and read-only contract already covered elsewhere in the file, including `<safety_boundaries>`. Reason: Per the rubric the 10K-20K size band warrants a high-severity flag; the redaction regex detail is maintainer-grade and not needed in every runtime extraction, so it inflates re-sent history tokens without changing runtime behavior. Solution: Move the detailed regex redaction pattern catalog into the existing references/vendor-swap.md sibling or a new references/redaction.md and leave a one-line pointer in `<safety_boundaries>`, mirroring the on-demand `<vendor_replacement>` split already used in this file. Collapse the duplicate read-only / case-not-found statements so each appears once.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/qa-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/qa-data-collection/references/output-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/qa-gap-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Self-Validation	Problem: The validation_checklist item 'Question count <= 20 per batch (pitfall 2)' counts only Critical+Important questions, but the success_criteria 'Questions Asked = Critical+Important+Optional combined' and the Executive Summary 'Questions Asked' count include Optional too. The batching cap and the reported count use different denominators, so an artifact with many Optional questions could pass the <=20 grep while the Executive Summary reports a much higher 'Questions Asked'. Reason: Two adjacent rules use the same word 'questions' with different scopes, which can make the self-validation grep and the reported count disagree without an actual error. Solution: In qa-gap-analysis/SKILL.md make the batch-cap basis explicit and consistent: state that the <=20 cap applies to Critical+Important only (already implied) and that the Executive Summary 'Questions Asked' total is the combined Critical+Important+Optional, so the two numbers are expected to differ; or align both on the same basis.
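
The two denominators described in this finding can be illustrated with a minimal sketch. Only the cap of 20 and the Critical/Important/Optional split come from the finding; the `Question` type, the string priority labels, and both helper functions are hypothetical names used for illustration and are not part of the skill.

```python
from dataclasses import dataclass

BATCH_CAP = 20  # cap quoted in the validation_checklist item


@dataclass(frozen=True)
class Question:
    text: str
    priority: str  # assumed labels: "Critical", "Important", "Optional"


def batch_cap_count(questions):
    """Count only the questions the <=20 cap applies to (Critical + Important)."""
    return sum(1 for q in questions if q.priority in ("Critical", "Important"))


def questions_asked(questions):
    """Count every question, as the Executive Summary total does."""
    return len(questions)


# Illustrative data: 8 Critical, 10 Important, 15 Optional.
questions = (
    [Question(f"c{i}", "Critical") for i in range(8)]
    + [Question(f"i{i}", "Important") for i in range(10)]
    + [Question(f"o{i}", "Optional") for i in range(15)]
)

capped = batch_cap_count(questions)
total = questions_asked(questions)
print(capped <= BATCH_CAP, capped, total)  # True 18 33
```

With this data the artifact passes the batch cap while the summary reports a larger total, which is expected behavior once both rules name their basis explicitly.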

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/qa-project-config/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-debugging/references/failure-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-implementation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/repository-implementation-standards/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-synthesis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The SKILL.md is 12907 chars (10K-20K band). The success_criteria, safety_boundaries, failure_handling, and validation_checklist sections each restate the same MUST/MUST-NOT rules (BDD ban, gap-marker discipline, redaction discipline) in slightly different wording, increasing the always-loaded budget. Reason: A smaller always-loaded shell reduces per-turn token cost and the chance of contradictory drift between the four restating sections; content is r2-identical so this is a pre-existing trait carried into r3, scored low severity. Solution: Keep the operational rules canonical in one section (e.g. format_rules + safety_boundaries) and have success_criteria / validation_checklist reference them by name rather than re-stating each rule's full text. No behavioral content needs to change.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testrail-test-case-export/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The SKILL.md is 14041 chars (10K-20K band). Step 7's confirmation gate, the safety_boundaries section, the validation_checklist, and the pitfalls block each re-describe the same destructive-write rules (no-write-without-confirmation, dedup pre-scan, ambiguity-defaults-to-cancel, redaction). The dedup/confirmation rule is stated at least four times. Reason: This is the largest always-loaded file in the group; trimming the restated mechanics lowers per-turn cost without weakening the gate, since step 7 already holds the authoritative version. Content is r2-identical, so low severity. Solution: Mark step 7 as the single canonical home for the confirmation-gate and dedup rules (it already labels itself 'canonical'), and reduce safety_boundaries / validation_checklist / pitfalls to one-line pointers to step 7 instead of re-stating the full mechanics.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/user-approved-code-changes/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/adhoc-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Decision Branching	Problem: BASE had explicit inline if/then handling for two conditional tasks: Task 1.5 stated 'If agents/user-instructions/ directory does not exist or is empty, skip this task ... Document that no user instructions files were found', and Task 2 had 'If frontend code NOT available, skip to Task 3'. NEW removes both explicit branches and only leaves parenthetical conditions ('(if directory exists)', 'user instructions extracted (if available)') in the validation checklist, deferring the actual skip/empty handling to the `aqa-codebase-analysis` skill without restating the else-path in the workflow. Reason: The deleted else-paths told the agent what to do when an input is missing; relying only on a parenthetical 'if available' can let an agent stall or omit the 'document none found' step. Solution: In `<execute_analysis>` or `<validate_findings>`, add one line stating the explicit else for each conditional (e.g. 'if `agents/user-instructions/` is absent/empty, record none-found and continue; if frontend source is absent, skip frontend analysis and continue') so the branch is anchored in the phase even though the skill performs the work.
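
The workflow is a prompt rather than code, but the explicit else-path suggested in this finding can be sketched as ordinary branching. The `agents/user-instructions/` path comes from the finding; the function name and the shape of the returned record are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def collect_user_instructions(root: Path) -> dict:
    """Return the instruction files found, or an explicit none-found record."""
    directory = root / "agents" / "user-instructions"
    files = sorted(p.name for p in directory.iterdir()) if directory.is_dir() else []
    if not files:
        # Explicit else-path: record the absence and let the phase continue.
        return {"status": "none-found", "files": []}
    return {"status": "found", "files": files}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = collect_user_instructions(root)  # directory absent
    (root / "agents" / "user-instructions").mkdir(parents=True)
    empty = collect_user_instructions(root)  # directory present but empty
    (root / "agents" / "user-instructions" / "style.md").write_text("prefer data-testid")
    found = collect_user_instructions(root)  # one instruction file present
```

Both the absent and the empty case produce the same recorded outcome, so the agent never has to infer what "if available" means when the input is missing.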

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	3	⬇️ Slightly worse
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: BASE provided a detailed `## Phase 4: Selector Identification` test-plan output template (Required Selectors Analysis, Existing vs Missing, Frontend Code Analysis, Identified Selectors, Selector Strategy, Notes) plus a worked 'Identified Selectors' documentation example with HTML/selector/type/usage rows. NEW deletes the test-plan output template and the worked selector-documentation block entirely; the phase now only names 'complete selector map with values and strategy' in `<workflow_context>` and defers the format to `aqa-selector-management` Part A without an in-phase schema or example. Reason: The deleted template/example was the only concrete output shape; without confirming the skill owns it, Phase 5 (which consumes the 'selector map from Phase 4') has no guaranteed field set to read. Solution: Confirm `aqa-selector-management` Part A owns and emits the selector-map schema (HTML source, chosen selector, type, usage, strategy); if it does, add a one-line pointer in `<execute_identification>` ('selector-map schema owned by aqa-selector-management Part A'). If it does not own a concrete schema, restore a minimal selector-map field list in the phase.
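
If the minimal field list does need to be restored, one possible shape is sketched below. The five fields mirror the ones named in the solution (HTML source, chosen selector, type, usage, strategy); the class name, the attribute names, and the `missing_fields` helper are hypothetical and are not taken from `aqa-selector-management`.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SelectorEntry:
    html_source: str    # markup the selector was derived from
    selector: str       # chosen selector value
    selector_type: str  # e.g. "data-testid", "css", "role" (assumed labels)
    usage: str          # what the test does with the element
    strategy: str       # why this selector was preferred


REQUIRED_FIELDS = tuple(f.name for f in fields(SelectorEntry))


def missing_fields(row: dict) -> list:
    """List required fields that are absent or empty in a selector-map row."""
    return [name for name in REQUIRED_FIELDS if not row.get(name)]


row = {
    "html_source": '<button data-testid="save">Save</button>',
    "selector": '[data-testid="save"]',
    "selector_type": "data-testid",
    "usage": "click to submit the form",
}
print(missing_fields(row))  # ['strategy']
```

A check of this kind would give Phase 5 a guaranteed field set to read, whichever file ends up owning the schema.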

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: BASE carried concrete code-pattern examples (TypeScript page-object selector additions, helper methods, new-page-object template with imports/constructor) and a `## Phase 5: Selector Implementation` test-plan output template. NEW removes all code examples and the test-plan template, keeping only a state-file echo of fields ('Page Objects Modified/Created, Total Selectors Added, Helper Methods Added, Linting'); the implementation pattern and conventions are deferred to `aqa-selector-management` Part B and `repository-implementation-standards`. Reason: The deleted examples/template were the in-phase output anchor; the new file's correctness now fully depends on the referenced skills owning the conventions, so a pointer keeps the contract traceable. Solution: This deferral is reasonable since the skills own 'follow project conventions exactly'; to fully close the gap, confirm `aqa-selector-management` Part B emits the page-object/test-plan output template, and add a one-line pointer to it in `<execute_implementation>`. No need to restore the inline code blocks.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: BASE shipped a full worked TypeScript test example (imports, describe block, setup, actions, explicit assertions, cleanup) plus a `## Phase 6: Test Implementation` test-plan output template. NEW deletes all of it and keeps only a state-file update example; the actual test-code shape and test-plan section are deferred to `aqa-test-authoring`'s `<output_format>`. The `<skill_handoff>` contract is strong, but no in-phase test example remains. Reason: The deleted worked example was the concrete grounding for what a passing test looks like; correctness now depends entirely on the bound `aqa-test-authoring` skill, so the ownership pointer must stay accurate. Solution: Deferral is acceptable because `aqa-test-authoring` is explicitly named as owner of authoring decisions and `<output_format>`; confirm that skill carries the test-code/test-plan example and keep the existing pointer. No restoration of inline code needed.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Failure Handling	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The base inlined the full entry schemas for contradictions (C1: Type/Source1/Source2/Impact/NeedsClarification), gaps (G1: Type/Context/Missing/Impact/SuggestedQuestion), ambiguities (A1), and the whole analysis.md skeleton (sections 1-6 with formats). The new file delegates sections 1-6 to the `gap-and-contradiction-analysis` skill and only keeps the appended sections 7-8 plus a single vague-vs-specific example. The phase no longer states the per-entry field shapes. Reason: The schema moved to the skill rather than being lost, but the phase now depends entirely on the skill for the entry contract, which is a single point of failure if the skill drifts. Solution: Acceptable as progressive disclosure provided the `gap-and-contradiction-analysis` skill defines the C/G/A entry shapes and the sections 1-6 skeleton in its `<output_format>`. Verify the skill carries them; if not, the phase has dropped the only place the entry schema was specified.
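
The verification step in this finding can be approximated with a rough text check, sketched here. The field names are the ones listed in the finding for contradiction (C) and gap (G) entries; the `absent_fields` helper and the sample text are hypothetical, and the substring match is deliberately crude, so it can confirm presence of a name but not the structure around it.

```python
CONTRADICTION_FIELDS = ("Type", "Source1", "Source2", "Impact", "NeedsClarification")
GAP_FIELDS = ("Type", "Context", "Missing", "Impact", "SuggestedQuestion")


def absent_fields(output_format_text: str, expected) -> list:
    """Return the expected field names that never appear in the given text."""
    return [name for name in expected if name not in output_format_text]


# Stand-in for the skill's <output_format> body; not the real file content.
sample = (
    "C1: Type | Source1 | Source2 | Impact | NeedsClarification\n"
    "G1: Type | Context | Missing | Impact\n"
)

print(absent_fields(sample, CONTRADICTION_FIELDS))  # []
print(absent_fields(sample, GAP_FIELDS))            # ['SuggestedQuestion']
```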

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Failure Handling	Problem: The BASE router carried an explicit `## Error Handling` block at the router level (Jira ticket not found -> verify key; no Confluence results -> proceed Jira-only or ask; user doesn't answer questions -> remind, cannot proceed to Phase 4). The NEW router removes this block. The router still names HITL gates and the verification-failure resume, but per-phase failure cases (e.g. ticket-not-found, no-Confluence-results) are no longer surfaced at the routing layer. Reason: Failure cases that exist in neither the router nor the phase file would be silently lost during the restructure. Verification showed the deleted detail largely relocated into phase reference files, so this is a relocation risk, not a confirmed loss. Solution: Confirm each removed failure case is owned by its ACQUIRE'd phase file (data-collection, question-generation). If a case has no home in a phase file, add a one-line router-level pointer or restore it. Do not re-inline full detail; a cross-reference is sufficient for a router.
⚪ Low	Example Grounding	Problem: The BASE router included concrete grounding examples (initial-prompt formats, a sample Confluence CQL search string, contradiction/gap type catalogs). The NEW router drops these from the router body. The router now relies on phase files for examples. Reason: For a top router, moving examples into ACQUIRE'd phase files is the intended progressive-disclosure pattern and reduces router bloat, so the absolute capability is preserved as long as the phase files carry the examples. Solution: Verify the CQL search example and prompt-format examples are present in `testgen-flow-data-collection.md` and `testgen-flow-project-config-loading.md`. Keep them out of the router (correct for progressive disclosure); only act if a phase file lacks its example.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Safety Boundaries	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

…s locally

github-actions · 2026-06-03T09:45:45Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟠 Very High	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/skills/api-test-spec-authoring/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-codebase-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-codebase-analysis/references/report-template.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-requirements-elicitation/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-selector-management/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-selector-management/references/strategy-and-template.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-authoring/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-authoring/references/test-implementation-template.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-debugging/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-debugging/references/escalation-template.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/aqa-test-debugging/references/part-b-mechanics.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/automation-test-execution-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/automation-test-execution-analysis/references/redaction-policy.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/automation-test-implementation-handoff/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/automation-test-implementation-handoff/references/templates.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-best-practices.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-extract.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/confluence-source-harvesting/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/gap-and-contradiction-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/gitnexus-cli/SKILL.md`	0	0	1	1	⚠️ Warning
`instructions/r2/core/skills/gitnexus-setup/SKILL.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/skills/gitnexus-tools/SKILL.md`	1	0	0	0	❌ Fail
`instructions/r2/core/skills/gitnexus-tools/assets/gn-examples.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/init-workspace-documentation/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/init-workspace-rules/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/init-workspace-verification/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/load-context-instructions/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/load-context/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/load-workflow/SKILL.md`	0	0	0	2	⚠️ Warning
`instructions/r2/core/skills/mcp-confluence-data-collection/SKILL.md`	0	2	0	0	❌ Fail
`instructions/r2/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-jira-data-collection/SKILL.md`	0	2	1	0	❌ Fail
`instructions/r2/core/skills/mcp-jira-data-collection/references/redaction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-jira-data-collection/references/vendor-swap.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/mcp-testrail-data-collection/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-testrail-data-collection/references/redaction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/operation-manager/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/operation-manager/assets/om-schema.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-data-collection/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r2/core/skills/qa-data-collection/references/backend-source-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-data-collection/references/existing-test-patterns.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-data-collection/references/output-template.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-gap-analysis/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r2/core/skills/qa-project-config/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r2/core/skills/qa-test-debugging/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-test-debugging/references/failure-catalog.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-test-debugging/references/part-b-mechanics.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-test-implementation/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/qa-test-implementation/references/multi-language-examples.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/repository-implementation-standards/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-synthesis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-synthesis/references/output-schemas.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/sequential-workflow-execution/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/references/canonical-example.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testrail-test-case-authoring/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testrail-test-case-export/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testrail-test-case-export/references/vendor-porting.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/user-approved-code-changes/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/adhoc-flow.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/coding-agents-prompting-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/coding-flow.md`	1	1	1	1	❌ Fail
`instructions/r2/core/workflows/external-lib-flow.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/init-workspace-flow-context.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-discovery.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-questions.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-rules.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow-shells.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/init-workspace-flow.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/modernization-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	1	1	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/research-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/self-help-flow.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow.md`	0	1	0	2	❌ Fail
`instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-selector-management/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-authoring/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/automation-test-implementation-handoff/references/templates.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/mcp-jira-data-collection/references/redaction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-testrail-data-collection/references/redaction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-data-collection/references/output-template.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-gap-analysis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-project-config/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-debugging/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-debugging/references/failure-catalog.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-debugging/references/part-b-mechanics.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-implementation/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/repository-implementation-standards/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testrail-test-case-export/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md`	0	0	2	3	⚠️ Warning
`instructions/r3/core/skills/user-approved-code-changes/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/adhoc-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	0	1	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	1	1	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow.md`	0	0	1	0	⚠️ Warning
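
The Status column appears to follow directly from the severity counts: every row with a Very High or High finding is marked Fail, rows with only Medium or Low findings are marked Warning, and all-zero rows are marked Pass. A minimal sketch of that rule, inferred from the rows above rather than taken from the validator's source (the function name `row_status` is hypothetical):

```python
def row_status(very_high: int, high: int, medium: int, low: int) -> str:
    """Derive a summary status from per-severity finding counts.

    Assumption: this rule is inferred from the table rows,
    not copied from the validator's implementation.
    """
    if very_high > 0 or high > 0:
        return "Fail"
    if medium > 0 or low > 0:
        return "Warning"
    return "Pass"


# Counts copied from rows of the summary table.
print(row_status(1, 0, 0, 0))  # r2 gitnexus-tools/SKILL.md
print(row_status(0, 0, 2, 3))  # r3 testrail-test-case-export/references/vendor-porting.md
print(row_status(0, 0, 0, 0))  # r3 qa-flow.md
```

If the validator weighs severities differently (for example, a threshold on the number of Medium findings), this sketch would need adjusting; none of the rows in this report contradicts the simple rule.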

📄 `instructions/r2/core/skills/api-test-spec-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-codebase-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-codebase-analysis/references/report-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-requirements-elicitation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-selector-management/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-selector-management/references/strategy-and-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/aqa-test-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-test-authoring/references/test-implementation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Conflict Resolution	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-test-debugging/references/escalation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/aqa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/automation-test-execution-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/automation-test-execution-analysis/references/redaction-policy.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/automation-test-implementation-handoff/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/automation-test-implementation-handoff/references/templates.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-best-practices.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-extract.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-rosetta.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/confluence-source-harvesting/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/gap-and-contradiction-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/gitnexus-cli/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Safety Boundaries	Problem: The destructive commands `clean` (deletes `.gitnexus/` and unregisters the repo) and `clean --all` (deletes ALL indexed repos) are documented with no guardrail, and the `--force` flag that skips confirmation is presented neutrally. Reason: An agent could irreversibly delete indexes across all repos when only the current repo was intended. Solution: Add a boundary note that `clean --all` and `clean --force` are destructive and must not be auto-run without explicit user approval.
⚪ Low	Failure Handling	Problem: The troubleshooting block covers three known symptoms, but the skill defines no behavior for command failure in general (non-zero exit, missing API key for `wiki`, network failure during `analyze`). Reason: Silent command failure would let downstream steps run against a stale or absent index. Solution: Add a brief failure-handling rule: on non-zero exit, surface stderr to the user and stop; do not silently proceed as if indexing succeeded.
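
The two findings above could be sketched as a small wrapper around command execution. This is an illustration only, not part of the skill: the helper names `needs_user_approval`, `run_or_stop`, and `CommandFailed` are hypothetical, and the demo uses the Python interpreter as a stand-in command so that it runs without GitNexus installed.

```python
import subprocess
import sys

# Flags the report calls out as destructive for the `clean` subcommand.
DESTRUCTIVE_CLEAN_FLAGS = {"--all", "--force"}


class CommandFailed(RuntimeError):
    """Raised when a command exits non-zero; the message carries its stderr."""


def needs_user_approval(args: list[str]) -> bool:
    """True when args (subcommand followed by flags) is a destructive clean call."""
    if not args or args[0] != "clean":
        return False
    return bool(DESTRUCTIVE_CLEAN_FLAGS & set(args[1:]))


def run_or_stop(cmd: list[str]) -> str:
    """Run cmd and return stdout; raise with stderr on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface stderr instead of continuing against a stale or missing index.
        raise CommandFailed(
            f"{' '.join(cmd)} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


print(needs_user_approval(["clean", "--all"]))  # destructive: ask the user first
print(needs_user_approval(["analyze"]))         # not a clean call

# Stand-in failing command (assumption: any non-zero exit is treated the same way).
try:
    run_or_stop([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"])
except CommandFailed as err:
    print(f"stopping: {err}")
```

Raising an exception rather than returning a status code makes it harder for a later step to proceed by accident when indexing has failed.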

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/gitnexus-setup/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Safety Boundaries	Problem: The install steps run `npx gitnexus analyze` and `npx gitnexus setup`. The latter writes global MCP config, auto-detecting and modifying editor configuration, yet no boundary requires that the user be aware a global, machine-wide config write occurs. Reason: A global config write is a system-level side effect; flagging it prevents unexpected machine-wide changes during init. Solution: Add a boundary note that `setup` writes global editor/MCP configuration and should run only with the user's awareness, consistent with the when-to-use opt-in gate.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/gitnexus-tools/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Reference Integrity	Problem: The new block references `gitnexus-usage/assets/gn-examples.md`, but the asset actually lives at `gitnexus-tools/assets/gn-examples.md`. No `gitnexus-usage` skill folder exists in r2. Reason: When the agent runs ACQUIRE on the wrong path, the examples will not load, so the worked examples this skill points to are unreachable. Solution: Change the reference in the block to `gitnexus-tools/assets/gn-examples.md` to match the real asset path inside this skill.
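
A broken reference of this kind can be caught mechanically by checking each referenced path against the skills tree. The sketch below is a hedged illustration, not the validator's actual check: `missing_references` is a hypothetical helper, and the demo builds a throwaway directory that mirrors the layout described in the finding.

```python
import tempfile
from pathlib import Path


def missing_references(skills_root: Path, references: list[str]) -> list[str]:
    """Return the referenced paths that do not exist as files under skills_root."""
    return [ref for ref in references if not (skills_root / ref).is_file()]


# Demo on a temporary tree: only gitnexus-tools/assets/gn-examples.md exists.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    asset = root / "gitnexus-tools" / "assets" / "gn-examples.md"
    asset.parent.mkdir(parents=True)
    asset.write_text("# examples\n")

    broken = missing_references(
        root,
        [
            "gitnexus-usage/assets/gn-examples.md",  # path cited by the skill
            "gitnexus-tools/assets/gn-examples.md",  # path where the asset lives
        ],
    )
    print(broken)  # ['gitnexus-usage/assets/gn-examples.md']
```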

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	4	✅ Much better
Success Criteria	4	✅ Much better
Conflict Resolution	4	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	✅ Much better
Workflow Completeness	4	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	2	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	✅ Much better
Failure Handling	4	✅ Much better
Epistemic Honesty	4	✅ Much better
Self-Validation	4	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	4	✅ Much better

📄 `instructions/r2/core/skills/gitnexus-tools/assets/gn-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	4	✅ Much better
Success Criteria	4	✅ Much better
Conflict Resolution	4	✅ Much better
Decision Branching	4	✅ Much better
Instruction Ordering	4	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	✅ Much better
Failure Handling	4	✅ Much better
Epistemic Honesty	4	✅ Much better
Self-Validation	4	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	✅ Much better

📄 `instructions/r2/core/skills/init-workspace-documentation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/init-workspace-rules/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Precision & Explicitness	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/init-workspace-verification/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: The PR deletes the DEPRECATED ARTIFACTS block that told the agent to notify the user (without auto-delete) about r1 leftovers agents/init-rosetta-shells-flow-state.md and local init-rosetta-shells-flow.md. The verification phase no longer surfaces these stale artifacts during an upgrade. Reason: Removing the only notification step means upgrades that still carry these r1 files leave stale artifacts behind silently, a small loss of an upgrade safety check. Solution: If these r1 artifacts are no longer reachable in supported upgrade paths, the deletion is fine; otherwise restore a short deprecated-artifacts notification step so upgrades from r1 still flag the leftover state/flow files for the user.
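
If the notification step is restored, it only needs to detect the leftovers and report them. A minimal Python sketch, assuming both files are looked up relative to the workspace root (the finding does not say where the "local" file lives); `find_deprecated_artifacts` is a hypothetical name.

```python
from pathlib import Path

# File names quoted in the finding; their location relative to the
# workspace root is an assumption made for this sketch.
DEPRECATED_ARTIFACTS = (
    "agents/init-rosetta-shells-flow-state.md",
    "init-rosetta-shells-flow.md",
)


def find_deprecated_artifacts(workspace_root):
    """Return the deprecated r1 artifacts that still exist.

    Nothing is deleted here: the caller is expected to notify the user
    and leave the decision to them.
    """
    root = Path(workspace_root)
    return [rel for rel in DEPRECATED_ARTIFACTS if (root / rel).exists()]
```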

📊 Gates Comparison

Gate	Score	Comparison
Workflow Completeness	3	⬇️ Slightly worse
Bloat Control	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/load-context-instructions/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	4	✅ Much better
Success Criteria	4	✅ Much better
Conflict Resolution	4	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	4	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	✅ Much better
Safety Boundaries	4	✅ Much better
Failure Handling	4	✅ Much better
Epistemic Honesty	4	✅ Much better
Self-Validation	4	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/load-context/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The added bash example uses grep -n "^#{1,3}" .... Plain grep is BRE and treats {1,3} literally, so this command matches nothing for markdown headers. The mitigation in step 3 (use built-in tools if available) does not fix the literal example the agent is shown. Reason: An agent that copies the shown command verbatim gets empty output and silently loses the IMPLEMENTATION/MEMORY/PATTERNS/REQUIREMENTS header context this step is meant to gather. Solution: Use grep -nE "^#{1,3} " (extended regex, with a trailing space) or grep -nE "^#+ " so the header-extraction example actually returns the intended ToC lines.
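
For comparison, the same header extraction can be written without depending on grep's regex dialect. This is a minimal Python sketch, assuming the goal is a list of (line number, header line) pairs for header levels 1 to 3; `extract_headers` is a hypothetical helper, not something the skill defines. Python's `re` module treats `{1,3}` as a quantifier, so the pattern behaves like the `grep -nE` form suggested above.

```python
import re

# One to three '#' characters followed by a space, anchored at line start.
HEADER_RE = re.compile(r"^#{1,3} ")


def extract_headers(text):
    """Return (line_number, line) pairs for markdown headers of level 1-3."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if HEADER_RE.match(line)
    ]
```

The trailing space in the pattern is what keeps level-4 headers and `#hashtag`-style lines out of the result.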

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	2	⬇️ Slightly worse
Failure Handling	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r2/core/skills/load-workflow/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Failure Handling	Problem: No handling for the case where no workflow matches the request or the ACQUIRE fails. Reason: An unmatched request currently has no defined path, risking a stall. Solution: Add a failure branch: if no workflow matches, fall back to the ad-hoc/lightweight workflow or ask the user.
⚪ Low	Decision Branching	Problem: Step 2 (resume) and step 3 (auto vs No HITL) name branch conditions but give no explicit else/handling when the state file is missing or the mode is ambiguous. Reason: Variable resume/mode scenarios without an else can leave the agent stalled or silently picking a mode. Solution: Add the else branch: if no state file exists on a resume request, state that and start fresh; define a default mode for when the mode is unclear.
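
The missing else branch for resume requests is small once written down. A minimal Python sketch of the decision only; `plan_start`, its return values, and the state-file argument are hypothetical and not taken from the skill.

```python
from pathlib import Path


def plan_start(resume_requested, state_file):
    """Decide how a workflow run should start.

    Returns (action, message), where action is "resume" or "fresh".
    """
    if resume_requested and Path(state_file).is_file():
        return "resume", f"resuming from {state_file}"
    if resume_requested:
        # Else branch from the finding: say so and start fresh
        # instead of stalling or guessing.
        return "fresh", f"no state file at {state_file}; starting fresh"
    return "fresh", "starting fresh"
```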

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/mcp-confluence-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The always-loaded SKILL.md packs the full success_criteria, an 8-step process, output template, 5 safety rules, 8 failure cases, 9 validation items, and 8 pitfalls into one entry file (~14K chars), exceeding the ~5-step reliable-handling guidance the spec cites. Reason: A single oversized entry file increases the odds the agent drops steps; progressive disclosure of the checklist would cut active load without losing the gate. Solution: Move the validation_checklist (largely a mirror of success_criteria + failure_handling) to a reference loaded only at pre-emit time, leaving the entry file focused on process + safety + failure.
🟡 High	Bloat Control	Problem: The new 13.9K-char SKILL.md heavily restates the same rules across <success_criteria>, <safety_boundaries>, <failure_handling>, <validation_checklist>, and other sections (e.g. 'permission errors are not empty content', truncation-at-5000-words, and redact-before-writing each appear 3-4 times). Reason: The 10K-20K size band is a high-severity reliability concern; repeated prose inflates the always-loaded context and raises the chance the agent skips list items. Solution: Keep each rule in its primary section (failure_handling for error paths, safety_boundaries for redaction) and have validation_checklist/pitfalls reference it rather than restate the full prose.
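
The size band these two findings rely on can be expressed as a simple classifier. A minimal Python sketch: the 10K and 20K thresholds come from the findings, while the function name, the labels, and the treatment of files above 20K are assumptions made for illustration.

```python
def size_band(char_count, reliable_limit=10_000, high_limit=20_000):
    """Classify an always-loaded SKILL.md by character count.

    Below reliable_limit is treated as fine; reliable_limit..high_limit
    is the band the report flags as high severity. The label for files
    above high_limit is an assumption, since the report does not name it.
    """
    if char_count < reliable_limit:
        return "ok"
    if char_count <= high_limit:
        return "high"
    return "above-band"
```

For example, the 13.9K-char file discussed here falls in the "high" band, which matches the severity assigned in the table.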

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/mcp-jira-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The always-loaded entry file carries success_criteria + 5-step process + full output template + 5 safety rules + 7 failure cases + 8 validation items + 7 pitfalls (~12K chars), well beyond the ~5-step reliable-handling guidance. Reason: An oversized single entry file raises the chance the agent drops steps; the checklist is the most duplicative block and is the natural candidate for progressive disclosure. Solution: Move the validation_checklist (a near-duplicate of success_criteria + failure_handling) into a reference loaded only at the pre-emit step, trimming the entry file.
🟡 High	Bloat Control	Problem: The new 12.4K-char SKILL.md restates the same guarantees across <success_criteria>, step 3, <safety_boundaries>, <failure_handling>, <validation_checklist>, and other sections (restricted-not-empty, redact-before-writing, comments-cap-at-10, and no-fabrication are each repeated 3-4 times). Reason: The 10K-20K size band is a high-severity reliability concern; duplicated prose inflates always-loaded context and increases skipped-item risk. Solution: Keep each rule in one home section and have validation_checklist/pitfalls reference it instead of duplicating the full prose.
🔵 Medium	Workflow Completeness	Problem: The new step list is numbered 1-5 and places jira_search_fields in step 3, but the new sibling references/vendor-swap.md (line 13) calls it 'step 6 fallback'. There is no step 6, so step references across the skill family are inconsistent. Reason: Mismatched step references in the same skill family make maintainer edits error-prone and suggest the steps were renumbered without updating the references. Solution: Fix the vendor-swap.md cross-reference to say 'step 3 + pitfalls' to match the actual SKILL.md step numbering.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/mcp-jira-data-collection/references/redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/mcp-jira-data-collection/references/vendor-swap.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Line 13 refers to 'jira_search_fields (step 6 fallback + pitfalls)' but the sibling SKILL.md only has steps 1-5 and places jira_search_fields in step 3 — the cited step number does not resolve. Reason: A dangling step reference inside the same skill family misleads maintainers porting the skill and indicates the SKILL.md was renumbered without updating this guide. Solution: Change 'step 6 fallback + pitfalls' to 'step 3 + pitfalls' to match the actual SKILL.md numbering.
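
A stale step number of this kind is easy to detect automatically. A minimal Python sketch, assuming references are written as "step N" and the number of steps in SKILL.md is already known; `dangling_step_refs` is a hypothetical helper.

```python
import re


def dangling_step_refs(reference_text, step_count):
    """Return the step numbers cited as 'step N' in reference_text that
    fall outside 1..step_count, sorted ascending."""
    cited = {
        int(number)
        for number in re.findall(r"\bstep (\d+)\b", reference_text, flags=re.IGNORECASE)
    }
    return sorted(n for n in cited if n < 1 or n > step_count)
```

Applied to the line quoted in the finding with a step count of 5, the check flags step 6; after the suggested fix it returns an empty list.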

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/mcp-testrail-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better

📄 `instructions/r2/core/skills/mcp-testrail-data-collection/references/redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r2/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/operation-manager/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Structural Coherence	5	✅ Much better
Self-Validation	5	✅ Much better

📄 `instructions/r2/core/skills/operation-manager/assets/om-schema.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Instruction Ordering	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/orchestrator-contract/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/qa-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The newly added SKILL.md is ~14.2K chars in one file. Although step 4/5/7 enumerations were correctly extracted to references/, the SKILL.md body still carries the full <safety_boundaries>, <success_criteria>, <failure_handling>, and <validation_checklist> blocks, which overlap heavily (the secret-scan rule is restated across pitfalls, safety, step 6.1, success criteria, and validation checklist). Reason: A 14K single-load skill body with the same rule repeated in five sections raises cognitive load and dilutes the one-term-per-concept principle, making it harder for the agent to reliably act on the canonical version. Solution: Keep the single authoritative statement of the secret-scan and anti-assumption rules in <safety_boundaries> and reference them by tag from the other blocks instead of restating the full procedure, reducing the in-context body below the ~10K reliable-load band.
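
To see how widely a rule is restated, one can list the tagged blocks that mention it. A minimal Python sketch, assuming the skill body uses simple non-nested `<tag>...</tag>` blocks and that a case-insensitive phrase match is a good enough proxy for "restates the rule"; `blocks_mentioning` is a hypothetical helper.

```python
import re


def blocks_mentioning(skill_text, phrase):
    """Return the names of <tag>...</tag> blocks whose body contains phrase
    (case-insensitive), in document order."""
    blocks = re.findall(r"<(\w+)>(.*?)</\1>", skill_text, flags=re.DOTALL)
    needle = phrase.lower()
    return [name for name, body in blocks if needle in body.lower()]
```

A result with more than one block name for a phrase such as "secret scan" points at the duplication the finding describes; the intended end state is a single block holding the rule and the others referring to it by tag.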

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-data-collection/references/backend-source-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-data-collection/references/existing-test-patterns.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-data-collection/references/output-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-gap-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The new SKILL.md is ~15K chars and is fully inline (no references/ extraction unlike its sibling qa-data-collection). The <validation_checklist> 'Question count <= 20 per batch' note (line 270) and the Executive Summary 'Questions Asked' note (line 182) spend a large prose budget reconciling two different question-count denominators (Critical+Important vs Critical+Important+Optional), restating the same distinction three times. Reason: A 15K single-load body that triple-explains the same count-denominator nuance increases cognitive load and risks the agent applying the wrong denominator; consolidating to one canonical definition improves reliability. Solution: State the two count definitions once in <output_format> (Executive Summary) and reference that single definition from <validation_checklist> and <success_criteria> rather than re-explaining the deliberate denominator difference in each block; consider extracting the gap/contradiction/ambiguity entry templates to a references file as qa-data-collection does.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-project-config/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The new SKILL.md is ~14.9K chars fully inline. The 'Redaction at intake' rule has its canonical source in <safety_boundaries> but is then restated by pointer in <failure_handling> (line 215), one further block (line 227), and <validation_checklist> (line 243); combined with the two embedded markdown templates (state-file stub + project-config template), the single-load body sits well above the ~10K reliable band. Reason: A ~15K skill body that re-points to the same redaction rule from four blocks raises cognitive load; consolidating to one canonical location keeps the agent reliably acting on a single version of the safety contract. Solution: Keep the project-config markdown template only in step 5 and the redaction rule only in <safety_boundaries>, and let the other blocks reference by tag without re-describing; this trims the single-load body toward the reliable range without losing any rule.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	✅ Much better
Input Contract	4	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-debugging/references/failure-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	✅ Much better
Instruction Ordering	4	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	✅ Much better
Epistemic Honesty	4	✅ Much better
Self-Validation	4	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-implementation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/qa-test-implementation/references/multi-language-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	4	✅ Much better
Success Criteria	4	✅ Much better
Conflict Resolution	4	✅ Much better
Decision Branching	4	✅ Much better
Instruction Ordering	4	✅ Much better
Workflow Completeness	4	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	✅ Much better
Failure Handling	4	✅ Much better
Epistemic Honesty	4	✅ Much better
Self-Validation	4	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/repository-implementation-standards/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/requirements-synthesis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-synthesis/references/output-schemas.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/sequential-workflow-execution/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/references/canonical-example.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/testrail-test-case-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/testrail-test-case-export/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/testrail-test-case-export/references/vendor-porting.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/user-approved-code-changes/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/adhoc-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Line 30 newly adds the OPERATION_MANAGER recovery path: 'MUST FALLBACK to built-in todo task tools ACQUIRE todo-tasks-fallback.md FROM KB'. That file exists only in r3 (instructions/r3/core/rules/todo-tasks-fallback.md); there is no todo-tasks-fallback.md anywhere in r2. Per release isolation (one agent works with one release, no cross-refs), an r2 agent that reaches this path ACQUIREs a document that resolves to zero results. Reason: Simulated r2 agent whose rosettify MCP and npx CLI both fail runs ACQUIRE on a non-existent doc, gets nothing back, and is left with no execution-tracking mechanism on its only recovery route. r3 agents are unaffected. Solution: Add todo-tasks-fallback.md to the r2 KB (mirroring r3), or change the r2 line to describe the built-in todo-task fallback inline instead of pointing at a missing file.
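
The release-isolation check described in this finding can be sketched as a small script. This is a minimal illustration, not the auditor's tooling: the regular expression assumes the directive always has the shape `ACQUIRE <doc> FROM KB` as quoted above, and the `r2_docs` set is a hypothetical stand-in for the real r2 KB listing.

```python
import re

# Assumed directive shape, taken from the quoted line: "ACQUIRE <doc> FROM KB".
ACQUIRE_PATTERN = re.compile(r"ACQUIRE\s+(\S+)\s+FROM\s+KB")


def find_unresolved_acquires(workflow_text, release_docs):
    """Return ACQUIRE targets that are missing from the given release's doc set."""
    targets = ACQUIRE_PATTERN.findall(workflow_text)
    return [name for name in targets if name not in release_docs]


# Hypothetical r2 doc names; a real check would list the files under instructions/r2.
r2_docs = {"adhoc-flow.md", "coding-flow.md"}
line_30 = (
    "MUST FALLBACK to built-in todo task tools "
    "ACQUIRE todo-tasks-fallback.md FROM KB"
)

unresolved = find_unresolved_acquires(line_30, r2_docs)
print(unresolved)
```

Running the same check once per release tree, rather than against the union of r2 and r3, is what would surface a document that exists in only one release.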

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	3	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Safety Boundaries	Problem: The PR adds a new <orchestration_and_escalation> 'Verification-failure unilateral-start override' (lines 116-125, not present in origin/main) that authorizes the agent to start the earliest incomplete phase in the SAME turn without asking the user ('do NOT call AskUserQuestion'). This is a new sanctioned no-ask deviation from session-wide hitl defaults. The auditors flagged the identical override on the sibling testgen-flow.md as Safety Boundaries sev 3 but marked this r2 aqa-flow.md clean, an inconsistency. Reason: Any new sanctioned no-ask path is a HITL relaxation that deserves reviewer attention; the override is well scope-locked (3 preconditions, 'Ambiguity defaults to ASK', explicit carve-outs) so severity is moderate, not critical. Solution: Apply the same Safety Boundaries flag as testgen-flow.md: keep the carve-outs but make the override auditable (log the override decision into agents/aqa-state.md, not just print one line) and re-confirm it is clearly subordinate to bootstrap-hitl-questioning policy.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	3	⬇️ Slightly worse
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/coding-agents-prompting-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Workflow Completeness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/coding-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Decision Branching	Problem: Phase renumber from 11 to 13 phases broke the SMALL-mode HITL combine target. New <user_review_impl phase="10"> line 126 says 'SMALL: combined with Phase 12 checkpoint', but in NEW Phase 12 is <review_tests>, a non-HITL subagent reviewer phase. In BASE the same impl-approval gate (then phase 8) combined with Phase 4, which was a HITL user-review gate. A user-approval gate now points at a subagent review phase that never presents to the user, and there is no later HITL phase to fold into. Reason: Simulated SMALL-mode agent at phase 10 reads 'combine with Phase 12', treats the impl user-approval as already covered by the test-review phase, and proceeds to write/run tests without the explicit 'Do NOT proceed to tests until explicit approval' user gate firing. The impl HITL approval can be silently skipped in SMALL runs. Solution: Repoint the SMALL combine target of phase 10 to an actual type="HITL" gate. Since no HITL phase follows phase 10, either keep phase 10 as a standalone HITL checkpoint in SMALL mode (remove the combine note) or pair it with phase 6 like the design gate does. The combine target MUST be a HITL phase.
🟡 High	Reference Integrity	Problem: In the new <user_review_impl phase="10"> the cross-reference was renumbered to 'SMALL: combined with Phase 12 checkpoint' (line 126), but Phase 12 is now <review_tests>, a non-HITL subagent reviewer phase. The BASE pointed to 'Phase 4', which was a HITL user-review checkpoint. A HITL user approval gate cannot logically be merged into a subagent code-review phase. Reason: The phase renumber (old 8-phase -> new 13-phase) updated the number but not the semantics; pointing a user-approval combine at a subagent review breaks the SMALL-mode checkpoint collapsing logic. Solution: Repoint the SMALL combination target to a HITL checkpoint phase (e.g. the final HITL user review, or remove the combine note if no later HITL gate exists). The combine target must be a type="HITL" phase.
🔵 Medium	Workflow Completeness	Problem: The new <user_review_design phase="3"> block lists all four of its sub-steps with the marker '1.' (lines 55-58) instead of 1,2,3,4. Step ordering/dependency within this newly added HITL phase is not numbered. Reason: These lines are newly added in this diff; repeated '1.' loses the intended ordering of the approval-gate sub-steps. Solution: Renumber the four sub-steps 1-4 so the present-solution -> present-specs -> do-not-assume-approval -> SMALL-combine sequence is explicit.
⚪ Low	Structural Coherence	Problem: Same <user_review_design> block (lines 55-58) breaks atomic-step numbering by repeating '1.' four times, reducing scannability of the new HITL gate. Reason: Introduced by this diff; non-sequential numbering in a HITL gate degrades structural clarity of the changed section. Solution: Apply sequential numbering 1-4 to the added sub-steps.
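
The rule stated in the first two findings, that a SMALL-mode combine target must itself be a HITL phase, can be checked mechanically. The sketch below is illustrative only: the phase table is a simplified model built from the phases named in the findings, the type label `"subagent"` is a made-up marker, and treating phase 6 as HITL is an assumption inferred from the suggestion to pair with it.

```python
def invalid_combine_targets(phases, combine_notes):
    """Return (source, target) pairs whose combine target is not a HITL phase.

    phases: dict mapping phase number -> phase type string.
    combine_notes: dict mapping a gate's phase number -> the phase it is
        combined with in SMALL mode.
    """
    return [
        (source, target)
        for source, target in sorted(combine_notes.items())
        if phases.get(target) != "HITL"
    ]


# Simplified model of the renumbered workflow (types partly assumed).
phases = {
    3: "HITL",       # user_review_design
    6: "HITL",       # assumed HITL, per the "pair it with phase 6" suggestion
    10: "HITL",      # user_review_impl
    12: "subagent",  # review_tests, a non-HITL reviewer phase
}
combine_notes = {3: 6, 10: 12}

print(invalid_combine_targets(phases, combine_notes))
```

A check like this compares phase types rather than phase numbers, so it would keep working after a future renumber.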

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Decision Branching	2	⬇️ Slightly worse
Workflow Completeness	3	⬇️ Slightly worse
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	3	⬇️ Slightly worse
Structural Coherence	3	⬇️ Slightly worse
Safety Boundaries	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/external-lib-flow.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Workflow Completeness	Problem: The newly added 'Phase 0: Prerequsites' block numbers its two items '1.' then '3.' (skips 2), so the prerequisite list has a broken sequence. Reason: These lines are added in this diff; the 1 then 3 numbering is a defect in the added content that misrepresents step count. Solution: Renumber the two prerequisite items as 1 and 2.
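
Numbering defects of this kind (a skipped number here, repeated `1.` markers in coding-flow.md) are easy to detect automatically. The following is a minimal sketch that assumes the input lines belong to a single numbered list whose items start with `N. `; the item texts are hypothetical.

```python
import re

# A numbered list item: optional indentation, digits, a dot, then whitespace.
ITEM_MARKER = re.compile(r"^\s*(\d+)\.\s")


def numbering_defects(lines):
    """Return (expected, found) for each item that breaks the 1..n sequence."""
    defects = []
    expected = 1
    for line in lines:
        match = ITEM_MARKER.match(line)
        if not match:
            continue  # not a numbered item, ignore it
        found = int(match.group(1))
        if found != expected:
            defects.append((expected, found))
        expected += 1
    return defects


# Hypothetical item texts; only the markers mirror the findings.
skipped = ["1. first prerequisite", "3. second prerequisite"]
repeated = ["1. step a", "1. step b", "1. step c", "1. step d"]

print(numbering_defects(skipped))
print(numbering_defects(repeated))
```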

📊 Gates Comparison

Gate	Score	Comparison
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	3	⬇️ Slightly worse
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/init-workspace-flow-context.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow-discovery.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow-questions.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/init-workspace-flow-rules.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow-shells.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/init-workspace-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: The new verification phase (now phase 9) deleted step 'Notify user: delete init-rosetta-shells-flow.md'. The old base had this cleanup step; the new version drops it, so a stale shell file may be left behind after init completes. Reason: Losing a cleanup instruction can leave stale config files that mislead later sessions. Solution: If the deletion is intentional because the file is no longer generated, leave as-is; otherwise restore a cleanup/notify step in the verification phase for any obsolete shell file. Confirm intent before final merge.

📊 Gates Comparison

Gate	Score	Comparison
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/modernization-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: Line 28's meta-paragraph explains the file's own authoring rationale ("the literal outcome line is inlined parenthetically ... so the agent does not have to cross-jump", "The canonical table ... still lives in <output_contract>", "Config-key precedence lives in <workflow_context> and is referenced, not relisted"). This is non-operational provenance/design commentary rather than an instruction the agent acts on. Reason: Non-operational design rationale inflates the prompt and competes for attention with the actual branch logic, which the hardening reference flags as AI slop to strip. Solution: Remove the explanatory meta-paragraph at line 28; retain only the operational Early-exit rule. The cross-references are already self-evident from the section names.
⚪ Low	Cognitive Budget	Problem: The new <execute_documentation_mcp> block packs a meta-paragraph, an early-exit rule, three nested sub-blocks (resolve/harvest_and_collect/verify), a verify_remediation block, and an output_contract table into one fragment with heavy cross-jumps (each branch references <output_contract> by name plus an inlined parenthetical copy of the same outcome line). One agent must hold many interacting branch rules at once for a single artifact write. Reason: Duplicated outcome strings in both the table and inline parentheticals double the surface area for one decision and raise drift risk if the table later changes. Solution: Keep the single source of truth (the <output_contract> table) but drop the inlined parenthetical duplicate outcome lines at each trigger site introduced in /<harvest_and_collect>; let the table be the only place outcome strings live, reducing the parallel rule-set the agent juggles.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Single Responsibility	Problem: <execute_gap_analysis> chains three analysis skills in sequence (lines 30-32): qa-gap-analysis, gap-and-contradiction-analysis, and aqa-requirements-elicitation. The first two have overlapping responsibility (gaps vs contradictions) and the file does not state how their outputs combine or whether one supersedes the other. Reason: Two similarly-scoped skills run back-to-back without a stated division of labor can produce duplicated or contradictory entries in the gaps/contradictions sections. Solution: Add a one-line note stating each skill's distinct contribution to analysis.md (e.g. gaps vs contradictions vs elicited requirements) so the agent does not double-count or produce conflicting sections.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/research-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/self-help-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: The new <prerequisites phase="0", applies="ALL"> block is opened twice and never closed. The added closing line is <prerequisites phase="0", applies="ALL"> (a second opening tag) instead of </prerequisites>. The block bleeds into the following <list_capabilities phase="1" ...> section with no clear boundary. Reason: An unclosed/duplicated XML-style tag makes the section boundary ambiguous; an agent parsing the workflow can mis-attribute Phase-0 prerequisite rules to Phase 1 or lose the boundary entirely. The sibling research-flow.md edit closed the block correctly, so this is an isolated typo in this diff. Solution: Change the second added <prerequisites phase="0", applies="ALL"> line to a closing tag </prerequisites>, matching the pattern used correctly in research-flow.md.
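
An opened-twice, never-closed block like this one can be caught with a simple stack-based tag check. The sketch below is an assumption-laden illustration: it treats every `<name ...>` as an opening tag (self-closing tags are not handled), and the sample workflow text is a simplified, hypothetical reduction of the affected section.

```python
import re

# Opening tags such as <name attr="...">, and closing tags such as </name>.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*>")


def unbalanced_tags(text):
    """Return human-readable problems for unclosed or unexpected tags."""
    stack = []
    problems = []
    for match in TAG.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if not is_closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append("unexpected </%s>" % name)
    problems.extend("unclosed <%s>" % name for name in stack)
    return problems


# Simplified stand-in for the affected section; rule text is hypothetical.
broken = (
    '<prerequisites phase="0", applies="ALL">\n'
    "1. hypothetical prerequisite rule\n"
    '<prerequisites phase="0", applies="ALL">\n'
    '<list_capabilities phase="1">\n'
    "</list_capabilities>\n"
)
fixed = broken.replace(
    '<prerequisites phase="0", applies="ALL">\n<list', "</prerequisites>\n<list"
)

print(unbalanced_tags(broken))
print(unbalanced_tags(fixed))
```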

📊 Gates Comparison

Gate	Score	Comparison
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	3	⬇️ Slightly worse

📄 `instructions/r2/core/workflows/testgen-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Conflict Resolution	Problem: The rewritten <update_state step="0.6"> keeps the user-facing question "Ready to proceed to Phase 1 (Data Collection)?" but the base file's explicit step "3. Wait for confirmation" before loading Phase 1 was deleted and not replaced with a STOP/WAIT directive. Reason: Asking a question without an explicit wait instruction lets an agent ask and immediately proceed, weakening the phase-transition HITL gate that the deleted line provided. The parent testgen-flow may still gate the transition, but the per-phase explicitness was reduced by this diff. Solution: Add an explicit STOP-and-wait directive after the Phase-1 readiness question in step 0.6 (e.g. "STOP and wait for user confirmation before the parent flow advances to Phase 1"), matching the deterministic gate wording used elsewhere in the qa/testgen flows.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Output Contract	Problem: The base file specified concrete Jira retrieval (exact fields= list and comment_limit=10) and detailed Confluence capture instructions (CQL query template, parent/child traversal limits, per-page capture fields) inline. The diff deletes these and delegates to mcp-jira-data-collection, mcp-confluence-data-collection, and confluence-source-harvesting. The phase's own retrieval output contract is now thinner and depends entirely on those skills defining the field set and capture shape. Reason: Verified the three delegated skills exist in r2, so Reference Integrity holds; the residual risk is that the phase no longer states what raw-data.md must contain, so a skill change could silently degrade the artifact without the phase catching it. Low severity because the validation_checklist still requires key Jira fields. Solution: Keep the delegation (correct per progressive disclosure), but add a one-line minimum-output assertion in <create_raw_data> naming the required captured fields (summary, description, status, priority, labels, components, comments, Confluence page title/url/content) so the phase still asserts its raw-data.md contract independent of skill internals.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Output Contract	Problem: 328 lines were deleted: the full contradiction/gap/ambiguity taxonomy, per-entry C/G/A field templates, and the analysis.md document skeleton sections 1-6 are removed and delegated to the gap-and-contradiction-analysis skill and its references/entry-templates-and-document-skeleton.md. The phase now only owns the appended sections 7-8 and relies on the skill emitting sections 1-6 with exact numbering for the section-7 append to attach correctly. Reason: Verified the skill and its entry-templates-and-document-skeleton.md exist in r2 and define sections 1-6 ending at section 6, so the append is currently consistent and Reference Integrity holds. The residual risk is a cross-file coupling: the phase's section-7 append silently breaks if the skill's section numbering changes. Low severity because the coupling is currently correct and the diff explicitly documents the canonical home. Solution: Keep the delegation. Optionally add a one-line guard in <create_analysis_document> instructing the agent to verify the skill output ended at a ## 6. section before appending section 7, so a skill drift in section numbering is caught rather than producing a misaligned document.
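
The optional guard suggested above, verifying that the skill output ends at a `## 6.` section before appending section 7, could look roughly like this. It is a sketch under the assumption that sections are marked by headings of the form `## N.`; the function name and section titles are hypothetical.

```python
import re

# A numbered second-level heading such as "## 6. Title".
SECTION_HEADING = re.compile(r"^## (\d+)\.", re.MULTILINE)


def can_append_section_7(analysis_text):
    """True only when the last numbered '## N.' heading is section 6."""
    numbers = [int(n) for n in SECTION_HEADING.findall(analysis_text)]
    return bool(numbers) and numbers[-1] == 6


# Hypothetical skill outputs: one complete, one that drifted to five sections.
complete = "\n".join("## %d. Section %d\ncontent" % (n, n) for n in range(1, 7))
drifted = "\n".join("## %d. Section %d\ncontent" % (n, n) for n in range(1, 6))

print(can_append_section_7(complete))
print(can_append_section_7(drifted))
```

Checking only the final heading keeps the guard cheap; a stricter variant could require the full 1 to 6 sequence.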

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Example Grounding	Problem: The new version deleted the inline worked examples (US-1 User Login, FR-1 Password Validation, NFR-1 API Response Time with concrete 200ms/95%/1000-user values). These showed the agent exactly what a filled-in entry looks like. The new file replaces per-entry shapes with a pointer to the requirements-synthesis skill's references/output-schemas.md and keeps only an abstract section table, so a concrete positive example no longer survives in this phase file. Reason: Without a concrete filled example the agent may emit vague or under-specified requirement entries, especially when the skill fails to load. Solution: Keep one short concrete example entry (e.g., one filled FR or NFR with real threshold values) inline, or confirm the cited requirements-synthesis output-schemas.md contains equivalent worked examples so grounding is preserved one hop away.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬇️ Slightly worse
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Cognitive Budget	Problem: The new step 5.2 (identify_test_types) and step 5.3 (generate_test_cases) pack a large amount of content into single steps: full test-type taxonomy plus CRUD/Auth/API coverage patterns in 5.2, and in 5.3 the inline TC schema, the dual-path format constraint, the forbidden-fields list, the good/bad title table, and the merge anti-pattern note. Step 5.3 in particular carries 6+ distinct sub-rules in one directive block, near the ~5-directive reliability ceiling for a single read. Reason: Dense single steps raise the chance the agent drops one sub-rule (e.g., a forbidden field check) when executing under load. Solution: Consider splitting the 5.3 inline template concerns (field schema vs format-prohibitions vs title-quality) into clearly separated sub-blocks or moving the redundant prose into the testrail-test-case-authoring skill, keeping only the self-contained fallback schema inline.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Safety Boundaries	Problem: The new <orchestration_and_escalation> block introduces a 'verification-failure unilateral-start override' that authorizes the agent to start a phase in the same turn WITHOUT asking the user, deviating from the per-phase USER CONFIRMATION rule and session-wide hitl defaults. Although it is scope-locked with three preconditions and explicit carve-outs for Phase 3/Phase 6 HITL and destructive actions, it is the only place in the family that sanctions a no-ask deviation from HITL, so it expands the no-ask surface relative to base (which had no such override). Reason: Any sanctioned no-ask path is a HITL relaxation; even tightly scoped, it must be auditable and clearly subordinate to bootstrap HITL policy to avoid an agent over-generalizing the carve-out. Solution: Keep the carve-outs but consider tightening the trigger to also require the agent to log the override decision into testgen-state.md (not just print one line), so the relaxation leaves an auditable trail; and re-confirm this matches bootstrap-hitl-questioning policy precedence.
⚪ Low	Bloat Control	Problem: The <orchestration_and_escalation> and surrounding workflow_phases bullets are heavily prose-dense for a router that is supposed to stay thin (the file itself repeatedly says 'router stays thin'). The single override rule spans ~10 nested bullets with repeated restatements of the same carve-outs (Phase 3, Phase 6, safety) in both <workflow_phases> and <orchestration_and_escalation>. Reason: Repetition of the same carve-outs in two blocks adds reading cost to the top-level router and slightly raises the chance of inconsistent edits later. Solution: Compress the duplicated carve-out lists into one canonical list referenced from both places; reduce restated rationale to a single line.
⚪ Low	Conflict Resolution	Problem: The override rule and the happy-path USER CONFIRMATION rule govern overlapping territory (phase transitions). The new text works hard to disambiguate them ('does NOT generalize', 'Ambiguity defaults to ASK'), but the resolution is spread across <workflow_phases> bullets and the <orchestration_and_escalation> block rather than a single priority hierarchy, leaving the reader to reconcile two competing transition rules. Reason: Two transition rules with cross-references increase the chance an agent applies the no-ask override outside its intended single gate. Solution: State the precedence once as an explicit ordered hierarchy (e.g., 'safety/HITL gates > per-phase USER CONFIRMATION > verification-failure override') in one location and reference it, instead of repeating the carve-out conditions in both blocks.
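
The ordered hierarchy suggested in the Conflict Resolution finding could be sketched as follows. This is a minimal illustration, not the workflow's actual mechanism: the `Rule` names, `resolve_transition`, and `may_start_without_asking` are hypothetical, and the sketch assumes the caller has already decided which rules apply at a given phase transition.

```python
from enum import IntEnum


class Rule(IntEnum):
    """Hypothetical rule names; a lower value means a higher priority."""
    SAFETY_HITL_GATE = 0
    USER_CONFIRMATION = 1
    VERIFICATION_FAILURE_OVERRIDE = 2


def resolve_transition(applicable: set[Rule]) -> Rule:
    """Return the highest-priority rule among those that apply.

    An empty set is treated as ambiguity and defaults to asking the user.
    """
    if not applicable:
        return Rule.USER_CONFIRMATION
    return min(applicable)


def may_start_without_asking(applicable: set[Rule]) -> bool:
    """True only when no higher-priority rule outranks the override."""
    return resolve_transition(applicable) is Rule.VERIFICATION_FAILURE_OVERRIDE
```

Keeping the order in one enum means both blocks can point at a single definition instead of restating the carve-outs.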

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/api-test-spec-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/api-test-spec-authoring/references/templates-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-codebase-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-codebase-analysis/references/report-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-requirements-elicitation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-selector-management/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-selector-management/references/strategy-and-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-authoring/references/test-implementation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-debugging/references/escalation-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/aqa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/automation-test-execution-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/automation-test-execution-analysis/references/redaction-policy.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/automation-test-implementation-handoff/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/automation-test-implementation-handoff/references/templates.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/confluence-source-harvesting/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/confluence-source-harvesting/references/redaction-and-normalization.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/gap-and-contradiction-analysis/references/entry-templates-and-document-skeleton.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The redaction guidance and the cross-domain/permission-error rules are restated across <safety_boundaries>, <failure_handling>, <validation_checklist>, and pitfalls (e.g. "Permission errors are not empty content" appears in step 4, safety_boundaries, the failure_handling per-page case, validation_checklist, and pitfalls). The same "redact BEFORE writing" instruction repeats in process step 8, safety_boundaries, validation_checklist, and pitfalls. Reason: Repetition inflates the always-loaded SKILL.md and raises read cost on every invocation without adding new behavior; the rule is already enforced by the validation checklist pointer. Solution: Keep the operational rule once in <safety_boundaries> and have <validation_checklist> and pitfalls reference it by name rather than restating the full rule, reducing the ~14KB file size.
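
A quick way to see where a rule is restated is to list the sections that contain it. The sketch below is only an illustration: `sections_containing` and the `SAMPLE` text are hypothetical, and it assumes sections are delimited by non-nested `<name>...</name>` tags.

```python
import re


def sections_containing(text: str, phrase: str) -> list[str]:
    """List the tag-delimited sections whose body contains the phrase (case-insensitive)."""
    hits = []
    # Match <name>...</name> blocks; assumes sections are not nested.
    for match in re.finditer(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL):
        name, body = match.group(1), match.group(2)
        if phrase.lower() in body.lower():
            hits.append(name)
    return hits


# Hypothetical excerpt, not the real SKILL.md content.
SAMPLE = """
<safety_boundaries>
Permission errors are not empty content.
</safety_boundaries>
<failure_handling>
Per-page case: permission errors are not empty content.
</failure_handling>
<validation_checklist>
See safety_boundaries.
</validation_checklist>
"""

duplicates = sections_containing(SAMPLE, "permission errors are not empty content")
```

In this sample the rule appears in two sections, while the checklist only points at the canonical one by name, which is the shape the Solution asks for.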

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/cql-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-confluence-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-jira-data-collection/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The redaction-before-writing rule and the "permission-restricted is not empty content" rule are each stated 3-4 times across process step 3, safety_boundaries, failure_handling, validation_checklist, and pitfalls (e.g. the assignee/reporter/description restriction rule appears in step 3, safety_boundaries, failure_handling, and pitfalls). Reason: The duplication enlarges the always-loaded ~12KB SKILL.md and raises per-invocation read cost without changing behavior. Solution: State each operational rule once in <safety_boundaries> and have <validation_checklist> / pitfalls reference it by name instead of restating the full rule.
🔵 Medium	Reference Integrity	Problem: Sibling reference references/vendor-swap.md (line 13) maps jira_search_fields to "step 6 fallback + pitfalls", but SKILL.md has no step 6 — the process ends at step 5, and jira_search_fields is actually invoked in step 3 (custom-fields branch). The step-number citation is stale. Reason: A maintainer forking the skill follows the cited step number to find the call site; pointing at a non-existent step 6 sends them to the wrong place and erodes trust in the rebind list. Solution: In references/vendor-swap.md line 13 change "step 6 fallback + pitfalls" to "step 3 custom-fields branch + pitfalls" to match the actual SKILL.md process numbering.
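
A stale step citation of this kind can be caught mechanically. The sketch below is an assumption-laden illustration: `stale_step_citations` is a hypothetical helper, and it assumes process steps appear in the skill file as lines starting with `N.` and that references cite them as `step N`.

```python
import re


def stale_step_citations(skill_text: str, reference_text: str) -> list[int]:
    """Return cited step numbers that have no matching numbered step in the skill text."""
    # Assumption: process steps are lines that begin with "N." followed by whitespace.
    defined = {int(n) for n in re.findall(r"^\s*(\d+)\.\s", skill_text, flags=re.MULTILINE)}
    # Assumption: references cite steps with the literal wording "step N".
    cited = {int(n) for n in re.findall(r"\bstep\s+(\d+)\b", reference_text, flags=re.IGNORECASE)}
    return sorted(cited - defined)


# Hypothetical five-step process, mirroring the finding that the process ends at step 5.
skill = "1. a\n2. b\n3. c\n4. d\n5. e\n"
before = "jira_search_fields (step 6 fallback + pitfalls)"
after = "jira_search_fields (step 3 custom-fields branch + pitfalls)"
```

Run against the wording before and after the proposed fix, the helper reports step 6 as stale and nothing once the citation points at step 3.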

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-jira-data-collection/references/redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-jira-data-collection/references/vendor-swap.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Line 13 cites the jira_search_fields call site as "(step 6 fallback + pitfalls)", but the parent SKILL.md has no step 6 — the call lives in step 3 (custom-fields branch). The cross-reference into the parent skill is stale and will misdirect a maintainer. Reason: vendor-swap.md is the maintainer's authoritative rebind map; a wrong step number defeats its purpose and could cause the call site to be missed during a fork. Solution: Change "(step 6 fallback + pitfalls)" on line 13 to "(step 3 custom-fields branch + pitfalls)" so the rebind list points at the real process step.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/references/redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/mcp-testrail-data-collection/references/vendor-swap.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/references/backend-source-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/references/existing-test-patterns.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-data-collection/references/output-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-gap-analysis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-project-config/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-debugging/references/failure-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Output Contract	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-debugging/references/part-b-mechanics.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/qa-test-implementation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/qa-test-implementation/references/multi-language-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/repository-implementation-standards/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-synthesis/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-synthesis/references/output-schemas.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/sequential-workflow-execution/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/swagger-contracts-analysis/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: The new SKILL.md is ~15.9K chars (10K-20K band). The 5-step process plus the inline <output_format> template, <validation_checklist>, <safety_boundaries>, <success_criteria>, and <failure_handling> all live in the always-loaded entry file, so a single load carries a large directive surface even though three reference files already use progressive disclosure. Reason: Large always-loaded entry files raise per-invocation cognitive load and token cost; the skill already established a references/ lazy-load pattern that the inline template does not yet exploit. Solution: Move the full per-endpoint markdown template body (lines 116-176) out to references/ (the canonical-example already lives there) and keep only the field-name list inline, trimming the entry file toward the <300-line / sub-10K target.
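
The size target named in the Solution can be checked with a few lines of Python. This is a sketch only: `budget_violations` is a hypothetical helper, and the thresholds are taken from the "<300-line / sub-10K" wording above, read here as character and line counts.

```python
def budget_violations(text: str, max_chars: int = 10_000, max_lines: int = 300) -> list[str]:
    """Describe each way the entry file meets or exceeds the size target."""
    violations = []
    if len(text) >= max_chars:
        violations.append(f"chars: {len(text)} >= {max_chars}")
    line_count = len(text.splitlines())
    if line_count >= max_lines:
        violations.append(f"lines: {line_count} >= {max_lines}")
    return violations


# A stand-in for a ~15.9K-character entry file on a single line.
oversized = "x" * 15_900
```

An empty list means the file is inside both limits; the stand-in above trips only the character limit.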

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/canonical-example.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/failure-handling-edge-cases.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/swagger-contracts-analysis/references/redaction-catalog.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testrail-test-case-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testrail-test-case-authoring/references/examples-and-redaction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testrail-test-case-export/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/testrail-test-case-export/references/vendor-porting.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Self-Validation	Problem: The new file has no self-check step for the maintainer to confirm no TestRail-specific token survived the fork (tool names, 'section_id', 'C12345', 'custom_steps_separated'). Reason: A rebind that misses one TestRail-specific token leaves a broken or mis-targeted call in the forked skill; a quick self-check prevents it. Solution: Add a final 'grep for residual TestRail tokens (mcp_testrail_, section_id, custom_preconds, custom_steps_separated, C-prefix) and confirm none remain in the forked file' verification step.
🔵 Medium	Failure Handling	Problem: The guide does not tell the maintainer what to do when a target vendor lacks a TestRail concept (e.g. no dedup list call, or no step/expected split) beyond the single note that some vendors store the test as one body — there is no general fallback rule for missing capabilities. Reason: Silently dropping a capability during a fork can remove a safety step (such as the dedup pre-scan) without the maintainer noticing. Solution: Add a brief fallback rule: when the target vendor lacks an equivalent for a step (dedup list, container auto-create, separated steps), document the gap explicitly in the forked SKILL.md and degrade safely (e.g. skip dedup pre-scan but keep the confirmation gate) rather than silently dropping the safety step.
⚪ Low	Success Criteria	Problem: The new file lists items to rebind but gives no testable done-when criteria for a completed fork (e.g. 'every mcp_testrail_* call replaced', 'no TestRail-specific term remains', 'priority/type tables match vendor enum'). Reason: Without explicit done-when conditions the maintainer cannot confirm the fork is complete and may ship a skill with leftover TestRail bindings. Solution: Add a 'fork is complete when' list enumerating verifiable conditions: zero residual mcp_testrail_ references, container term replaced everywhere, ID-format check rebound, pitfalls rebound, user-prompt template re-branded.
⚪ Low	Output Contract	Problem: The guide tells the maintainer to copy the file to '-test-case-export/SKILL.md' and 'edit only the items above' but does not define the expected end-state shape (which sections must change vs stay verbatim is described prose-style, with no concrete checklist or example of one rebound item). Reason: A porting guide with no concrete output exemplar leaves the rebind quality to interpretation, raising the chance of an incompletely-ported skill. Solution: Add a single concrete before/after example for one rebind item (e.g. the priority table for Xray) and a short closing checklist of the sections that must end up vendor-specific, so the output of a fork is verifiable.
⚪ Low	Input Contract	Problem: The new reference is a maintainer-facing porting guide but states no input contract for the forking task it drives: it does not say which file the maintainer starts from (the sibling SKILL.md), what target-vendor facts must be gathered first (vendor MCP tool names, priority/type enums, container API capability), or what the maintainer must have on hand before editing. Reason: Without naming the required inputs, a maintainer can begin a fork missing the vendor facts the rebind steps depend on, producing a partially-rebound skill. Solution: Add a short 'before you start' list at the top naming the required inputs for a fork: source = sibling SKILL.md, plus the target vendor's MCP create/list/probe tool names, priority enum, type taxonomy, container auto-create capability, and case-ID shape.
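
The residual-token self-check suggested in the Self-Validation finding could look roughly like the sketch below. The token list is taken from that finding; the regex for the C-prefix case ID is an assumed reading of 'C12345', and the names `RESIDUAL_PATTERNS` and `find_residual_tokens` are hypothetical.

```python
import re

# Tokens named in the finding. The C-prefix pattern is an assumption
# (capital C followed by digits), inferred from the 'C12345' example.
RESIDUAL_PATTERNS = {
    "mcp_testrail_": re.compile(r"mcp_testrail_\w*"),
    "section_id": re.compile(r"\bsection_id\b"),
    "custom_preconds": re.compile(r"\bcustom_preconds\b"),
    "custom_steps_separated": re.compile(r"\bcustom_steps_separated\b"),
    "C-prefix case ID": re.compile(r"\bC\d+\b"),
}


def find_residual_tokens(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, matched_text) for each residual TestRail token."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RESIDUAL_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits
```

An empty result for the forked SKILL.md would be one verifiable "fork is complete when" condition; any hit points at the line that still carries a TestRail binding.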

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/user-approved-code-changes/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/adhoc-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Workflow Completeness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: Step 6.1 (<execute_authoring>) packs sub-steps 1a-1d, 2, 3, 4, 5 plus a refusal clause and two embedded narrative paragraphs (Routing, authoring-decision-ownership) into one step. The single step carries the full handoff-contract reasoning, exceeding the ~5-actions-at-once reliable budget for one directive. Reason: A step with ~9 nested actions plus prose increases the chance the agent skips or merges sub-steps; smaller numbered steps execute more reliably. Solution: Split 6.1 into two steps: 6.1a 'load foundational + domain skills (1a-1d with zero-doc stop rule)' and 6.1b 'invoke handoff and verify (current 2-5)'. Move the Routing/ownership narrative into <workflow_context> so the step body is just numbered actions.
⚪ Low	Bloat Control	Problem: The new <skill_handoff> block and step 6.1 sub-step 4 repeat the same handoff contract (foundational skills must be loaded by the caller, the handoff only verifies presence, acceptable vs unacceptable handoff doc) three times across <workflow_context>, <skill_handoff>, and <execute_authoring> step 4. The long prose sentences (e.g. step 4 is a single ~90-word sentence) restate the verify-presence/stale-KB contract already stated in <skill_handoff>. Reason: Repeating the same multi-clause contract in three places adds reading cost without adding behavior, and long single-sentence directives are harder to follow reliably than short numbered ones. Solution: Keep the contract once in <skill_handoff> and have step 6.1 sub-step 4 reference it in one short line (e.g. 'verify handoff doc matches <skill_handoff> acceptance criteria; on mismatch record warning and ask user'). Remove the duplicated acceptable/unacceptable restatement from the step body.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	3	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The added <orchestration_and_escalation> 'Verification-failure unilateral-start override' is a single bullet with six deeply nested sub-bullets (Deference, Precondition a/b/c, If-holds, If-uncertain, Scope, Rationale) defining one conditional rule. This is the densest block in a workflow-level orchestration file that should stay high-level and delegate detail to phase files. Reason: A six-level nested single rule in the top-level workflow forces the orchestrator to hold a lot of conditional state at once, raising the chance it mis-applies the no-ask override outside the intended gate. Solution: Compress the override to a 3-line rule: trigger (all three preconditions), action (print failing line + start earliest incomplete phase same turn, no AskUserQuestion), and default (any uncertainty -> normal HITL ask). Move the rationale and scope-lock wording into one trailing sentence.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The <execute_documentation_mcp> intro paragraph and the early-exit rule explain the cross-referencing strategy meta-narratively ('Branch triggers reference <output_contract> by name; the literal outcome line is inlined parenthetically at each trigger site so the agent does not have to cross-jump...'). This is authoring-rationale about how the doc is organized, not an executable directive, and the same outcome strings are then inlined at every branch AND listed in <output_contract>, duplicating each 'Outcome:' line twice. Reason: Explaining the document's own organization is non-operational filler, and duplicating every outcome string between branches and the table risks the two copies drifting out of sync on a later edit. Solution: Drop the meta-explanation paragraph and keep only the operational early-exit rule. Inline the outcome line at the branch OR list it in <output_contract>, not both; reference the table once from the branches.
⚪ Low	Cognitive Budget	Problem: step 1.2b is realized through four interlocked sub-blocks (, <harvest_and_collect>, , <verify_remediation>) plus <output_contract>, with branch names (SKIPPED_NO_CONFIG, ACQUIRE_FAILED, EMPTY_HARVEST, COMPLETED) referenced across blocks and an early-exit jump rule. To execute one harvest the agent must hold five blocks and four branch identifiers in working memory simultaneously. Reason: Spreading one optional sub-phase across five mutually-referencing blocks increases the chance the agent loses track of which branch it took or skips the verify/remediation loop. Solution: Inline the verify_remediation cases into as numbered fallback steps and fold the early-exit outcome strings directly into each branch so the agent reads one linear block per branch without cross-jumping between five sections.
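
Until the duplication is removed, the drift risk named in the Bloat Control finding can be caught with a simple comparison. A minimal sketch, assuming the outcome lines have already been collected into two mappings keyed by branch name (one from the branch sites, one from <output_contract>); the function name `drifted_outcomes` and that extraction step are hypothetical.

```python
def drifted_outcomes(inline: dict[str, str], contract: dict[str, str]) -> list[str]:
    """Return branch names whose two outcome copies disagree.

    A branch is reported when its inline outcome line differs from the
    <output_contract> copy, or when it appears in only one of the two places.
    """
    names = sorted(set(inline) | set(contract))
    return [name for name in names if inline.get(name) != contract.get(name)]
```

An empty list means the two copies are still in sync; keeping only one copy, as the solution proposes, makes the check unnecessary.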

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Conflict Resolution	Problem: Modified by the PR, marked clean for HITL. BASE lines 372-373 had 'Ask: Ready to proceed to Phase 2?' + '3. Wait for confirmation'; NEW step 1.4 (lines 154-178, Ask at line 177) deleted the explicit Wait line with no STOP/WAIT directive. Same per-phase HITL-explicitness reduction the auditors caught on the r2 project-config-loading twin but missed here. Reason: Consistent per-phase HITL weakening across the testgen family; low severity because the parent flow gates the transition, but it should be flagged uniformly. Solution: Add an explicit STOP-and-wait directive after the 'Ready to proceed to Phase 2?' question in step 1.4, matching the deterministic phase-transition gate wording.

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	5	⬆️ Slightly better
Conflict Resolution	3	⬇️ Slightly worse
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Conflict Resolution	Problem: Modified by the PR, marked clean for HITL. BASE lines 361-362 had 'Ask: Ready to proceed to Phase 3?' + '4. Wait for confirmation'; NEW step 2.4 (lines 85-89, Ask at line 88) deleted the explicit Wait line with no STOP/WAIT directive. Same systematic per-phase HITL-explicitness deletion flagged on the r2 project-config twin but missed here. Reason: Part of the same family-wide pattern where every per-phase 'Wait for confirmation' line was removed; low severity given the parent-flow gate, but the loss is real and should be flagged consistently. Solution: Add an explicit STOP-and-wait directive after the 'Ready to proceed to Phase 3?' question in step 2.4, matching the other phase-transition gates.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	3	⬇️ Slightly worse
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Conflict Resolution	Problem: Modified by the PR but marked clean. BASE line 157 had '3. Wait for confirmation' after the 'Ready to proceed to Phase 1?' question; NEW step 0.6 (lines 95-99) keeps the Ask but deleted the explicit Wait line with no STOP/WAIT replacement. This is the IDENTICAL per-phase HITL-explicitness deletion the auditors flagged on the r2 twin (testgen-flow-project-config-loading.md, Conflict Resolution sev 2) but missed on r3. Reason: Asking without an explicit wait lets an agent ask and immediately proceed, weakening the per-phase HITL gate. The parent testgen-flow.md may still gate the transition, so severity is low, but the loss should be flagged consistently with its r2 twin. Solution: Add an explicit STOP-and-wait directive after the Phase-1 readiness question in step 0.6 (e.g. 'STOP and wait for user confirmation before advancing to Phase 1'), matching the r2 verdict and the deterministic gate wording used elsewhere.
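
The pattern flagged here and in the two sibling testgen findings above (a readiness question with no explicit wait) lends itself to a small lint. A minimal sketch, assuming the question keeps the 'Ready to proceed to Phase N?' wording quoted in the findings and that any line containing the word STOP or WAIT counts as a gate; the names `ungated_asks` and `lookahead` are hypothetical.

```python
import re

# Wording taken from the findings; the STOP/WAIT heuristic is an assumption.
ASK_RE = re.compile(r"Ready to proceed to Phase \d+\?")
WAIT_RE = re.compile(r"\b(stop|wait)\b", re.IGNORECASE)


def ungated_asks(text: str, lookahead: int = 2) -> list[int]:
    """Return 1-based line numbers of readiness questions that have no
    STOP/WAIT directive on the same line or within the next `lookahead` lines."""
    lines = text.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if not ASK_RE.search(line):
            continue
        window = lines[i : i + 1 + lookahead]
        if not any(WAIT_RE.search(candidate) for candidate in window):
            flagged.append(i + 1)
    return flagged
```

Applied to each per-phase workflow file, a non-empty result would flag the same HITL-explicitness loss uniformly across r2 and r3 instead of relying on per-file review.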

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	3	⬇️ Slightly worse
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Safety Boundaries	Problem: New <orchestration_and_escalation> introduces a 'verification-failure unilateral-start override' that authorizes the agent to start a phase in the SAME TURN without asking the user (no AskUserQuestion). BASE had no such no-ask override. It expands the sanctioned no-ask surface relative to base. The same override was added to aqa-flow.md and r2 testgen-flow.md. Reason: Simulated agent triggers the override only when ALL three preconditions hold: user explicitly asserted phases complete in this turn, state file does NOT mark them complete, AND the named artifacts are absent on disk. In that exact case asking again would create a contradictory loop. The rule is scope-locked, defaults any uncertainty to ASK, and explicitly preserves Phase 3/6 HITL and all destructive confirmations. So this is a narrowly justified, defensible relaxation, not a broad HITL weakening. Residual concern is auditability and risk of an agent over-generalizing the carve-out, which the dense wording mitigates. Solution: Keep the carve-outs (they are strong). Optionally require the override decision to be logged into testgen-state.md (not just printed once) so the relaxation leaves an auditable trail, and re-confirm precedence under bootstrap-hitl-questioning.
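
The three-precondition trigger and the ask-by-default fallback described in this finding can be expressed as a small decision function, together with the log line the solution suggests. A minimal sketch under stated assumptions: `None` models an uncertain precondition, the names `Decision`, `override_decision`, and `audit_line` are hypothetical, the log format is invented for illustration, and the Phase 3/6 and destructive-confirmation scope lock is not modeled.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    START_PHASE = "start earliest incomplete phase in the same turn, no ask"
    ASK_USER = "normal HITL ask"


def override_decision(
    user_asserted_complete: Optional[bool],
    state_marks_complete: Optional[bool],
    artifacts_present: Optional[bool],
) -> Decision:
    """Apply the override only when all three preconditions clearly hold.

    None means 'uncertain'; any uncertainty falls back to asking the user.
    """
    if (
        user_asserted_complete is True
        and state_marks_complete is False
        and artifacts_present is False
    ):
        return Decision.START_PHASE
    return Decision.ASK_USER


def audit_line(decision: Decision, timestamp: str) -> str:
    """Format one entry for the state file (the format is an assumption)."""
    return f"- {timestamp} override-check: {decision.name}"
```

Appending the result of `audit_line` to testgen-state.md each time the check runs would leave the auditable trail the solution asks for, including the cases where the agent fell back to asking.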

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	3	⬇️ Slightly worse
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

github-actions · 2026-06-03T15:03:29Z