add qa-flow, consolidate QA/AQA/testgen skills, add manual tests by sveto · Pull Request #110 · griddynamics/rosetta

sveto · 2026-06-11T08:38:54Z

QA, AQA, and Testgen workflows were tested and bug-fixed/hardened in several places. Some of their common parts, along with the skills Maksym created previously, are merged into 4 pre-existing skills and 2 new skills foreseen by docs/definitions/skills.md.

(The previous problem with multiple scattered skills comes from the fact that docs/definitions/skills.md, along with the idea of prototypic pre-defined skills, is not mentioned anywhere in docs and instructions. The only mention I found is located inside of a reference file of a skill I didn't touch within this PR.)

Also I added instructions on how to test the workflows: docs/manual-tests.

❗️ A 'Cognitive Budget' problem I don't know how to solve (qa-flow-project-config-loading and aqa-flow-data-collection): the files are too long, and the correct fix would be to move templates to a reference file or an asset file, but workflow phase files do not have anything similar to references/assets.

…erge-rebase

github-actions · 2026-06-11T08:48:50Z

Rosetta Triage Review

Summary: This PR consolidates and significantly refactors the QA, AQA, and Testgen AI agent workflows — introducing a brand-new end-to-end qa-flow (10 phase files), two new skills (discovery, scenarios-generation), and hardening all three workflow families across both r2 and r3 instruction trees. It also improves the validate-prompts CI pipeline and adds manual test guides for all three flows.

Note — Instruction-quality review: This PR changes instructions/r*/**, so automated instruction-quality analysis was performed in addition to standard code review.

Findings:

[CRITICAL] operation-manager/SKILL.md — Agent-agnostic violation
Frontmatter allowed-tools: Bash(npx:*) is Claude Code-specific; npx rosettify@latest is Node.js-specific. Per Rosetta prompt hardening rules, skills must be coding-agent-agnostic. The fallback path (ACQUIRE todo-tasks-fallback.md FROM KB) exists but is not clearly surfaced for non-Claude-Code agents. The model: claude-sonnet-4-6 frontmatter field is also Claude Code-specific.
Recommendation: Document explicitly that this skill requires Node.js + Claude Code, or restructure so the fallback is the default path with the npx variant as an opt-in.

[HIGH] Multiple files — Frontmatter description exceeds 30-token cap

discovery/SKILL.md: ~46 tokens. Suggested: "Gather source artifacts from systems-of-record (Jira/Confluence/TestRail) into a phase-defined raw-context artifact. Read-only."
scenarios-generation/SKILL.md: ~36 tokens. Remove the inline contrast-with-testing clause (it belongs in <when_to_use_skill>).
qa-flow.md: ~97 tokens, hardcodes vendor names. Suggested: "Backend API test automation end-to-end: from test cases to automated tests, with HITL gates at specification and correction phases."
orchestrator-contract/SKILL.md: ~40 tokens. Suggested: "MUST activate when acting as orchestrator: defines delegation, dispatch, routing, and review protocol."

[HIGH] discovery/SKILL.md and orchestrator-contract/SKILL.md — Missing <when_to_use_skill> section
Both skills lack the schema-required <when_to_use_skill> section that disambiguates when an agent should load this skill vs a peer. All other new skills in this PR include it.

[HIGH] operation-manager/SKILL.md — SRP violation (dual orchestrator/subagent role)
The <process> section defines two complete independent flows (Orchestrator flow + Subagent flow). Recommendation: add a clear self-selection gate at the top or extract one flow to a reference.

[HIGH] qa-flow-project-config-loading.md — Step numbering collision + template bulk
Steps 0.2a and 0.2 appear in that order, which is contradictory. Fix: renumber sequentially. Also, 113 of 276 lines are template content — templates belong in references/ per hardening rules. Recommendation: extract to references/qa-project-config-loading-templates.md.
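
The ordering collision described above could be caught mechanically. The following is a minimal sketch, assuming step labels are dotted integers with an optional single-letter suffix (as in `0.2` and `0.2a`); the function names `step_key` and `out_of_order` are hypothetical and not part of the Rosetta tooling.

```python
import re

# Assumed label format: dotted integers plus an optional letter suffix, e.g. "0.2" or "0.2a".
_STEP = re.compile(r"^(\d+(?:\.\d+)*)([a-z]?)$")


def step_key(label):
    """Turn '0.2a' into ((0, 2), 'a') so labels compare in reading order."""
    match = _STEP.match(label)
    if match is None:
        raise ValueError(f"unrecognized step label: {label!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, match.group(2)


def out_of_order(labels):
    """Return (previous, current) pairs where current does not sort after previous."""
    problems = []
    for previous, current in zip(labels, labels[1:]):
        if step_key(current) <= step_key(previous):
            problems.append((previous, current))
    return problems


# The sequence reported in this finding: 0.2a appears before 0.2.
print(out_of_order(["0.1", "0.2a", "0.2", "0.3"]))
```

An empty result means every label sorts strictly after the one before it; the same check also flags duplicated labels.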

[MEDIUM] DRY — Duplicated validation rules in discovery/SKILL.md and scenarios-generation/SKILL.md
Both skills restate the sensitive-data redaction rule in both <core_concepts> and <validation_checklist>. The checklist entry should be a pointer to <core_concepts>, not a restatement.

[MEDIUM] orchestrator-contract/SKILL.md — Inline subagent dispatch template should be in references/
The boilerplate 42-line dispatch template embedded in <process> is reference material. Extracting it would reduce the file from 140 to ~90 lines.

[MEDIUM] qa-flow.md — HITL skip-rules partially duplicate hitl skill
<skip_rules> contains gate-execution logic that the hitl skill owns canonically. Keep only the qa-flow-specific precondition table; defer gate execution to USE SKILL hitl.

[POSITIVE] CI improvement in validate-prompts.yml
Significantly improved pipeline: structured JSON output, severity-based gate (blocks on HIGH+), PR comment summary with link to full per-file details in Actions run summary.
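
For readers unfamiliar with a severity-based gate, a minimal sketch follows. It is not the actual validate-prompts.yml logic: the JSON shape (a top-level `findings` list with a `severity` field) and the severity ladder are assumptions made for illustration.

```python
import json

# Assumed severity ladder; the real pipeline's labels and ordering may differ.
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "VERY_HIGH": 3}


def blocking_findings(findings, threshold="HIGH"):
    """Keep only findings at or above the threshold severity."""
    limit = SEVERITY_RANK[threshold]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= limit]


def gate(report_json, threshold="HIGH"):
    """Return exit code 1 (block) if any finding reaches the threshold, else 0."""
    findings = json.loads(report_json)["findings"]
    return 1 if blocking_findings(findings, threshold) else 0


# Hypothetical report: one MEDIUM finding passes, adding a HIGH finding blocks.
passing = json.dumps({"findings": [{"file": "a.md", "severity": "MEDIUM"}]})
failing = json.dumps({"findings": [{"file": "a.md", "severity": "MEDIUM"},
                                   {"file": "b.md", "severity": "HIGH"}]})
print(gate(passing), gate(failing))
```

Returning an exit code rather than raising keeps the sketch usable as the last step of a CI job, where a non-zero code fails the check.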

[POSITIVE] Manual test docs
docs/manual-tests/ is an excellent addition — auth-free testing modes (Mode A/B), per-phase checklists, and "try to break it" sections make the flows much easier to validate manually.

Suggestions:

Consider adding a docs/definitions/skills.md reference to the Rosetta onboarding docs — this PR surfaces the issue that the canonical skills list is not discoverable to contributors.
For operation-manager: the inline canonical next output JSON example belongs in references/rosettify-next-output.md with a pointer from the skill body.
PR title could be more descriptive — e.g. feat: add qa-flow, consolidate QA/AQA/testgen skills, add manual tests.

Automated triage by Rosetta agent

github-actions · 2026-06-11T09:04:36Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/skills/coding/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/coding/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/debugging/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/debugging/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/reverse-engineering/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/reverse-engineering/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/discovery/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/discovery/references/confluence-binding.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/confluence-binding.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/jira-binding.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/jira-binding.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/testrail-binding.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/testrail-binding.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-authoring/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/requirements-authoring/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-use/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-use/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/scenarios-generation/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/operation-manager/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/operation-manager/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/operation-manager/assets/om-schema.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/testing/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/testing/references/implementation-examples.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/references/implementation-examples.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/orchestrator-contract/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow.md`	1	0	0	❌ Fail
`instructions/r3/core/workflows/qa-flow.md`	1	0	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-correction.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	1	0	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	1	0	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	2	0	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	2	0	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	✅ Pass

📋 Full per-file findings (Problem / Reason / Solution + Gates Comparison) → Workflow run Summary (PR comments are capped at 65,536 chars; details live on the Actions run).
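
The split between a short PR comment and full details on the Actions run can be sketched as below. This is an illustration only: the function name `fit_comment` is hypothetical, and it assumes the 65,536-character cap quoted above can be approximated with Python's `len()`.

```python
COMMENT_LIMIT = 65_536  # cap quoted in the note above


def fit_comment(summary, details, pointer, limit=COMMENT_LIMIT):
    """Return summary plus details if they fit, otherwise summary plus a pointer line."""
    full = summary + "\n\n" + details
    if len(full) <= limit:
        return full
    short = summary + "\n\n" + pointer
    # Last resort: hard-truncate so the comment is never rejected for length.
    return short[:limit]


# Hypothetical inputs: the details are too long, so the pointer replaces them.
body = fit_comment("Summary by File", "x" * 70_000, "Full findings: see the workflow run summary.")
print(len(body) <= COMMENT_LIMIT)
```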

github-actions · 2026-06-11T09:16:29Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟠 Very High	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/coding/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/debugging/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	1	2	0	❌ Fail
`instructions/r2/core/skills/reverse-engineering/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/operation-manager/SKILL.md`	1	2	1	0	❌ Fail
`instructions/r2/core/skills/operation-manager/assets/om-schema.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/discovery/references/confluence-binding.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/jira-binding.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/testrail-binding.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-use/SKILL.md`	0	2	0	0	❌ Fail
`instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-authoring/SKILL.md`	0	0	3	0	⚠️ Warning
`instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`	0	3	2	0	❌ Fail
`instructions/r2/core/skills/scenarios-generation/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testing/SKILL.md`	0	1	2	0	❌ Fail
`instructions/r2/core/skills/testing/references/implementation-examples.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	0	1	0	2	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow.md`	0	0	4	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	1	1	1	❌ Fail
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	1	0	1	❌ Fail
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	1	0	1	❌ Fail
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-correction.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	0	1	0	1	❌ Fail
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	1	3	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	1	3	1	0	❌ Fail
`instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/coding/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/debugging/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/orchestrator-contract/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/reverse-engineering/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/operation-manager/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r3/core/skills/discovery/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/discovery/references/confluence-binding.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/jira-binding.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/testrail-binding.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-use/SKILL.md`	0	2	0	0	❌ Fail
`instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-authoring/SKILL.md`	0	0	3	0	⚠️ Warning
`instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/scenarios-generation/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/SKILL.md`	0	1	2	0	❌ Fail
`instructions/r3/core/skills/testing/references/implementation-examples.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	1	0	2	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow.md`	0	0	4	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	1	1	1	❌ Fail
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	1	0	1	❌ Fail
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	1	0	1	❌ Fail
`instructions/r3/core/workflows/testgen-flow.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	1	0	1	❌ Fail
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	1	3	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	1	3	1	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	1	0	⚠️ Warning
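
The Status column in the rows above is consistent with a simple rule: any Very High or High finding fails the file, any Medium or Low finding produces a warning, and a file with no findings passes. The sketch below encodes that rule; it is inferred from the table, not taken from the validator's source, and `row_status` is a hypothetical name.

```python
def row_status(very_high, high, medium, low):
    """Derive a status label from per-severity finding counts (rule inferred from the table)."""
    if very_high > 0 or high > 0:
        return "Fail"
    if medium > 0 or low > 0:
        return "Warning"
    return "Pass"


# Counts copied from three rows of the table above.
print(row_status(1, 3, 1, 0))  # testgen-flow-test-case-export.md
print(row_status(0, 0, 4, 0))  # qa-flow.md
print(row_status(0, 0, 0, 0))  # coding/SKILL.md
```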

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The added `<audit_survival_checks>` block is dense pointer prose (e.g. 'Dual structure → the phase asserts the contract, the skill emits it', 'Vendors: config-key precedence, not literal tags') with no concrete example of a pass-vs-fail case for any check. These checks are abstract and the reviewer applying them has nothing to pattern-match against. Reason: Abstract review heuristics without grounding get applied inconsistently across auditors. Solution: Add one short concrete pass/fail example for the most error-prone checks (e.g. an N-sections mismatch, or a literal-tag vendor reference), or point to an existing worked example elsewhere in the references.

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The added `<when_to_use_skill>` line points the reader to the triage mode 'in `<process>`', but the new section is named `<test_execution_triage>` and the skill has no `<process>` block. The mental hook lands on a section that does not exist. Reason: A pointer to a non-existent tag makes the agent search for content it cannot find, weakening reliable routing into the new mode. Solution: Change 'use the test-execution triage mode in `<process>`' to point to `<test_execution_triage>` (the actual section tag).
🔵 Medium	Single Responsibility	Problem: The added `<test_execution_triage>` block introduces a second, distinct responsibility — read-only triage of an automated-test execution report with its own taxonomy, capture analysis, and cross-failure pattern detection — onto a skill whose core job is root-cause debugging of a single issue. The `<when_to_use_skill>` and frontmatter now advertise two jobs. Reason: Two responsibilities in one skill raises the chance an agent applies the wrong mode or loads triage machinery when only simple debugging is needed. Solution: Acceptable as a bounded mode if kept thin; if the triage mode grows further it should move to its own skill. For now ensure the triage mode stays a pointer-style specialization (it correctly references `<core_concepts>` for evidence labels and redaction) and does not accumulate independent process depth.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/orchestrator-contract/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The added `<prerequisites>` bullet 'OPERATION_MANAGER is active' and the dispatch-template line 'MUST USE SKILL `subagent-contract`, `operation-manager`' bind this skill to `operation-manager`, which is not in the canonical `docs/definitions/skills.md` (only `plan-manager` is). pa-rosetta requires using names from `docs/definitions/*.md` and not auto-adding out-of-list items. Reason: Referencing an out-of-list skill as a hard prerequisite makes the contract depend on a name the KB does not officially recognize. Solution: Add `operation-manager` to the canonical skills definitions (or reference the already-canonical `plan-manager`), so the binding resolves against the canonical list.
🔵 Medium	Single Responsibility	Problem: The added items 22–28 fold a full phase-by-phase workflow drive-loop (just-in-time ACQUIRE, state-file updates, phase-skip confirmation, downstream prerequisite verification) into the orchestrator-contract skill, which previously owned only delegation/dispatch/review. The skill now carries both the delegation contract and the multi-phase execution loop. Reason: Mixing the dispatch contract with the phase-execution engine increases the skill's responsibility count and the chance of the two concerns drifting. Solution: Keep the drive-loop here only if it stays pointer-thin; otherwise the phase-chaining loop belongs with the (missing) `load-workflow` authority it keeps deferring to. At minimum resolve the `load-workflow` reference so ownership of loading vs driving is unambiguous.
🔵 Medium	Reference Integrity	Problem: The diff adds three references to a `load-workflow` skill as a canonical authority (core_concepts: 'WORKFLOW LOADING is a separate canonical concern owned by `load-workflow`'; process #28; resources block). No `load-workflow` skill exists in r2 (no skill folder) and it is absent from the canonical `docs/definitions/skills.md` list. The skill delegates its entire workflow-loading concern to a target that cannot be acquired. Reason: An agent told that loading is owned by `load-workflow` will try to use a skill that does not resolve, breaking the delegation chain at the top of every multi-phase run. Solution: Either add `load-workflow` to the canonical skills list and create the skill, or point these references to the actual loading authority (e.g. the bootstrap prep steps / `load-context`) that exists in r2. Do not leave a dangling canonical reference.
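
The out-of-list and dangling references flagged in the table above lend themselves to an automated check. The sketch below assumes references are written as `USE SKILL` followed by backticked names, as in the dispatch-template line quoted in the first finding; the canonical set used in the example is a hypothetical sample, not the real contents of docs/definitions/skills.md.

```python
import re

# Assumed reference syntax: USE SKILL followed by one or more backticked, comma-separated names.
_USE_SKILL = re.compile(r"USE SKILL((?:\s*,?\s*`[^`]+`)+)")
_NAME = re.compile(r"`([^`]+)`")


def referenced_skills(text):
    """Collect every skill name that follows a USE SKILL directive."""
    names = []
    for match in _USE_SKILL.finditer(text):
        names.extend(_NAME.findall(match.group(1)))
    return names


def dangling(text, canonical):
    """Return referenced skill names that are missing from the canonical list."""
    return sorted(set(referenced_skills(text)) - set(canonical))


# Hypothetical canonical sample; the directive text is the line quoted in the finding.
canonical_sample = {"subagent-contract", "plan-manager"}
line = "MUST USE SKILL `subagent-contract`, `operation-manager`"
print(dangling(line, canonical_sample))
```

References written without backticks would not be matched by this pattern, so a real check would need to follow whatever syntax the instruction files actually use.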

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/reverse-engineering/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The added `<analysis_modes>` block introduces two concrete, domain-specific modes (test-automation architecture analysis, API-contract extraction) into a skill whose general purpose is code→spec reverse engineering. These modes carry their own GATEs, source-priority lists, and per-endpoint templates, broadening the skill beyond its single distillation responsibility. Reason: Each added concrete mode widens the skill's job count and the surface an agent must scan to apply the right one. Solution: Acceptable while the modes stay thin specializations that EMIT into the phase-owned artifact (they currently do, and defer artifact shape/path to the phase). Watch for further mode accretion; if more modes are added, extract them to a dedicated test/API-analysis skill.

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/operation-manager/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Reference Integrity	Problem: core_concepts and resources reference `ACQUIRE todo-tasks-fallback.md FROM KB` as the built-in fallback when MCP and CLI both fail. No `todo-tasks-fallback.md` exists under instructions/r2 (it exists only in r3) and it is not in the canonical r2 `docs/definitions/rules.md`. The whole fallback path is unreachable in r2. Reason: When the CLI fails the agent is told to ACQUIRE a file that does not resolve in r2, leaving it with no working fallback at the moment it most needs one. Solution: Add `todo-tasks-fallback.md` to r2 (and to r2 rules definitions) or point the fallback at an existing r2 rule/asset. Do not ship a fallback whose target cannot be acquired in this release.
🟡 High	Precision & Explicitness	Problem: core_concepts states the CLI as `npx rosettify@latest <command> <subcommand> <plan_file>` with the `<command>` slot left as a placeholder, but every concrete invocation in `<process>` and `<validation_checklist>` uses the literal command `plan` (e.g. `plan next`, `plan update_status`, `plan query`). The generic `<command>` placeholder is never bound to `plan`, so the one term for the command concept is presented two ways. Reason: A placeholder command that is never bound forces the agent to infer the command name, risking malformed invocations. Solution: State the command literally once (`npx rosettify@latest plan <subcommand> <plan_file>`) as plan-manager does, or explicitly define that `<command>` is always `plan` for this skill.
🟡 High	Rosetta	Problem: This new skill `operation-manager` duplicates the existing canonical `plan-manager` skill (identical role, identical description, near-identical core_concepts and process) but is not in `docs/definitions/skills.md` (only `plan-manager` is listed). Two near-identical plan-management skills coexist in r2 and orchestrator-contract now hard-binds to the out-of-list one. pa-rosetta forbids auto-adding out-of-list items and DRY forbids the duplication. Reason: Two competing skills with the same job split callers and guarantee drift; an out-of-list skill is not recognized by the canonical KB. Solution: Decide one canonical plan/operation manager: either add `operation-manager` to the canonical skills list and deprecate/remove `plan-manager`, or fold the new CLI/template changes back into `plan-manager`. Do not keep both.
🔵 Medium	Reference Integrity	Problem: Resources lists `USE FLOW adhoc-flow`. A skill pointing outward to a workflow is reverse/sibling awareness: per pa-hardening boundaries a skill should not know which flow runs it. plan-manager does not carry this pointer. Reason: Skill→workflow awareness violates the prompt-boundary contract and couples the skill to a specific flow. Solution: Remove the `USE FLOW adhoc-flow` line from the skill; flow selection is the orchestrator/bootstrap concern, not the skill's.
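
The Reference Integrity finding above (an `ACQUIRE ... FROM KB` target that does not resolve in r2) is the kind of defect a small pre-merge check could catch. The following is a minimal sketch, not the project's validator: the regex assumes the directive always has the quoted `ACQUIRE <file> FROM KB` shape, and `skill_text` and `r2_files` are hypothetical inputs made up for illustration.

```python
import re

# Assumed directive shape, taken from the quoted `ACQUIRE <file> FROM KB` form.
ACQUIRE_PATTERN = re.compile(r"ACQUIRE\s+(\S+)\s+FROM\s+KB")


def unresolved_acquires(skill_text: str, available_files: set[str]) -> list[str]:
    """Return ACQUIRE targets that are not present in the given release tree."""
    targets = ACQUIRE_PATTERN.findall(skill_text)
    # Strip stray backticks/quotes so `file.md` and file.md compare equal.
    cleaned = [t.strip("`'\"") for t in targets]
    return sorted({t for t in cleaned if t not in available_files})


# Hypothetical inputs: a fragment of skill text and the file names known in r2.
skill_text = "If MCP and CLI both fail, ACQUIRE todo-tasks-fallback.md FROM KB."
r2_files = {"om-schema.md", "SKILL.md"}

missing = unresolved_acquires(skill_text, r2_files)
```

With these inputs, `missing` lists the fallback file, and it becomes empty once the file is added to the set of available r2 files.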

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Reference Integrity	2	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/operation-manager/assets/om-schema.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/discovery/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The frontmatter `description` (line 3) is two full sentences (~55 tokens): 'Rosetta skill to gather source artifacts ... Use to collect issues/tickets, test cases, and documentation pages for downstream requirements, test design, or debugging phases.' pa-hardening requires frontmatter description be a call-to-action and extremely dense (<30 tokens). Reason: Frontmatter is loaded into every agent's context for skill selection; an over-long description wastes the always-resident budget and violates the Rosetta frontmatter density rule. Solution: Compress the description to a single dense call-to-action under 30 tokens, e.g. 'Collect + normalize + redact source-of-record artifacts (Jira/Confluence/TestRail via MCP) into the phase-defined raw-context artifact.'
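
The <30-token density rule cited above could be approximated mechanically. This is a rough sketch only: it counts whitespace-separated words as a stand-in for tokens (an assumption, since real tokenizers generally produce more tokens than words), and the function names are hypothetical.

```python
def approx_token_count(text: str) -> int:
    """Rough proxy: count whitespace-separated words (real tokenizers count more)."""
    return len(text.split())


def within_description_budget(description: str, limit: int = 30) -> bool:
    """True when the approximate count is strictly under the limit."""
    return approx_token_count(description) < limit


# The compressed description suggested in the finding above.
suggested = (
    "Collect + normalize + redact source-of-record artifacts "
    "(Jira/Confluence/TestRail via MCP) into the phase-defined raw-context artifact."
)
word_count = approx_token_count(suggested)
```

Because the word count undercounts tokens, a description that passes this check with little margin would still deserve a second look with the actual tokenizer.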

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/skills/discovery/references/confluence-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/discovery/references/jira-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/discovery/references/testrail-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-use/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Goal Specification	Problem: The PR adds a whole second responsibility, the `<gap_analysis>` analysis-only mode (multi-source contradiction/gap/ambiguity classification over Jira/Confluence/TestRail/API-spec/test-plan data), but neither the frontmatter `description` ("Consume approved requirements to drive planning, implementation, and validation...") nor `<when_to_use_skill>` ("implementing from approved requirements, planning work from requirement IDs, or auditing requirement-to-delivery traceability") mentions gap analysis. Skill routing is driven by description/when-to-use, so the new mode is undiscoverable by the dispatching agent. Reason: An entry mode that the trigger metadata never names will not be loaded for the cases it exists to serve. Solution: Add the gap_analysis mode to `<when_to_use_skill>` and to the frontmatter `description` (e.g. add a clause about analyzing collected multi-source data for gaps/contradictions/ambiguities) so the mode is selectable.
🟡 High	Single Responsibility	Problem: The PR bolts a whole new `<gap_analysis>` analysis-only mode (lines 93-105) onto a skill whose stated job (frontmatter + `<role>` line 24: 'using requirements as execution contract') is consuming approved requirements to drive planning/implementation/validation. Multi-source contradiction/gap/ambiguity detection across Jira/Confluence/TestRail/API-spec/test-plans is a distinct responsibility from requirement-to-delivery traceability, pushing the skill from 1-2 jobs toward 3. Reason: A skill description that advertises only requirement usage but silently contains a second analysis mode harms skill selection and violates the SRP/single-responsibility expectation; agents may load it for the wrong job or miss the mode entirely. Solution: Either keep the two responsibilities deliberately fused and state the dual-mode scope explicitly in the frontmatter/`<role>`/`<when_to_use_skill>` so callers understand both modes, or extract `<gap_analysis>` into its own analysis skill that the phase invokes separately.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	3	⬇️ Slightly worse
Single Responsibility	3	⬇️ Slightly worse
Output Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/requirements-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The PR adds a `<synthesis>` mode (synthesize multi-source Jira/Confluence/TestRail/answers/gap-analysis data into one structured requirements document) on top of the existing authoring/updating/reviewing responsibility. The frontmatter `description` was NOT updated to name synthesis (it still reads only "Author, update, and validate functional and non-functional requirements..."), though `<when_to_use_skill>` was updated to add "synthesizing". This is a milder version of the requirements-use split, and synthesis shares the authoring rules, so it is more cohesive — but description/when-to-use are now inconsistent about the mode set. Reason: Frontmatter drives skill selection; if it omits a mode that when-to-use and the body define, the mode may not be routed. Solution: Add synthesis to the frontmatter `description` so trigger metadata matches the `<when_to_use_skill>` and the `<synthesis>` body.
🔵 Medium	Workflow Completeness	Problem: Compression collapsed the explicit base `<authoring_flow>` (15 ordered bullets including 'Check against current best practices' and 'Once drafting is done proactively seek user approval') into a 3-step flow (lines 85-91). The proactive 'check against current best practices' step is no longer stated in SKILL.md or the catalogs. Reason: Dropping an explicit ordered step from a multi-step authoring flow risks the agent skipping the best-practices validation that the base flow enforced. Solution: Confirm the best-practices-check step is intentionally dropped or fold it back into step 2 of `<authoring_flow>` (e.g. 'run quality-gate + best-practices checks').
🔵 Medium	Precision & Explicitness	Problem: The base NFR rule 'Update existing requirements with new schema' (base `<nonfunctional_requirements>`) was dropped in the compression and is not recoverable in NEW SKILL.md (line 78 NFR clause) nor in references/authoring-catalogs.md (NFR schema section, lines 114-125). The instruction to migrate already-authored requirements onto the current schema is lost. Reason: Without the migrate-to-current-schema directive, the agent may leave legacy units on an outdated schema during updates, producing an inconsistent requirements set. Solution: Re-add a short clause (in `<requirement_statements>` NFR bullet or the catalogs schema-fields section) stating that existing requirement units must be updated to the current schema when re-authored.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Lines 51 and 49-55 restate the requirement schema fields and ID conventions, and the unit template (lines 9-28) duplicates the verbatim `<req>` template that also lives in the asset `ra-requirement-unit.xml`. The brief says SKILL.md owns rules/methods and this file holds reference catalogs, but the `<req>` template is now duplicated across this reference AND the asset, violating DRY/SSoT within the family. Reason: Duplicated canonical templates drift apart (already visible vs the XML asset), the exact failure mode the single-source convention prevents. Solution: Keep the verbatim `<req>` template in exactly one location (the asset) and have this catalog reference it, rather than copying the full template block.
🔵 Medium	Precision & Explicitness	Problem: The `<req>` unit template here (lines 9-28) uses the OLD two-field shape: `NotStarted

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Conflict Resolution	Problem: After this change the asset (`[Implemented
🟡 High	Precision & Explicitness	Problem: The new token set `[Implemented
🟡 High	Output Contract	Problem: The change collapsed two structured fields into one freeform field: BASE had `NotStarted
🔵 Medium	Rosetta	Problem: This asset's `<implementation>` line (line 38) now diverges from the same template restated in the sibling `authoring-catalogs.md` (lines 25-26 of that file, still the old two-field shape). Rosetta DRY/SSoT within a family requires one canonical definition; the PR changed this copy without updating the sibling, creating an intra-family contradiction rather than a single source of truth. Reason: Same-family files teaching different field shapes break the single-source-of-truth discipline Rosetta enforces. Solution: Make exactly one file canonical for the `<req>` implementation field and have the other reference it; update both together so the family stays consistent.
🔵 Medium	Example Grounding	Problem: The new inline guidance `[Additional Notes: files affected for implemented, notes without duplication for what changed for todo and modify]` is denser and less concrete than the BASE `[CONCISE: Implemented: aggregated files affected, NotStarted/Planned/ToBeRemoved: nothing, ToBeModified: what was originally documented but now dropped]`, which spelled out the expected note content per state. The per-state mapping of what to write is now partially lost. Reason: Losing the per-state note guidance reduces fill-in reliability for the implementation field. Solution: Restore an explicit per-state note expectation (what to write for each token) so the template self-documents the notes field.

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	3	⬇️ Slightly worse
Conflict Resolution	2	⬇️ Slightly worse
Precision & Explicitness	3	⬇️ Slightly worse
Example Grounding	3	⬇️ Slightly worse
Bloat Control	4	⬆️ Slightly better
Rosetta	3	⬇️ Slightly worse

📄 `instructions/r2/core/skills/scenarios-generation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The new skill carries three distinct authoring modes: gwt_spec (Given-When-Then API specs), generation (TMS-format cases), and a vendor_binding resolver, plus a shared validation checklist. That is broad for one skill; the gwt_spec mode and the TMS generation mode are quite different artifact shapes. Reason: Two related but materially different output artifacts (ATC GWT specs vs TMS Steps/Expected cases) increase the cognitive search space within a single resident prompt. Solution: Acceptable as one skill since all three are 'design test scenarios from requirements'; keep but ensure the mode boundary in `<gwt_spec>` vs `<generation>` stays crisp so an agent never blends an ATC spec with a TMS case template. No split required.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Rosetta prompts are coding-agent-agnostic and avoid hardcoded tool names; this binding embeds concrete `mcp_testrail_` tool signatures (steps 1, 7, 8) as the operational contract. While a vendor binding legitimately names the vendor, the per-call MCP symbol form is a specific tool-name assumption rather than a 'tell how to think' resolution from project config. Reason: Hardcoded tool symbols reduce agent-agnostic portability, which the Rosetta gate guards against. Solution: Frame the MCP signatures as the shape to invoke against the project-resolved TestRail MCP tool names, consistent with the SKILL's stance that the skill never reads config and the phase resolves the binding.
🔵 Medium	Dependency Management	Problem: MCP tool names are hardcoded throughout the process steps (`mcp_testrail_get_project`, `mcp_testrail_get_cases`, `mcp_testrail_add_case`). This is a vendor-specific export binding, so TestRail names are expected here, but the tool-call invocations assume one MCP server naming scheme; a TestRail MCP exposed under a different tool prefix in the target project would not match. The file does parameterize the vendor concept abstractly (the swap table at the end), but the live process steps bake the exact `mcp_testrail_` symbols. Reason: Hardcoded MCP symbol names can silently mismatch a differently-named TestRail MCP, breaking the export at call time. Solution: State once near the top that the TestRail MCP tool symbols are placeholders for whatever the project's TestRail MCP actually exposes (resolved from config), so an agent maps them rather than expecting those literal names.
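
One way to read the "placeholders resolved from config" suggestion above is as a lookup from logical operation to the tool name the project actually exposes. The sketch below is illustrative only: the logical keys, `resolve_tools`, and the override name `testrail__add_case` are hypothetical, while the three default symbols are the literal names quoted in the finding.

```python
# Logical operations the export steps need; the default names are the literal
# symbols quoted in the finding above.
DEFAULT_TOOLS = {
    "get_project": "mcp_testrail_get_project",
    "get_cases": "mcp_testrail_get_cases",
    "add_case": "mcp_testrail_add_case",
}


def resolve_tools(project_overrides: dict[str, str]) -> dict[str, str]:
    """Map each logical operation to the tool name the project actually exposes."""
    unknown = set(project_overrides) - set(DEFAULT_TOOLS)
    if unknown:
        # Fail loudly instead of silently accepting an unmapped operation.
        raise KeyError(f"unknown logical operations: {sorted(unknown)}")
    return {**DEFAULT_TOOLS, **project_overrides}


# Hypothetical project whose TestRail MCP uses a different name for one tool.
tools = resolve_tools({"add_case": "testrail__add_case"})
```

Keeping the defaults in one table means the process steps can refer to logical operations, and a differently named MCP server only changes the override map.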

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/testing/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Single Responsibility	Problem: The added `<implementation_modes>` block (lines 61-84) bolts three substantial new responsibilities onto the testing skill: UI impl mode, API impl mode, and a two-part Selector mode (identify + page-object authoring). The BASE skill was a focused 'write thorough isolated tests' skill; the PR now also makes it the owner of page-object/selector identification and API-spec-to-test implementation. That pushes the skill from 1-2 responsibilities toward four distinct workflows. Reason: A skill carrying four loosely-related modes is harder to load lazily and dilutes the single-responsibility contract the schema favors. Solution: Confirm these impl modes belong in `testing` rather than a dedicated implementation/selector skill; if kept, scope the frontmatter/role so the added responsibilities are declared, or split selector identification into its own skill.
🔵 Medium	Conflict Resolution	Problem: The new modes assert the PHASE is SSoT for paths/taxonomy/contract/read-write boundary/iteration cap, while the resident Quality bar still hardcodes absolutes (>=80% coverage, 1s timeout, mock-external-only). When a phase supplies a different coverage/assertion taxonomy, it is not stated which wins — the canonical quality bar or the phase binding. Reason: Two SSoT claims (phase bindings vs canonical quality bar) touch overlapping territory (coverage, assertion taxonomy) without an explicit tiebreak, risking inconsistent agent behavior. Solution: Add one line stating precedence: phase bindings govern paths/taxonomy/output contract; the canonical Quality bar and Mocking policy remain non-negotiable unless the phase explicitly overrides a named item. Resolve the implicit overlap between 'PHASE is SSoT' and 'rules below are canonical'.
🔵 Medium	Rosetta	Problem: The added General method line embeds an imperative skill-to-skill call inside the skill body: `match the repository's existing patterns (USE SKILL coding standards-first mode ...)`. Per pa-hardening the skills-can't-call-skills boundary discourages a skill imperatively invoking a sibling skill from within its procedure; references to sibling skills are normally surfaced as recommendations, not inline USE SKILL directives in the method steps. Reason: An imperative USE SKILL inside the skill body couples this skill to a sibling skill's internal mode name ('standards-first mode'), which is sibling-internals awareness the boundary rule warns against. Solution: Demote the inline `USE SKILL coding` to a reference/recommendation (the section already lists `skill coding — standards-first mode`), or phrase as 'apply repo conventions per the coding standards-first guidance' without an imperative invoke inside the mode procedure, preserving the no-sibling-call boundary.
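
The precedence rule proposed in the Conflict Resolution finding above (phase bindings govern paths/taxonomy/output contract; the Quality bar holds unless the phase explicitly overrides a named item) can be sketched as a merge function. This is an assumption-laden illustration, not the skill's actual mechanism: the key names and `effective_settings` are hypothetical, and only the three quality-bar values come from the finding.

```python
# Canonical quality bar; the three values are the ones quoted in the finding.
QUALITY_BAR = {
    "min_coverage_pct": 80,
    "test_timeout_s": 1,
    "mocking": "external-only",
}

# Keys a phase binding owns outright (hypothetical key names).
PHASE_OWNED = {"paths", "taxonomy", "output_contract"}


def effective_settings(phase_binding: dict, explicit_overrides: set[str]) -> dict:
    """Merge phase bindings with the quality bar using the proposed precedence."""
    merged = dict(QUALITY_BAR)
    for key, value in phase_binding.items():
        if key in PHASE_OWNED:
            merged[key] = value  # phase is SSoT for these
        elif key in QUALITY_BAR and key in explicit_overrides:
            merged[key] = value  # named, explicit override of a quality-bar item
        # anything else is ignored: the canonical quality bar wins
    return merged


settings = effective_settings(
    {"paths": "tests/api", "min_coverage_pct": 60, "test_timeout_s": 5},
    explicit_overrides={"test_timeout_s"},
)
```

Here the phase's coverage value is dropped because it was not named as an explicit override, while the timeout change is accepted because it was.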

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	3	⬇️ Slightly worse
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/testing/references/implementation-examples.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: The new file bakes in specific framework/tool names (pytest, Jest, JUnit 5 + RestAssured, MUI class names) as full code blocks rather than parameterized shapes. Reason: pa-rosetta requires coding-agent-agnostic prompts, but a skill asset of worked examples legitimately shows concrete language samples since the calling phase owns the real binding and the file labels them non-authoritative. Solution: This is acceptable as-is because line 3 and line 113 explicitly frame them as 'shape references only' that the agent must 'adapt to the project's existing patterns'. No change required for behavior; if tightening, keep the per-language examples but reinforce the agnostic disclaimer once near the top.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The frontmatter description (line 3) still reads 'Data Collection from TestRail and Confluence' — naming the two vendors as fixed — while the rewritten body (lines 19-22) makes vendors config-resolved and explicitly NOT hardcoded. Reason: pa-rosetta requires coding-agent/vendor-agnostic prompts and frontmatter as a dense call-to-action; the description contradicts the body's own non-hardcoding rule, a minor consistency defect. Solution: Align the description with the new config-resolved model, e.g. 'Phase 1 of AQA workflow - collect test-case + feature context via configured TMS/documentation vendors', to remove the hardcoded-vendor mismatch.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The new frontmatter description is oversized and packs implementation detail: 'Phase 2 of AQA workflow - Requirements Clarification (gap-filling questioning) and Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) - USER INTERACTION REQUIRED'. This is ~55-60 tokens, well over the pa-hardening <30-token call-to-action target, and leaks internal mechanics (skill name, mode name, step references) into the discovery surface. Reason: pa-hardening mandates frontmatter description be a small dense call-to-action (<30 tokens); the discovery shell should not carry per-step implementation internals. Solution: Shorten the description to a dense call-to-action under 30 tokens, e.g. 'Phase 2 of AQA - clarify requirements and define explicit typed assertions (USER INTERACTION REQUIRED)'. Move the gap_analysis/requirements-use mechanics into the body (already present in <workflow_context>).
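
The <30-token target above can be approximated locally. This is a minimal sketch, not the check the review pipeline runs: it assumes a single-line `description:` key in leading YAML frontmatter and uses a crude word-and-punctuation count as a stand-in for a real tokenizer, so counts will differ from any model tokenizer. The `name: example-phase` line and both function names are hypothetical.

```python
import re

LIMIT = 30  # target quoted in the finding; the counting method below is an assumption


def frontmatter_description(markdown: str) -> str:
    """Extract a single-line `description:` value from leading YAML frontmatter."""
    match = re.match(r"\A---\n(.*?)\n---", markdown, flags=re.DOTALL)
    if not match:
        return ""
    for line in match.group(1).splitlines():
        if line.startswith("description:"):
            return line[len("description:"):].strip().strip("'\"")
    return ""


def rough_token_count(text: str) -> int:
    """Crude proxy: count word runs and standalone punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# Hypothetical phase file using the shortened description suggested in the finding.
doc = (
    "---\n"
    "name: example-phase\n"
    "description: Phase 2 of AQA - clarify requirements and define explicit "
    "typed assertions (USER INTERACTION REQUIRED)\n"
    "---\n"
    "body\n"
)

desc = frontmatter_description(doc)
count = rough_token_count(desc)
print(count, count < LIMIT)
```

A real tokenizer would give the authoritative number; the proxy is only useful as a cheap early warning.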

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The NEW file ends with a stray, unbalanced closing tag `</output>` at EOF (after `</aqa_flow_selector_implementation>`), with no matching `<output>` opener anywhere in the file. This is a schema-impurity / well-formedness defect introduced by the PR (verified: opener count 0, closer count 1). Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced tag breaks XML-tag integrity and can confuse downstream parsing of the phase body. Solution: Delete the trailing `</output>` line at the end of the file so the phase document closes cleanly on `</aqa_flow_selector_implementation>`.
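
The opener/closer count behind this finding can be reproduced with a short script. The following is a rough sketch under stated assumptions: it treats every `<name>` / `</name>` pair in the text as a tag, including tags that are only mentioned inside prose or backticks, so it can over-report on real phase files. The function name `tag_balance` and the sample text are hypothetical.

```python
import re
from collections import Counter


def tag_balance(text: str) -> dict[str, tuple[int, int]]:
    """Return {tag: (openers, closers)} for every tag whose counts differ."""
    # Openers: <name> or <name attr="...">; closing and self-closing tags do not match.
    openers = Counter(re.findall(r"<([A-Za-z_][\w-]*)(?:\s[^<>]*)?>", text))
    # Closers: </name>
    closers = Counter(re.findall(r"</([A-Za-z_][\w-]*)\s*>", text))
    return {
        tag: (openers[tag], closers[tag])
        for tag in sorted(set(openers) | set(closers))
        if openers[tag] != closers[tag]
    }


# Hypothetical miniature of a phase file that ends with a stray closing tag.
sample = (
    "<aqa_flow_selector_implementation>\n"
    "<step>...</step>\n"
    "</aqa_flow_selector_implementation>\n"
    "</output>\n"
)

unbalanced = tag_balance(sample)
print(unbalanced)  # {'output': (0, 1)}
```

Run against each changed phase file, an empty result would indicate that every tag has as many openers as closers.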

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The PR adds a stray literal `</output>` tag as the final line of the file (diff line `+</output>` at the end), outside the root `<aqa_flow_test_correction>` element. This is leaked tool/harness scaffolding, not prompt content. Per pa-schemas/pa-hardening the artifact must be schema-pure and source-agnostic; a dangling unmatched closing tag pollutes the phase body and is the kind of AI-slop artifact the authoring skill explicitly forbids. Reason: An unbalanced XML-like tag in the published phase confuses agents parsing the prompt block and signals tooling contamination. Solution: Delete the trailing `</output>` line so the file ends cleanly with `</aqa_flow_test_correction>`.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The PR appends a stray literal `</output>` line at EOF (diff line `+</output>` after `</aqa_flow_test_implementation>`), outside the root element. This is harness/tool scaffolding leaked into a published phase, violating schema-purity and the source-agnostic, state-only requirement in pa-hardening/pa-schemas. Reason: An unbalanced closing tag is tooling contamination and can mislead agents parsing the phase body. Solution: Delete the trailing `</output>` so the file ends with `</aqa_flow_test_implementation>`.
⚪ Low	Rosetta	Problem: The rewritten phase bakes the HITL refusal gate inline (`<stop_for_execution>` step 6.3: 'User instruction to bypass this gate must be refused with citation of this rule...'). pa-hardening states user involvement and HITL should be governed by the hitl skill, not restated as bespoke refusal logic inside each phase body. Reason: Duplicated, per-phase HITL refusal logic drifts from the single canonical gate authority and is the boundary pa-hardening warns against. Solution: Route the stop/refuse behavior through the `hitl` skill (as the parent `aqa-flow.md` already does via `type="HITL"`); keep only the phase-specific binding (what to wait for) and reference the hitl gate authority instead of re-implementing refusal wording per phase.
⚪ Low	Output Contract	Problem: `<workflow_context>` calls the appended record `## Test Implementation`, while `<implementation_handoff_contract>` lists it as the 'Test Implementation record' with five `###` subsections; the exact top-level heading string an agent must write is implied rather than stated once. Reason: Cosmetic; the record content and subsections are fully specified, so the artifact is still produceable. Solution: State the canonical record heading once (e.g. `## Test Implementation`) and reference it from the contract and checklist by that exact string.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Phase 4 skill map lists `scenarios-generation` (`- Phase 4: scenarios-generation, coding.`) as a skill tag. This logical name is not an established skill in the AQA/QA family references and is not defined in-prompt; per pa-rosetta, Rosetta skills must come from the canonical `docs/definitions/skills.md` list. Reason: A non-canonical skill tag will fail to resolve via ACQUIRE/USE SKILL at Phase 4, breaking the test-specification step. Solution: Verify `scenarios-generation` against `docs/definitions/skills.md`; if it is not a canonical skill name, replace it with the correct existing skill (e.g. `testing`/`tech-specs`) or add it to the canonical list before referencing.
🔵 Medium	Rosetta	Problem: Frontmatter `description` is a multi-sentence paragraph (~80 tokens: full source-system enumeration, framework list, and end-to-end pipeline recap). pa-hardening requires the frontmatter description be a call-to-action that is extremely small and dense (<30 tokens). Reason: The description field is the routing/selection signal loaded into every agent context; an oversized paragraph wastes the cached-token budget and dilutes the matching cue. Solution: Compress `description` to a single dense call-to-action trigger (e.g. 'MUST apply for backend API test automation: spec analysis → implementation → execution → corrections.'); move the tool/source enumeration into the body where it is already restated.
🔵 Medium	Rosetta	Problem: `<references>` lists a Phase 4 skill `scenarios-generation` and the parent maps phases to files like `qa-flow-test-case-specification.md`, `qa-flow-gap-and-requirements-clarification.md`, and `qa-flow-execution-and-report-analysis.md`. Per pa-rosetta, Rosetta prompts must reference only canonical names from `docs/definitions/*.md`; whether `scenarios-generation` and these phase files are canonical cannot be confirmed from the workflow alone. Reason: Non-canonical logical names cause zero-document ACQUIREs at runtime; this is a name-hygiene risk, not a structural break, so medium severity. Solution: Verify each referenced skill and phase-file name against `docs/definitions/skills.md` and `docs/definitions/workflows.md`; align names or add the missing canonical entries.
🔵 Medium	Bloat Control	Problem: The frontmatter `description` is a long multi-sentence enumeration (TestRail/Jira, pytest/Jest/JUnit/RestAssured/SuperTest, plus a full second paragraph restating sources/tools). pa-hardening requires the frontmatter description to be a dense call-to-action under ~30 tokens; this one is roughly 90+ tokens and duplicates the `<description_and_purpose>` body (which itself defers to the frontmatter). Reason: Over-long frontmatter inflates every routing decision's token cost and violates the <30-token description rule; behavior is unaffected so this is medium severity. Solution: Trim the `description` to a short call-to-action (e.g. 'MUST apply for backend API test-automation tasks: write/extend/debug API tests from test cases and specs.'); keep the tool/source enumeration in the body, not the frontmatter.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: At ~14.4K chars this single phase file carries a full endpoint-contract template, a complete worked example, a redaction catalog with a grep re-scan list, an Analysis Summary block, and a validation checklist all inline. It is the largest of the seven files and exceeds the pa-hardening size guidance (300-500 lines ideal; split when larger), concentrating multiple heavy sub-contracts into one phase load. Reason: The whole template + example + redaction catalog is resent in context every turn the phase runs; progressive disclosure of the large static catalog reduces the per-turn cognitive/token load without losing the contract. Solution: Move the verbatim `<endpoint_contract_template>` worked example and/or the `<redaction_contract>` catalog to a referenced asset the skill ACQUIREs on demand, keeping the phase file to the section list + binding + checklist (progressive disclosure).

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: This phase (`baseSchema: docs/schemas/phase.md`) ACQUIREs and executes a sibling phase: step 1.2b.2 `ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB` and 1.2b.4 'execute all numbered steps inside `<execute_documentation_mcp>`'. The target file is also a phase (`baseSchema: docs/schemas/phase.md`), and the parent `qa-flow.md` does not reference it. pa-hardening boundary: phases cannot call phases; only the parent workflow composes phases. Reason: Phase-calls-phase violates the Rosetta composition boundary; a sibling phase invoking another phase breaks progressive-disclosure ownership and creates a hidden control-flow dependency the orchestrator/workflow does not see. Solution: Either (a) promote the documentation-MCP collection into a step owned by `qa-flow.md` (the workflow), or (b) make the MCP-collection content a skill asset/reference ACQUIRE'd by the `discovery` skill rather than a sibling phase file, so data-collection no longer invokes another phase.
🔵 Medium	Reference Integrity	Problem: `<raw_data_contract>` Backend Source Code Analysis references `RefSrc/` docs, but the canonical Rosetta term is `refsrc/` (lowercase, per pa-rosetta target-folder list). The qa-flow family uses `refsrc/{project-name}/docs/` correctly elsewhere (step 2.1 in the sibling api-spec file), so the capitalization is inconsistent within the family. Reason: Inconsistent casing of a path/term reference can cause an agent to look up a non-existent directory; one term per concept is required. Solution: Change `RefSrc/` to the canonical `refsrc/` term to match the pa-rosetta folder reference and the rest of the qa-flow family.
⚪ Low	Reference Integrity	Problem: The phase references vendor binding files loaded inside `discovery` (`references/<vendor>-binding.md`, `references/confluence-binding.md`) and the subflow tag `qa-flow-documentation-mcp-subflow`. These resolve only if those binding files and the subflow file exist in the KB; they are valid in-family references but unverifiable from this file alone. Reason: In-family references are valid by design; this is a low-severity reminder to confirm the targets ship together. Solution: Ensure `references/testrail-binding.md`, `references/jira-binding.md`, `references/confluence-binding.md` exist under the `discovery` skill and the subflow file is published.
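
The `RefSrc/` versus `refsrc/` mismatch is the kind of drift a case-insensitive scan can surface. The sketch below is illustrative only: `casing_variants` is a hypothetical helper and the sample sentence is invented, not quoted from the phase file.

```python
import re


def casing_variants(text: str, canonical: str) -> list[str]:
    """List spellings that match `canonical` case-insensitively but not exactly."""
    pattern = re.compile(re.escape(canonical), flags=re.IGNORECASE)
    found = {m.group(0) for m in pattern.finditer(text)}
    return sorted(found - {canonical})


# Invented sample mixing the canonical and non-canonical spellings.
sample = "See RefSrc/ docs. Sources live in refsrc/{project-name}/docs/."

variants = casing_variants(sample, "refsrc/")
print(variants)  # ['RefSrc/']
```

An empty list would mean every occurrence already uses the canonical lowercase form.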

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: This file is authored as a phase (`baseSchema: docs/schemas/phase.md`) but functions as a sub-phase that the sibling phase `qa-flow-data-collection.md` ACQUIREs and runs (its `<execute_documentation_mcp>` steps are executed by the parent phase). It carries `step="1.2b"`, i.e. it is a numbered step of another phase, not a standalone phase, and the workflow `qa-flow.md` does not list it. This is the phase-cannot-call-phase boundary from the other side. Reason: Being a phase-schema file invoked by a sibling phase violates the Rosetta composition boundary and gives it sibling/reverse awareness (it names its parent phase `qa-flow-data-collection`), which phases must not have. Solution: Re-home this fragment as a `discovery` skill asset/reference (so the collecting phase calls the skill, not a sibling phase), or fold its steps directly into `qa-flow.md`/the data-collection step under the parent workflow. If kept separate, it must not use the phase schema while being invoked by another phase.
⚪ Low	Reference Integrity	Problem: Relies on `discovery` loading `references/confluence-binding.md` and on `qa-project-config.md` config keys; these resolve only when those files ship in the KB / target structure. Reason: In-family/target-structure references are valid by design; low-severity confirmation only. Solution: Confirm `references/confluence-binding.md` exists under the `discovery` skill and that the documented config keys match the `qa-project-config` template.
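
The phase-calls-phase boundary described in the two High findings can be checked mechanically once each file's schema and its ACQUIRE targets are known. This is a sketch under assumptions: it reads `baseSchema:` from the file text, matches only the literal `ACQUIRE <file>.md FROM KB` form quoted above, and uses `docs/schemas/workflow.md` as a hypothetical schema path for the parent workflow. The file contents are invented miniatures.

```python
import re

PHASE_SCHEMA = "docs/schemas/phase.md"


def base_schema(text: str) -> str:
    """Return the value of the first `baseSchema:` line, or an empty string."""
    match = re.search(r"^baseSchema:\s*(\S+)", text, flags=re.MULTILINE)
    return match.group(1) if match else ""


def acquired_files(text: str) -> list[str]:
    """Return file names referenced as `ACQUIRE <name>.md FROM KB`."""
    return re.findall(r"ACQUIRE\s+(\S+\.md)\s+FROM\s+KB", text)


def phase_to_phase_calls(files: dict[str, str]) -> list[tuple[str, str]]:
    """List (caller, target) pairs where a phase file acquires another phase file."""
    phases = {name for name, text in files.items() if base_schema(text) == PHASE_SCHEMA}
    return sorted(
        (caller, target)
        for caller in phases
        for target in acquired_files(files[caller])
        if target in phases and target != caller
    )


# Invented miniatures of the three files named in the findings.
files = {
    "qa-flow-data-collection.md": (
        "---\nbaseSchema: docs/schemas/phase.md\n---\n"
        "ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB\n"
    ),
    "qa-flow-documentation-mcp-subflow.md": (
        "---\nbaseSchema: docs/schemas/phase.md\n---\nsteps\n"
    ),
    "qa-flow.md": (  # workflow schema path is a hypothetical placeholder
        "---\nbaseSchema: docs/schemas/workflow.md\n---\n"
        "ACQUIRE qa-flow-data-collection.md FROM KB\n"
    ),
}

violations = phase_to_phase_calls(files)
print(violations)
```

The workflow acquiring a phase is not reported, because only callers that are themselves phases are considered.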

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The phase file is large and heavily redundant about the project-config-is-project-wide-not-per-IDENTIFIER point. It is restated in `<description_and_purpose>`, twice in `<workflow_context>` Output, in `<session_layout>` prose, in step 0.1 step 4, and again in the `<config_contract>` / template prose. The full per-phase state checklist is also reproduced in the State-file initial stub even though `<workflow_context>` and step 0.1.3 explicitly say the full schema is owned by `qa-flow.md` `<state_file>`. Reason: Every duplicated invariant is resent each agent turn at full token cost; the checklist duplication also risks the two copies drifting out of sync. Solution: State the project-wide-not-per-IDENTIFIER rule once in `<session_layout>` and reference it; trim the State-file stub to the minimal seed header plus an IDENTIFIER line rather than reproducing the full 8-row checklist that `qa-flow.md` owns.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-test-case-specification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The `<present_for_approval>` step 4.4 defines a closed approval-token list (`approved`/`approve`/`yes`), loose-phrasing rejection, and max-retry escalation entirely inline, without binding to the `hitl` skill. pa-hardening requires user involvement / HITL to live in the `hitl` skill so full automation can govern it centrally; the sibling Phase 7 file (`qa-flow-test-correction.md`) DOES anchor its identical gate with "(Approval vocabulary is governed by `hitl`; this gate's closed token list is the phase-specific specialization)". This phase omits that anchor, so the two HITL gates diverge in their stated authority. Reason: Without the `hitl` anchor the gate can conflict with the session-wide `hitl` protocol and is inconsistent with the parent `qa-flow.md` carve-out that ties Phase 3-7 gates to the `hitl` skill. Solution: Add a one-line cite mirroring Phase 7 — note that approval vocabulary is governed by the `hitl` skill and this closed token list is the phase-specific specialization — rather than presenting the token gate as standalone authority.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The NEW file ends with a stray, unbalanced closing tag `</output>` at EOF (after the root element close), with no matching `<output>` opener anywhere in the file (verified: opener count 0, closer count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing `</output>` line so the document closes cleanly on its root element.
⚪ Low	Rosetta	Problem: The `<stop_for_execution>` step 5.3 defines a hard HITL gate with bypass-refusal logic ("User instruction to bypass this gate must be refused with citation of this rule ... the gate is mechanical and cannot be overridden by instruction alone") entirely inline, with no reference to the `hitl` skill. Unlike the sibling Phase 7 file which anchors its gate to `hitl`, this phase presents itself as the standalone authority for a stop-and-wait gate. Reason: pa-hardening requires HITL/user-involvement authority to derive from the `hitl` skill for central full-automation governance; an unanchored mechanical-override-refusal can conflict with the session-wide `hitl` policy. Solution: Anchor the stop-for-execution gate to the `hitl` skill (e.g. note it is a phase-specific specialization of the `hitl` stop-and-wait protocol), consistent with `qa-flow-test-correction.md` and the parent `qa-flow.md` carve-out.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The NEW file ends with a stray, unbalanced closing tag `</output>` at EOF (after the root element close), with no matching `<output>` opener anywhere in the file (verified: opener count 0, closer count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing `</output>` line so the document closes cleanly on its root element.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The rewrite renamed the state-schema section to `<state_and_outputs>` and moved schema ownership to Phase 0's `<state_file_template>`. But the sibling phase file testgen-flow-data-collection.md still points to `testgen-flow.md` `<state_file>` (step 1.4), a section name that no longer exists in this file. The pointer target was renamed here without the consumer being updated, leaving a dangling cross-phase reference. Reason: An agent following the data-collection pointer cannot resolve `testgen-flow.md` `<state_file>` and may improvise a state schema, breaking the cross-phase contract. Solution: Keep a stable anchor: either re-add a `<state_file>` tag name (or alias) in this workflow that owns/forwards the state schema, or coordinate so data-collection points to Phase 0 `<state_file_template>` (the true SSoT named in `<state_and_outputs>`). Make the consumer and the SSoT name agree.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	✅ Much better
Decision Branching	4	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	✅ Much better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	✅ Much better
Rosetta	4	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Step 1.4 (`<update_state>`) instructs to update state 'per the parent flow's canonical state-file schema (declared once in `testgen-flow.md` `<state_file>`)'. The rewritten parent testgen-flow.md has no `<state_file>` section — its state section is `<state_and_outputs>`, which delegates ownership to Phase 0's `<state_file_template>` in testgen-flow-project-config-loading.md. The named anchor does not resolve. Reason: An agent grepping for `testgen-flow.md` `<state_file>` finds nothing and may invent a state schema, diverging from the Phase 0 template every other phase relies on. Solution: Point step 1.4 at the real SSoT: Phase 0 `<state_file_template>` in `testgen-flow-project-config-loading.md` (or the parent's `<state_and_outputs>` section name), matching what the parent actually declares.
⚪ Low	Example Grounding	Problem: Frontmatter description was shortened to 'Phase 1 of Test Generation - Data collection ' (trailing space, and dropped the 'from Jira and Confluence' specificity present in BASE). The body still relies on config-resolved vendors, so the description no longer signals the concrete sources the phase handles. Reason: Description is the call-to-action surface; losing the source hint slightly weakens phase identification, though body content remains complete. Solution: Restore a concise source hint in the description (e.g. 'Data collection from issue tracker + documentation sources') and trim the trailing space.
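
The High-severity Reference Integrity finding above (a pointer to `testgen-flow.md` `<state_file>` that no longer resolves) is the kind of drift a small check could catch. The following is a minimal sketch, assuming pointers are written as a backticked file name followed by a backticked tag, and that a tag counts as defined when it opens a section on its own line. The function names and the in-memory `files` map are hypothetical and are not part of the Rosetta tooling.

```python
import re

# Assumed citation style: `some-file.md` `<tag_name>`
POINTER_RE = re.compile(r"`([\w\-/.]+\.md)`\s+`<(\w+)>`")


def defined_tags(text: str) -> set[str]:
    """Return tag names that open a section on their own line, e.g. <state_and_outputs>."""
    return set(re.findall(r"^\s*<(\w+)>\s*$", text, flags=re.MULTILINE))


def dangling_pointers(files: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (source_file, target_file, tag) for pointers whose tag is not defined in the target."""
    problems = []
    for source, text in files.items():
        for target, tag in POINTER_RE.findall(text):
            # A missing target file is treated as defining no tags at all.
            if tag not in defined_tags(files.get(target, "")):
                problems.append((source, target, tag))
    return problems


# Hypothetical, heavily shortened stand-ins for the two files named in the finding.
files = {
    "testgen-flow.md": "<state_and_outputs>\nState schema is owned by Phase 0.\n</state_and_outputs>\n",
    "testgen-flow-data-collection.md": "Update state per `testgen-flow.md` `<state_file>`.\n",
}
print(dangling_pointers(files))
```

With the consumer repointed at a tag the parent actually declares, the list comes back empty, which is the "consumer and SSoT name agree" condition the finding asks for.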

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	2	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Cross-family deep link into skill internals at line 41 (`requirements-use/references/gap-analysis-catalogs.md`) violates the pa-rosetta/pa-hardening rule 'no cross-skill deep linking to private content' and 'references must be wrapped in commands or ACQUIRE'd', not named as bare filesystem paths. Reason: Naming a skill's private reference path from a consuming phase is the boundary violation the Rosetta isolation rules exist to prevent. Solution: Invoke the skill by logical name and remove the bare internal path; if the catalog must be cited, wrap it as an ACQUIRE owned by the skill, not the phase.
🔵 Medium	Reference Integrity	Problem: Step 3 of <run_analysis> (line 41) names another skill's private file path `requirements-use/references/gap-analysis-catalogs.md` directly. The phase reaches into the skill's internal implementation instead of just invoking USE SKILL `requirements-use` and letting the skill own which catalog it loads. Reason: Coupling a phase to a skill's private reference filename breaks skill folder isolation and will silently break if the skill reorganizes its references. Solution: Drop the explicit `references/gap-analysis-catalogs.md` path; reference the gap_analysis mode by name only and let the skill resolve its own internal references.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-question-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The primary HITL gate (step 3.2 line 108 `PAUSE — WAIT FOR USER INPUT`, plus <workflow_context> line 20) is expressed as inline gate prose and never routes through `USE SKILL hitl`. The sibling Phase 0 (project-config-loading step 0.6) routes its gate through `USE SKILL hitl`. pa-rosetta/pa-hardening require HITL/approval to live in the canonical `hitl` home, so the most important HITL gate of the whole flow is inconsistent with its own sibling. Reason: An inline-only approval gate diverges from the canonical HITL skill and from the sibling phase, so approval-handling behavior is inconsistent at the single most safety-relevant gate in the flow. Solution: Gate the Phase 3 answer-wait/approval via `USE SKILL hitl` the same way Phase 0 step 0.6 does, keeping the inline prose only as the on-load-failure fallback.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: The <failure_handling> block adds a large non-operational meta-justification 'Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn)' (lines 143-149) explaining WHY there is no fallback, plus rationale about the sibling test-case-generation phase. This is provenance/rationale prose, not an instruction the agent executes — exactly the non-operational meta-note pa-hardening says to remove. Reason: Non-operational rationale re-sent every turn inflates cost and cognitive load without changing agent behavior; pa-hardening flags exactly this class of meta-note. Solution: Keep only the operational rule ('skill is a hard dependency; on failure re-invoke once then block — no inline fallback'); delete the multi-bullet rationale and the sibling-comparison paragraph.
🔵 Medium	Rosetta	Problem: Cross-family deep linking into skill internals at lines 47 and 146 (`requirements-authoring/references/authoring-catalogs.md`, skill SKILL.md deploy path), plus the long non-operational tradeoff note (lines 143-149), both violate pa-rosetta/pa-hardening (no cross-skill deep linking; remove non-operational meta-notes / change-rationale). Reason: These are the two specific Rosetta authoring violations (skill-internal deep link + non-operational provenance) that the hardening reference calls out by name. Solution: Reference the skill by logical name only and strip the rationale paragraph, leaving the one-line operational block-on-failure rule.
🔵 Medium	Cognitive Budget	Problem: Step 4.3 restates the full phase-owned section contract, testgen-specific Executive Summary block, Traceability block, SMART exemplar, and coverage prompt (lines 45-118) while also delegating to the synthesis mode that owns the same schemas. The duplicated contract+rationale enlarges the per-turn surface area for a single document-build step. Reason: Carrying both the full contract and the rationale for the contract in one step pushes the phase toward the upper size band and competes for the agent's attention budget. Solution: Compress step 4.3 to the section table plus the two testgen-only deltas (Executive Summary, Traceability column); move worked SMART/coverage examples to a single short pointer to the skill catalog rather than inlining them.
🔵 Medium	Reference Integrity	Problem: Step 4.3 (line 47) and the failure_handling tradeoff (line 146) name another skill's private internals (`requirements-authoring/references/authoring-catalogs.md` and the skill's deploy path `instructions/<release>/core/skills/requirements-authoring/SKILL.md`) as bare paths. The phase reaches into skill-private files instead of invoking the skill by name and letting it own its references. Reason: Hard-coding a skill's internal reference filename and deploy path into a phase breaks skill isolation and will drift if the skill is reorganized. Solution: Reference `requirements-authoring` synthesis mode by logical name only; remove the `references/authoring-catalogs.md` and explicit deploy-path citations.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	3	⬇️ Slightly worse
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Precision & Explicitness	Problem: The PR's stated direction is de-hardcoding the vendor to a config-resolved TMS binding (step 5.3 line 84), yet residual hardcoded 'TestRail' remains in operative text: phase_steps line 25 'Generate test cases in TestRail format' and the user-facing message line 281 'Ready to proceed to Phase 6 (TestRail Export)?'. The same concept (the resolved TMS vendor) is named two ways, violating one-term-per-concept. Reason: A step that says 'TestRail format' and a user message that says 'TestRail Export' contradict the config-resolved-vendor model the same file establishes, and can mislead the agent on non-TestRail projects. Solution: Change line 25 to 'Generate test cases in the resolved TMS FORMAT' and line 281 to 'Phase 6 (Test Case Export)'; keep `testrail` only where it is an explicit example of a resolvable binding.
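
The residual vendor names described above can be located with a simple line scan. This is a rough sketch only: it assumes that legitimate mentions are marked with "e.g." on the same line, which is a heuristic invented for this illustration, and `hardcoded_vendor_lines` is a hypothetical helper name.

```python
import re


def hardcoded_vendor_lines(text: str, vendor: str = "TestRail") -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that name the vendor outside an explicit example."""
    pattern = re.compile(rf"\b{re.escape(vendor)}\b")
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Assumption: lines that cite the vendor as an example contain "e.g.".
        if pattern.search(line) and "e.g." not in line:
            hits.append((number, line.strip()))
    return hits


# Sample lines modelled on the two strings quoted in the finding, plus one allowed example.
sample = (
    "Generate test cases in TestRail format\n"
    "Resolve the TMS binding (e.g. TestRail)\n"
    "Ready to proceed to Phase 6 (TestRail Export)?\n"
)
for number, line in hardcoded_vendor_lines(sample):
    print(number, line)
```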

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-export.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Safety Boundaries	Problem: This is the destructive phase (writes to an external TMS). Line 36 ASSERTS the phase owns 'idempotency (the destructive-write confirmation gate + dedup pre-scan)', but no such gate or pre-scan exists anywhere in steps 6.1-6.6. Step 6.3 only asks for a target location; step 6.5 exports every case directly with no pre-write confirmation and no duplicate check. The duplicate risk is only passively acknowledged as a pitfall (line 143 'Re-running export may create duplicates in TMS — document this behavior'). On a rerun the phase will silently re-create every case. Reason: The phase claims a destructive-write safety control it never implements; an unguarded rerun duplicates the entire suite in the external system with no confirmation. Solution: Add an explicit step before step 6.5: dedup pre-scan of the target location for already-exported TC IDs/titles, then a destructive-write confirmation gate (route via `USE SKILL hitl`) that shows the user the create/skip plan and requires explicit confirmation before any TMS write. Make the line-36 claim point at that real step.
🟡 High	Precision & Explicitness	Problem: Line 36 uses the precise terms 'destructive-write confirmation gate' and 'dedup pre-scan' as if they are defined controls, but neither term is defined or operationalized later in the file, so the modal claim is non-actionable. The agent is told the phase OWNS these controls but is never told how to perform them. Reason: Naming a control the agent cannot locate or execute is an explicitness gap that makes the safety claim unenforceable. Solution: Either add the concrete steps these terms name, or remove the terms; do not assert ownership of a control that has no procedure.
🟡 High	Workflow Completeness	Problem: The parent workflow marks Phase 6 `type="HITL"` requiring the user to 'confirm export', and <workflow_context> line 19 lists HITL as 'user must provide target location' — but the step sequence has no operational confirm-export gate. The 'confirm export' obligation from the parent and the destructive-write gate named at line 36 are both missing from the numbered steps 6.1-6.9. Reason: A multi-step destructive workflow that omits the parent-mandated confirmation step has an implicit (missing) step at exactly the irreversible action. Solution: Insert a numbered confirm-export step between get_target_location (6.3) and export (6.5) that pauses for explicit user approval of the export scope and target, mirroring the parent's HITL contract.
🟡 High	Rosetta	Problem: The parent workflow designates Phase 6 a HITL gate, yet this phase routes user interaction (target location ask, partial-export decision) through plain inline `Ask user` prompts and never `USE SKILL hitl`. pa-rosetta/pa-hardening require HITL approval to live in the `hitl` skill; sibling Phase 0 already does this, so the family is inconsistent for its two declared HITL phases. Reason: HITL handled outside the `hitl` skill bypasses the session-wide approval protocol and is inconsistent across the phase family, weakening the guarantee that the user gates the destructive export. Solution: Route the export confirmation and the partial-export user decision through `USE SKILL hitl` (canonical approval/escalation home), matching Phase 0 step 0.6; keep the inline prompts only as the skill-load-failure fallback.
🔵 Medium	Bloat Control	Problem: Line 36 is a non-operational ownership/meta declaration ('This phase OWNS the export contract … idempotency …; the skill EMITS … it never decides the contract') that describes responsibility boundaries rather than instructing an action, while the action it claims (gate + dedup) is absent. It is meta-prose, not a step. Reason: An ownership claim that substitutes for the missing procedure adds words and a false sense of coverage without changing behavior. Solution: Replace the ownership paragraph with the actual operational steps (dedup scan + confirm gate); keep at most a one-line note of which artifact is the export source.
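
The missing controls named in these findings (a dedup pre-scan followed by a confirmation gate before any write) can be sketched as follows. This is an illustration under stated assumptions, not the phase's actual procedure: test cases are identified by plain string IDs, the set of IDs already present in the target is supplied by the caller, and `ExportPlan`, `build_export_plan`, `export_cases`, and the `write_case` callback are all hypothetical names. In the real flow the confirmation would come from the `hitl` skill rather than a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPlan:
    to_create: list[str]
    to_skip: list[str]


def build_export_plan(local_case_ids: list[str], existing_ids: set[str]) -> ExportPlan:
    """Dedup pre-scan: separate cases to create from those already in the target."""
    to_create = [cid for cid in local_case_ids if cid not in existing_ids]
    to_skip = [cid for cid in local_case_ids if cid in existing_ids]
    return ExportPlan(to_create, to_skip)


def export_cases(plan: ExportPlan, confirmed: bool, write_case) -> list[str]:
    """Write only after explicit confirmation; return the IDs actually written."""
    if not confirmed:
        return []  # gate closed: nothing is written to the external system
    written = []
    for cid in plan.to_create:
        write_case(cid)
        written.append(cid)
    return written


plan = build_export_plan(["TC-1", "TC-2", "TC-3"], existing_ids={"TC-2"})
sent: list[str] = []
print(plan)  # the create/skip plan shown to the user before confirming
print(export_cases(plan, confirmed=False, write_case=sent.append))
print(export_cases(plan, confirmed=True, write_case=sent.append))
```

On a rerun where every ID is already present in the target, `to_create` is empty, so a confirmed export writes nothing instead of duplicating the suite.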

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	2	⬇️ Slightly worse
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/coding/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The added `<when_to_use_skill>` line points the reader to the triage mode 'in `<process>`', but the new section is named `<test_execution_triage>` and the skill has no `<process>` block. The mental hook lands on a section that does not exist. Reason: A pointer to a non-existent tag makes the agent search for content it cannot find, weakening reliable routing into the new mode. Solution: Change 'use the test-execution triage mode in `<process>`' to point to `<test_execution_triage>` (the actual section tag).
🔵 Medium	Single Responsibility	Problem: The added `<test_execution_triage>` block introduces a second, distinct responsibility (read-only triage of an automated-test execution report with its own taxonomy, capture analysis, and cross-failure pattern detection) onto a skill whose core job is root-cause debugging of a single issue. The `<when_to_use_skill>` and frontmatter now advertise two jobs. Reason: Two responsibilities in one skill raise the chance that an agent applies the wrong mode or loads triage machinery when only simple debugging is needed. Solution: Acceptable as a bounded mode if kept thin; if the triage mode grows further it should move to its own skill. For now, ensure the triage mode stays a pointer-style specialization (it correctly references `<core_concepts>` for evidence labels and redaction) and does not accumulate independent process depth.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/orchestrator-contract/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/reverse-engineering/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The added `<analysis_modes>` block introduces two concrete, domain-specific modes (test-automation architecture analysis, API-contract extraction) into a skill whose general purpose is code→spec reverse engineering. These modes carry their own GATEs, source-priority lists, and per-endpoint templates, broadening the skill beyond its single distillation responsibility. Reason: Each added concrete mode widens the skill's job count and the surface an agent must scan to apply the right one. Solution: Acceptable while the modes stay thin specializations that EMIT into the phase-owned artifact (they currently do, and defer artifact shape/path to the phase). Watch for further mode accretion; if more modes are added, extract them to a dedicated test/API-analysis skill.

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/operation-manager/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Precision & Explicitness	Problem: core_concepts states the CLI as `npx rosettify@latest <command> <subcommand> <plan_file>` with the `<command>` slot left as a placeholder, but every concrete invocation in `<process>` and `<validation_checklist>` uses the literal command `plan` (e.g. `plan next`, `plan update_status`, `plan query`). The generic `<command>` placeholder is never bound to `plan`, so the one term for the command concept is presented two ways. Reason: A placeholder command that is never bound forces the agent to infer the command name, risking malformed invocations. Solution: State the command literally once (`npx rosettify@latest plan <subcommand> <plan_file>`) as plan-manager does, or explicitly define that `<command>` is always `plan` for this skill.

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Failure Handling	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/discovery/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The frontmatter `description` (line 3) is two full sentences (~55 tokens): 'Rosetta skill to gather source artifacts ... Use to collect issues/tickets, test cases, and documentation pages for downstream requirements, test design, or debugging phases.' pa-hardening requires frontmatter description be a call-to-action and extremely dense (<30 tokens). Reason: Frontmatter is loaded into every agent's context for skill selection; an over-long description wastes the always-resident budget and violates the Rosetta frontmatter density rule. Solution: Compress the description to a single dense call-to-action under 30 tokens, e.g. 'Collect + normalize + redact source-of-record artifacts (Jira/Confluence/TestRail via MCP) into the phase-defined raw-context artifact.'

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/skills/discovery/references/confluence-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/discovery/references/jira-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/discovery/references/testrail-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-use/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Goal Specification	Problem: The PR adds a whole second responsibility, the `<gap_analysis>` analysis-only mode (multi-source contradiction/gap/ambiguity classification over Jira/Confluence/TestRail/API-spec/test-plan data), but neither the frontmatter `description` ("Consume approved requirements to drive planning, implementation, and validation...") nor `<when_to_use_skill>` ("implementing from approved requirements, planning work from requirement IDs, or auditing requirement-to-delivery traceability") mentions gap analysis. Skill routing is driven by description/when-to-use, so the new mode is undiscoverable by the dispatching agent. Reason: An entry mode that the trigger metadata never names will not be loaded for the cases it exists to serve. Solution: Add the gap_analysis mode to `<when_to_use_skill>` and to the frontmatter `description` (e.g. add a clause about analyzing collected multi-source data for gaps/contradictions/ambiguities) so the mode is selectable.
🟡 High	Single Responsibility	Problem: The PR bolts a whole new `<gap_analysis>` analysis-only mode (lines 93-105) onto a skill whose stated job (frontmatter + `<role>` line 24: 'using requirements as execution contract') is consuming approved requirements to drive planning/implementation/validation. Multi-source contradiction/gap/ambiguity detection across Jira/Confluence/TestRail/API-spec/test-plans is a distinct responsibility from requirement-to-delivery traceability, pushing the skill from 1-2 jobs toward 3. Reason: A skill description that advertises only requirement usage but silently contains a second analysis mode harms skill selection and violates the single-responsibility expectation; agents may load it for the wrong job or miss the mode entirely. Solution: Either keep the two responsibilities deliberately fused and state the dual-mode scope explicitly in the frontmatter/`<role>`/`<when_to_use_skill>` so callers understand both modes, or extract `<gap_analysis>` into its own analysis skill that the phase invokes separately.
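
The discoverability gap in both findings can be turned into a mechanical check. This is a minimal sketch, assuming a mode counts as advertised when its name (with underscores read as spaces) appears in the frontmatter `description` or in `<when_to_use_skill>`. The helper name `unadvertised_modes` is hypothetical, and the two strings are the excerpts quoted in the Goal Specification finding.

```python
def unadvertised_modes(description: str, when_to_use: str, modes: list[str]) -> list[str]:
    """Return body modes that neither the description nor when-to-use text mentions."""
    haystack = f"{description} {when_to_use}".lower()
    missing = []
    for mode in modes:
        spoken = mode.replace("_", " ").lower()  # gap_analysis -> "gap analysis"
        if spoken not in haystack and mode.lower() not in haystack:
            missing.append(mode)
    return missing


description = "Consume approved requirements to drive planning, implementation, and validation"
when_to_use = (
    "implementing from approved requirements, planning work from requirement IDs, "
    "or auditing requirement-to-delivery traceability"
)
print(unadvertised_modes(description, when_to_use, ["gap_analysis"]))
```

Once a gap-analysis clause is added to either field, the function returns an empty list for that mode.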

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	3	⬇️ Slightly worse
Single Responsibility	3	⬇️ Slightly worse
Output Contract	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The PR adds a `<synthesis>` mode (synthesize multi-source Jira/Confluence/TestRail/answers/gap-analysis data into one structured requirements document) on top of the existing authoring/updating/reviewing responsibility. The frontmatter `description` was NOT updated to name synthesis (it still reads only "Author, update, and validate functional and non-functional requirements..."), though `<when_to_use_skill>` was updated to add "synthesizing". This is a milder version of the requirements-use split, and synthesis shares the authoring rules, so it is more cohesive — but description/when-to-use are now inconsistent about the mode set. Reason: Frontmatter drives skill selection; if it omits a mode that when-to-use and the body define, the mode may not be routed. Solution: Add synthesis to the frontmatter `description` so trigger metadata matches the `<when_to_use_skill>` and the `<synthesis>` body.
🔵 Medium	Workflow Completeness	Problem: Compression collapsed the explicit base `<authoring_flow>` (15 ordered bullets including 'Check against current best practices' and 'Once drafting is done proactively seek user approval') into a 3-step flow (lines 85-91). The proactive 'check against current best practices' step is no longer stated in SKILL.md or the catalogs. Reason: Dropping an explicit ordered step from a multi-step authoring flow risks the agent skipping the best-practices validation that the base flow enforced. Solution: Confirm the best-practices-check step is intentionally dropped or fold it back into step 2 of `<authoring_flow>` (e.g. 'run quality-gate + best-practices checks').
🔵 Medium	Precision & Explicitness	Problem: The base NFR rule 'Update existing requirements with new schema' (base `<nonfunctional_requirements>`) was dropped in the compression and is not recoverable in NEW SKILL.md (line 78 NFR clause) nor in references/authoring-catalogs.md (NFR schema section, lines 114-125). The instruction to migrate already-authored requirements onto the current schema is lost. Reason: Without the migrate-to-current-schema directive, the agent may leave legacy units on an outdated schema during updates, producing an inconsistent requirements set. Solution: Re-add a short clause (in `<requirement_statements>` NFR bullet or the catalogs schema-fields section) stating that existing requirement units must be updated to the current schema when re-authored.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Lines 51 and 49-55 restate the requirement schema fields and ID conventions, and the unit template (lines 9-28) duplicates the verbatim `<req>` template that also lives in the asset `ra-requirement-unit.xml`. The brief says SKILL.md owns rules/methods and this file holds reference catalogs, but the `<req>` template is now duplicated across this reference AND the asset, violating DRY/SSoT within the family. Reason: Duplicated canonical templates drift apart (already visible vs the XML asset), the exact failure mode the single-source convention prevents. Solution: Keep the verbatim `<req>` template in exactly one location (the asset) and have this catalog reference it, rather than copying the full template block.
🔵 Medium	Precision & Explicitness	Problem: The `<req>` unit template here (lines 9-28) uses the OLD two-field shape: `NotStarted

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/scenarios-generation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The new skill carries three distinct authoring modes: gwt_spec (Given-When-Then API specs), generation (TMS-format cases), and a vendor_binding resolver, plus a shared validation checklist. That is broad for one skill; the gwt_spec mode and the TMS generation mode produce quite different artifact shapes. Reason: Two related but materially different output artifacts (ATC GWT specs vs TMS Steps/Expected cases) increase the cognitive search space within a single resident prompt. Solution: Acceptable as one skill since all three are 'design test scenarios from requirements'; keep it, but ensure the boundary between the `<gwt_spec>` mode and the TMS generation mode stays crisp so an agent never blends an ATC spec with a TMS case template. No split required.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Rosetta prompts are coding-agent-agnostic and avoid hardcoded tool names; this binding embeds concrete `mcp_testrail_*` tool signatures (steps 1, 7, 8) as the operational contract. While a vendor binding legitimately names the vendor, the per-call MCP symbol form is a specific tool-name assumption rather than a 'tell how to think' resolution from project config. Reason: Hardcoded tool symbols reduce agent-agnostic portability, which the Rosetta gate guards against. Solution: Frame the MCP signatures as the shape to invoke against the project-resolved TestRail MCP tool names, consistent with the SKILL's stance that the skill never reads config and the phase resolves the binding.
🔵 Medium	Dependency Management	Problem: MCP tool names are hardcoded throughout the process steps (`mcp_testrail_get_project`, `mcp_testrail_get_cases`, `mcp_testrail_add_case`). This is a vendor-specific export binding, so TestRail names are expected here, but the tool-call invocations assume one MCP server naming scheme; a TestRail MCP exposed under a different tool prefix in the target project would not match. The file does parameterize the vendor concept abstractly (the swap table at the end), but the live process steps bake in the exact `mcp_testrail_*` symbols. Reason: Hardcoded MCP symbol names can silently mismatch a differently-named TestRail MCP, breaking the export at call time. Solution: State once near the top that the TestRail MCP tool symbols are placeholders for whatever the project's TestRail MCP actually exposes (resolved from config), so an agent maps them rather than expecting those literal names.
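
The placeholder mapping suggested in the two findings above could look roughly like the following. This is a minimal sketch, assuming a hypothetical project config that declares a tool prefix for its TestRail MCP server; the `resolve_tool` helper and the `tool_prefix` key are illustrative names and are not part of Rosetta or of any TestRail MCP server. Only the three operation names are taken from the finding.

```python
# Hypothetical sketch: map logical TestRail operations to whatever tool
# names the project's MCP server actually exposes.

# Logical operations named in the finding (get_project, get_cases, add_case).
LOGICAL_OPERATIONS = ("get_project", "get_cases", "add_case")


def resolve_tool(operation: str, config: dict) -> str:
    """Return the concrete tool name for a logical operation.

    `config["tool_prefix"]` is an assumed config key, not a real Rosetta field.
    """
    if operation not in LOGICAL_OPERATIONS:
        raise KeyError(f"unknown logical operation: {operation}")
    prefix = config.get("tool_prefix")
    if not prefix:
        # Fail loudly instead of guessing a literal tool name.
        raise ValueError("TestRail MCP tool prefix is not configured")
    return f"{prefix}{operation}"


# Two illustrative configs: the naming assumed by the export binding,
# and a project that exposes the same server under a different prefix.
default_cfg = {"tool_prefix": "mcp_testrail_"}
renamed_cfg = {"tool_prefix": "testrail__"}

for cfg in (default_cfg, renamed_cfg):
    print([resolve_tool(op, cfg) for op in LOGICAL_OPERATIONS])
```

With this shape the process steps would name only the logical operation, and a missing prefix surfaces as an explicit error rather than a failed call at export time.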

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/testing/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Single Responsibility	Problem: The added `<implementation_modes>` block (lines 61-84) bolts three substantial new responsibilities onto the testing skill: UI impl mode, API impl mode, and a two-part Selector mode (identify + page-object authoring). The BASE skill was a focused 'write thorough isolated tests' skill; the PR now also makes it the owner of page-object/selector identification and API-spec-to-test implementation. That pushes the skill from 1-2 responsibilities toward four distinct workflows. Reason: A skill carrying four loosely-related modes is harder to load lazily and dilutes the single-responsibility contract the schema favors. Solution: Confirm these impl modes belong in `testing` rather than a dedicated implementation/selector skill; if kept, scope the frontmatter/role so the added responsibilities are declared, or split selector identification into its own skill.
🔵 Medium	Conflict Resolution	Problem: The new modes assert the PHASE is SSoT for paths/taxonomy/contract/read-write boundary/iteration cap, while the resident Quality bar still hardcodes absolutes (>=80% coverage, 1s timeout, mock-external-only). When a phase supplies a different coverage/assertion taxonomy, it is not stated which wins — the canonical quality bar or the phase binding. Reason: Two SSoT claims (phase bindings vs canonical quality bar) touch overlapping territory (coverage, assertion taxonomy) without an explicit tiebreak, risking inconsistent agent behavior. Solution: Add one line stating precedence: phase bindings govern paths/taxonomy/output contract; the canonical Quality bar and Mocking policy remain non-negotiable unless the phase explicitly overrides a named item. Resolve the implicit overlap between 'PHASE is SSoT' and 'rules below are canonical'.
🔵 Medium	Rosetta	Problem: The added General method line embeds an imperative skill-to-skill call inside the skill body: `match the repository's existing patterns (USE SKILL coding standards-first mode ...)`. Per pa-hardening the skills-can't-call-skills boundary discourages a skill imperatively invoking a sibling skill from within its procedure; references to sibling skills are normally surfaced as recommendations, not inline USE SKILL directives in the method steps. Reason: An imperative USE SKILL inside the skill body couples this skill to a sibling skill's internal mode name ('standards-first mode'), which is sibling-internals awareness the boundary rule warns against. Solution: Demote the inline `USE SKILL coding` to a reference/recommendation (the section already lists `skill coding — standards-first mode`), or phrase as 'apply repo conventions per the coding standards-first guidance' without an imperative invoke inside the mode procedure, preserving the no-sibling-call boundary.
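
The precedence line requested in the Conflict Resolution finding above can be illustrated as a merge rule. This is a sketch under stated assumptions: the key names (`min_coverage_pct`, `test_timeout_s`, `mock_scope`) and the `effective_settings` helper are hypothetical; only the values (80% coverage, 1s timeout, mock external only) come from the finding.

```python
# Hypothetical sketch of the tiebreak: canonical quality-bar values win
# unless the phase names the item in an explicit override.

CANONICAL_QUALITY_BAR = {
    "min_coverage_pct": 80,        # ">=80% coverage" from the finding
    "test_timeout_s": 1,           # "1s timeout" from the finding
    "mock_scope": "external-only", # "mock-external-only" from the finding
}


def effective_settings(phase_bindings: dict, overrides: dict) -> dict:
    """Merge phase bindings with the canonical bar under an explicit rule.

    - phase_bindings: paths, taxonomy, output contract (phase is SSoT here)
    - overrides: quality-bar items the phase overrides by name
    """
    unknown = set(overrides) - set(CANONICAL_QUALITY_BAR)
    if unknown:
        raise ValueError(f"cannot override unknown items: {sorted(unknown)}")
    clash = set(phase_bindings) & set(CANONICAL_QUALITY_BAR)
    if clash:
        # A quality-bar item slipped in as a plain binding: reject it so the
        # override is always explicit and visible.
        raise ValueError(f"quality-bar items need an explicit override: {sorted(clash)}")
    return {**CANONICAL_QUALITY_BAR, **overrides, **phase_bindings}


settings = effective_settings(
    phase_bindings={"test_root": "tests/api"},
    overrides={"min_coverage_pct": 70},
)
print(settings)
```

The design choice is that an implicit clash is an error, so neither SSoT claim can silently win.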

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	3	⬇️ Slightly worse
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/testing/references/implementation-examples.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: The new file bakes in specific framework/tool names (pytest, Jest, JUnit 5 + RestAssured, MUI class names) as full code blocks rather than parameterized shapes. Reason: pa-rosetta requires coding-agent-agnostic prompts, but a skill asset of worked examples legitimately shows concrete language samples since the calling phase owns the real binding and the file labels them non-authoritative. Solution: This is acceptable as-is because line 3 and line 113 explicitly frame them as 'shape references only' that the agent must 'adapt to the project's existing patterns'. No change required for behavior; if tightening, keep the per-language examples but reinforce the agnostic disclaimer once near the top.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The frontmatter description (line 3) still reads 'Data Collection from TestRail and Confluence' — naming the two vendors as fixed — while the rewritten body (lines 19-22) makes vendors config-resolved and explicitly NOT hardcoded. Reason: pa-rosetta requires coding-agent/vendor-agnostic prompts and frontmatter as a dense call-to-action; the description contradicts the body's own non-hardcoding rule, a minor consistency defect. Solution: Align the description with the new config-resolved model, e.g. 'Phase 1 of AQA workflow - collect test-case + feature context via configured TMS/documentation vendors', to remove the hardcoded-vendor mismatch.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The new frontmatter description is oversized and packs implementation detail: 'Phase 2 of AQA workflow - Requirements Clarification (gap-filling questioning) and Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) - USER INTERACTION REQUIRED'. This is ~55-60 tokens, well over the pa-hardening <30-token call-to-action target, and leaks internal mechanics (skill name, mode name, step references) into the discovery surface. Reason: pa-hardening mandates frontmatter description be a small dense call-to-action (<30 tokens); the discovery shell should not carry per-step implementation internals. Solution: Shorten the description to a dense call-to-action under 30 tokens, e.g. 'Phase 2 of AQA - clarify requirements and define explicit typed assertions (USER INTERACTION REQUIRED)'. Move the gap_analysis/requirements-use mechanics into the body (already present in <workflow_context>).
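
A rough way to check the description budget described above is sketched below. It assumes single-line `description:` values in leading YAML frontmatter and uses a whitespace word count as a crude stand-in for tokens; a real tokenizer will count differently, so treat the numbers as indicative only. The helper names are illustrative and are not part of the validate-prompts pipeline.

```python
import re

MAX_DESCRIPTION_TOKENS = 30  # target cited in the finding above


def frontmatter_description(text: str) -> str:
    """Extract a single-line `description:` value from leading frontmatter."""
    match = re.match(r"\A---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return ""
    for line in match.group(1).splitlines():
        if line.startswith("description:"):
            return line[len("description:"):].strip().strip("'\"")
    return ""


def approx_tokens(description: str) -> int:
    # Whitespace split is only a rough proxy for a real tokenizer.
    return len(description.split())


# Both strings are quoted from the finding above.
ORIGINAL = (
    "Phase 2 of AQA workflow - Requirements Clarification (gap-filling "
    "questioning) and Assertion Transcription (derives typed assertions via "
    "the requirements-use gap_analysis mode and writes them to the test plan "
    "as a mandatory list) - USER INTERACTION REQUIRED"
)
SUGGESTED = (
    "Phase 2 of AQA - clarify requirements and define explicit typed "
    "assertions (USER INTERACTION REQUIRED)"
)

for label, text in (("original", ORIGINAL), ("suggested", SUGGESTED)):
    count = approx_tokens(text)
    verdict = "ok" if count <= MAX_DESCRIPTION_TOKENS else "over budget"
    print(f"{label}: ~{count} words, {verdict}")
```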

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The NEW file ends with a stray, unbalanced closing tag `</output>` at EOF (after `</aqa_flow_selector_implementation>`), with no matching `<output>` opener anywhere in the file. This is a schema-impurity / well-formedness defect introduced by the PR (verified: `<output>` count 0, `</output>` count 1). Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced tag breaks XML-tag integrity and can confuse downstream parsing of the phase body. Solution: Delete the trailing `</output>` line at the end of the file so the phase document closes cleanly on `</aqa_flow_selector_implementation>`.
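
The open/close count behind this finding can be reproduced with a small check like the one below. This is a minimal sketch, not the validate-prompts implementation: it only compares counts per tag name, ignores nesting order, and would also count tags that appear inside inline code, so it is a smoke test rather than an XML validator. The sample text is made up for illustration.

```python
import re
from collections import Counter

# Matches <name ...>, </name>, and <name ... /> with simple attribute text.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unbalanced_tags(text: str) -> dict:
    """Return {tag: opens - closes} for every tag whose counts differ."""
    opens, closes = Counter(), Counter()
    for closing, name, self_closing in TAG_RE.findall(text):
        if self_closing:
            continue  # <tag/> opens and closes itself
        if closing:
            closes[name] += 1
        else:
            opens[name] += 1
    names = set(opens) | set(closes)
    return {n: opens[n] - closes[n] for n in names if opens[n] != closes[n]}


# Illustrative phase body with a stray trailing closer.
sample = (
    "<aqa_flow_selector_implementation>\n"
    "<step>do work</step>\n"
    "</aqa_flow_selector_implementation>\n"
    "</output>\n"
)

print(unbalanced_tags(sample))
```

A negative value means more closers than openers, which is the shape of the defect reported here.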

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The PR adds a stray literal `</output>` tag as the final line of the file (diff line `+</output>` at the end), outside the root `<aqa_flow_test_correction>` element. This is leaked tool/harness scaffolding, not prompt content. Per pa-schemas/pa-hardening the artifact must be schema-pure and source-agnostic; a dangling unmatched closing tag pollutes the phase body and is the kind of AI-slop artifact the authoring skill explicitly forbids. Reason: An unbalanced XML-like tag in the published phase confuses agents parsing the prompt block and signals tooling contamination. Solution: Delete the trailing `</output>` line so the file ends cleanly with `</aqa_flow_test_correction>`.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The PR appends a stray literal `</output>` line at EOF (diff line `+</output>` after `</aqa_flow_test_implementation>`), outside the root element. This is harness/tool scaffolding leaked into a published phase, violating schema-purity and the source-agnostic, state-only requirement in pa-hardening/pa-schemas. Reason: An unbalanced closing tag is tooling contamination and can mislead agents parsing the phase body. Solution: Delete the trailing `</output>` so the file ends with `</aqa_flow_test_implementation>`.
⚪ Low	Rosetta	Problem: The rewritten phase bakes the HITL refusal gate inline (`<stop_for_execution>` step 6.3: 'User instruction to bypass this gate must be refused with citation of this rule...'). pa-hardening states user involvement and HITL should be governed by the hitl skill, not restated as bespoke refusal logic inside each phase body. Reason: Duplicated, per-phase HITL refusal logic drifts from the single canonical gate authority and is the boundary pa-hardening warns against. Solution: Route the stop/refuse behavior through the `hitl` skill (as the parent `aqa-flow.md` already does via `type="HITL"`); keep only the phase-specific binding (what to wait for) and reference the hitl gate authority instead of re-implementing refusal wording per phase.
⚪ Low	Output Contract	Problem: `<workflow_context>` calls the appended record `## Test Implementation`, while `<implementation_handoff_contract>` lists it as the 'Test Implementation record' with five `###` subsections; the exact top-level heading string an agent must write is implied rather than stated once. Reason: Cosmetic; the record content and subsections are fully specified, so the artifact is still produceable. Solution: State the canonical record heading once (e.g. `## Test Implementation`) and reference it from the contract and checklist by that exact string.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Phase 4 skill map lists `scenarios-generation` (`- Phase 4: scenarios-generation, coding.`) as a skill tag. This logical name is not an established skill in the AQA/QA family references and is not defined in-prompt; per pa-rosetta, Rosetta skills must come from the canonical `docs/definitions/skills.md` list. Reason: A non-canonical skill tag will fail to resolve via ACQUIRE/USE SKILL at Phase 4, breaking the test-specification step. Solution: Verify `scenarios-generation` against `docs/definitions/skills.md`; if it is not a canonical skill name, replace it with the correct existing skill (e.g. `testing`/`tech-specs`) or add it to the canonical list before referencing.
🔵 Medium	Rosetta	Problem: Frontmatter `description` is a multi-sentence paragraph (~80 tokens: full source-system enumeration, framework list, and end-to-end pipeline recap). pa-hardening requires the frontmatter description be a call-to-action that is extremely small and dense (<30 tokens). Reason: The description field is the routing/selection signal loaded into every agent context; an oversized paragraph wastes the cached-token budget and dilutes the matching cue. Solution: Compress `description` to a single dense call-to-action trigger (e.g. 'MUST apply for backend API test automation: spec analysis → implementation → execution → corrections.'); move the tool/source enumeration into the body where it is already restated.
🔵 Medium	Rosetta	Problem: `<references>` lists a Phase 4 skill `scenarios-generation` and the parent maps phases to files like `qa-flow-test-case-specification.md`, `qa-flow-gap-and-requirements-clarification.md`, and `qa-flow-execution-and-report-analysis.md`. Per pa-rosetta, Rosetta prompts must reference only canonical names from `docs/definitions/*.md`; whether `scenarios-generation` and these phase files are canonical cannot be confirmed from the workflow alone. Reason: Non-canonical logical names cause zero-document ACQUIREs at runtime; this is a name-hygiene risk, not a structural break, so medium severity. Solution: Verify each referenced skill and phase-file name against `docs/definitions/skills.md` and `docs/definitions/workflows.md`; align names or add the missing canonical entries.
🔵 Medium	Bloat Control	Problem: The frontmatter `description` is a long multi-sentence enumeration (TestRail/Jira, pytest/Jest/JUnit/RestAssured/SuperTest, plus a full second paragraph restating sources/tools). pa-hardening requires the frontmatter description to be a dense call-to-action under ~30 tokens; this one is roughly 90+ tokens and duplicates the `<description_and_purpose>` body (which itself defers to the frontmatter). Reason: Over-long frontmatter inflates every routing decision's token cost and violates the <30-token description rule; behavior is unaffected so this is medium severity. Solution: Trim the `description` to a short call-to-action (e.g. 'MUST apply for backend API test-automation tasks: write/extend/debug API tests from test cases and specs.'); keep the tool/source enumeration in the body, not the frontmatter.
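
The name verification asked for in the Reference Integrity and Rosetta findings above could be scripted along these lines. This is a sketch only: it assumes skill names appear backticked in the workflow's skill map, and the canonical set shown is a placeholder that would have to be read from `docs/definitions/skills.md`, whose format is not shown in this review. It makes no claim about which names are actually canonical.

```python
import re

BACKTICKED = re.compile(r"`([^`]+)`")


def referenced_skills(skill_map: str) -> set[str]:
    """Collect every backticked name in a skill-map snippet."""
    return set(BACKTICKED.findall(skill_map))


def non_canonical(skill_map: str, canonical: set[str]) -> list[str]:
    """Return referenced names that are missing from the canonical set."""
    return sorted(referenced_skills(skill_map) - canonical)


# Snippet shaped like the Phase 4 entry quoted in the finding.
phase_map = "- Phase 4: `scenarios-generation`, `coding`."

# Placeholder sets for illustration; load the real list from the
# definitions file before trusting either result.
with_entry = {"coding", "testing", "scenarios-generation"}
without_entry = {"coding", "testing"}

print(non_canonical(phase_map, with_entry))
print(non_canonical(phase_map, without_entry))
```

An empty result means every referenced name resolves; any returned name is a candidate for a zero-document ACQUIRE.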

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: At ~14.4K chars this single phase file carries a full endpoint-contract template, a complete worked example, a redaction catalog with a grep re-scan list, an Analysis Summary block, and a validation checklist all inline. It is the largest of the seven files and exceeds the pa-hardening size guidance (300-500 lines ideal; split when larger), concentrating multiple heavy sub-contracts into one phase load. Reason: The whole template + example + redaction catalog is resent in context every turn the phase runs; progressive disclosure of the large static catalog reduces the per-turn cognitive/token load without losing the contract. Solution: Move the verbatim `<endpoint_contract_template>` worked example and/or the `<redaction_contract>` catalog to a referenced asset the skill ACQUIREs on demand, keeping the phase file to the section list + binding + checklist (progressive disclosure).

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: This phase (`baseSchema: docs/schemas/phase.md`) ACQUIREs and executes a sibling phase: step 1.2b.2 `ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB` and 1.2b.4 'execute all numbered steps inside `<execute_documentation_mcp>`'. The target file is also a phase (`baseSchema: docs/schemas/phase.md`), and the parent `qa-flow.md` does not reference it. pa-hardening boundary: phases cannot call phases; only the parent workflow composes phases. Reason: Phase-calls-phase violates the Rosetta composition boundary; a sibling phase invoking another phase breaks progressive-disclosure ownership and creates a hidden control-flow dependency the orchestrator/workflow does not see. Solution: Either (a) promote the documentation-MCP collection into a step owned by `qa-flow.md` (the workflow), or (b) make the MCP-collection content a skill asset/reference ACQUIRE'd by the `discovery` skill rather than a sibling phase file, so data-collection no longer invokes another phase.
🔵 Medium	Reference Integrity	Problem: `<raw_data_contract>` Backend Source Code Analysis references `RefSrc/` docs, but the canonical Rosetta term is `refsrc/` (lowercase, per pa-rosetta target-folder list). The sibling api-spec file uses `refsrc/{project-name}/docs/` correctly (step 2.1), so the capitalization is inconsistent within the family. Reason: Inconsistent casing of a path/term reference can cause an agent to look up a non-existent directory; one term per concept is required. Solution: Change `RefSrc/` to the canonical `refsrc/` term to match the pa-rosetta folder reference and the rest of the qa-flow family.
⚪ Low	Reference Integrity	Problem: The phase references vendor binding files loaded inside `discovery` (`references/<vendor>-binding.md`, `references/confluence-binding.md`) and the subflow tag `qa-flow-documentation-mcp-subflow`. These resolve only if those binding files and the subflow file exist in the KB; they are valid in-family references but unverifiable from this file alone. Reason: In-family references are valid by design; this is a low-severity reminder to confirm the targets ship together. Solution: Ensure `references/testrail-binding.md`, `references/jira-binding.md`, `references/confluence-binding.md` exist under the `discovery` skill and the subflow file is published.
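
The phase-calls-phase boundary from the High finding above could be linted along these lines. This is a sketch under stated assumptions: each file is represented as a dict with hypothetical `baseSchema` and `body` keys, and the `ACQUIRE <file> FROM KB` pattern is inferred from the wording quoted in the finding rather than from a formal grammar:

```python
import re

# Pattern inferred from the quoted step text; an assumption, not a Rosetta spec.
ACQUIRE_RE = re.compile(r"ACQUIRE\s+(\S+\.md)\s+FROM\s+KB")
PHASE_SCHEMA = "docs/schemas/phase.md"


def phase_to_phase_calls(files: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (caller, target) pairs where a phase file ACQUIREs another phase file."""
    phases = {name for name, f in files.items() if f["baseSchema"] == PHASE_SCHEMA}
    violations = []
    for name in sorted(phases):
        for target in ACQUIRE_RE.findall(files[name]["body"]):
            if target in phases and target != name:
                violations.append((name, target))
    return violations


# Hypothetical in-memory stand-ins for the two files named in the finding.
files = {
    "qa-flow-data-collection.md": {
        "baseSchema": PHASE_SCHEMA,
        "body": "1.2b.2 ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB",
    },
    "qa-flow-documentation-mcp-subflow.md": {
        "baseSchema": PHASE_SCHEMA,
        "body": "<execute_documentation_mcp> ... </execute_documentation_mcp>",
    },
}

violations = phase_to_phase_calls(files)
print(violations)
```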

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: This file is authored as a phase (`baseSchema: docs/schemas/phase.md`) but functions as a sub-phase that the sibling phase `qa-flow-data-collection.md` ACQUIREs and runs (its `<execute_documentation_mcp>` steps are executed by the parent phase). It carries `step="1.2b"`, i.e. it is a numbered step of another phase, not a standalone phase, and the workflow `qa-flow.md` does not list it. This is the phase-cannot-call-phase boundary from the other side. Reason: Being a phase-schema file invoked by a sibling phase violates the Rosetta composition boundary and gives it sibling/reverse awareness (it names its parent phase `qa-flow-data-collection`), which phases must not have. Solution: Re-home this fragment as a `discovery` skill asset/reference (so the collecting phase calls the skill, not a sibling phase), or fold its steps directly into `qa-flow.md`/the data-collection step under the parent workflow. If kept separate, it must not use the phase schema while being invoked by another phase.
⚪ Low	Reference Integrity	Problem: Relies on `discovery` loading `references/confluence-binding.md` and on `qa-project-config.md` config keys; these resolve only when those files ship in the KB / target structure. Reason: In-family/target-structure references are valid by design; low-severity confirmation only. Solution: Confirm `references/confluence-binding.md` exists under the `discovery` skill and that the documented config keys match the `qa-project-config` template.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The phase file is large and heavily redundant about the project-config-is-project-wide-not-per-IDENTIFIER point. It is restated in `<description_and_purpose>`, twice in `<workflow_context>` Output, in `<session_layout>` prose, in step 0.1 step 4, and again in the `<config_contract>` / template prose. The full per-phase state checklist is also reproduced in the State-file initial stub even though `<workflow_context>` and step 0.1.3 explicitly say the full schema is owned by `qa-flow.md` `<state_file>`. Reason: Every duplicated invariant is resent each agent turn at full token cost; the checklist duplication also risks the two copies drifting out of sync. Solution: State the project-wide-not-per-IDENTIFIER rule once in `<session_layout>` and reference it; trim the State-file stub to the minimal seed header plus an IDENTIFIER line rather than reproducing the full 8-row checklist that `qa-flow.md` owns.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The `<present_for_approval>` step 4.4 defines a closed approval-token list (`approved`/`approve`/`yes`), loose-phrasing rejection, and max-retry escalation entirely inline, without binding to the `hitl` skill. pa-hardening requires user involvement / HITL to live in the `hitl` skill so full automation can govern it centrally; the sibling Phase 7 file (`qa-flow-test-correction.md`) DOES anchor its identical gate with "(Approval vocabulary is governed by `hitl`; this gate's closed token list is the phase-specific specialization)". This phase omits that anchor, so the two HITL gates diverge in their stated authority. Reason: Without the `hitl` anchor the gate can conflict with the session-wide `hitl` protocol and is inconsistent with the parent `qa-flow.md` carve-out that ties Phase 3-7 gates to the `hitl` skill. Solution: Add a one-line cite mirroring Phase 7 — note that approval vocabulary is governed by the `hitl` skill and this closed token list is the phase-specific specialization — rather than presenting the token gate as standalone authority.
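
For illustration, the closed approval-token gate described above behaves roughly like the sketch below. The token list comes from the finding; case-insensitive matching, the retry limit of 3, and the function names are assumptions, and in Rosetta the vocabulary would be governed by the `hitl` skill rather than by code like this:

```python
# Closed token list quoted in the finding.
APPROVAL_TOKENS = frozenset({"approved", "approve", "yes"})


def is_approved(reply: str) -> bool:
    """Accept only exact tokens; loose phrasing such as 'looks good' is rejected.

    Case-insensitive comparison is an assumption of this sketch.
    """
    return reply.strip().lower() in APPROVAL_TOKENS


def run_gate(replies: list[str], max_retries: int = 3) -> str:
    """Walk through user replies; escalate after max_retries non-approvals."""
    for attempt, reply in enumerate(replies, start=1):
        if is_approved(reply):
            return "approved"
        if attempt >= max_retries:
            return "escalate"
    return "waiting"


print(run_gate(["looks good", "yes"]))
print(run_gate(["ok", "sure", "fine"]))
```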

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The NEW file ends with a stray, unbalanced closing tag `</output>` at EOF (after the root element close), with no matching `<output>` opener anywhere in the file (verified: `<output>` count 0, `</output>` count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing `</output>` line so the document closes cleanly on its root element.
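
The opener/closer count behind this finding can be reproduced with a short script. A minimal sketch; `tag_counts` is a hypothetical helper and the sample text is a made-up stand-in for the phase file, not its actual content:

```python
import re


def tag_counts(text: str, tag: str) -> tuple[int, int]:
    """Return (opening, closing) occurrence counts for an XML-style tag."""
    name = re.escape(tag)
    openers = len(re.findall(rf"<{name}(?:\s[^>]*)?>", text))
    closers = len(re.findall(rf"</{name}>", text))
    return openers, closers


# Hypothetical file content: a balanced root element plus a stray closer at EOF.
sample = "<phase>\n  <steps>...</steps>\n</phase>\n</output>\n"

print(tag_counts(sample, "output"))
print(tag_counts(sample, "phase"))
```

Any tag whose two counts differ is unbalanced; here only `output` is.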

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The NEW file ends with a stray, unbalanced closing tag `</output>` at EOF (after the root element close), with no matching `<output>` opener anywhere in the file (verified: `<output>` count 0, `</output>` count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing `</output>` line so the document closes cleanly on its root element.
⚪ Low	Rosetta	Problem: The `<stop_for_execution>` step 5.3 defines a hard HITL gate with bypass-refusal logic ("User instruction to bypass this gate must be refused with citation of this rule ... the gate is mechanical and cannot be overridden by instruction alone") entirely inline, with no reference to the `hitl` skill. Unlike the sibling Phase 7 file which anchors its gate to `hitl`, this phase presents itself as the standalone authority for a stop-and-wait gate. Reason: pa-hardening requires HITL/user-involvement authority to derive from the `hitl` skill for central full-automation governance; an unanchored mechanical-override-refusal can conflict with the session-wide `hitl` policy. Solution: Anchor the stop-for-execution gate to the `hitl` skill (e.g. note it is a phase-specific specialization of the `hitl` stop-and-wait protocol), consistent with `qa-flow-test-correction.md` and the parent `qa-flow.md` carve-out.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The rewrite renamed the state-schema section to `<state_and_outputs>` and moved schema ownership to Phase 0's `<state_file_template>`. But the sibling phase file testgen-flow-data-collection.md still points to `testgen-flow.md` `<state_file>` (step 1.4), a section name that no longer exists in this file. The pointer target was renamed here without the consumer being updated, leaving a dangling cross-phase reference. Reason: An agent following the data-collection pointer cannot resolve `testgen-flow.md` `<state_file>` and may improvise a state schema, breaking the cross-phase contract. Solution: Keep a stable anchor: either re-add a `<state_file>` tag name (or alias) in this workflow that owns/forwards the state schema, or coordinate so data-collection points to Phase 0 `<state_file_template>` (the true SSoT named in `<state_and_outputs>`). Make the consumer and the SSoT name agree.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	✅ Much better
Decision Branching	4	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	✅ Much better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	✅ Much better
Rosetta	4	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Step 1.4 (`<update_state>`) instructs to update state 'per the parent flow's canonical state-file schema (declared once in `testgen-flow.md` `<state_file>`)'. The rewritten parent testgen-flow.md has no `<state_file>` section — its state section is `<state_and_outputs>`, which delegates ownership to Phase 0's `<state_file_template>` in testgen-flow-project-config-loading.md. The named anchor does not resolve. Reason: An agent grepping for `testgen-flow.md` `<state_file>` finds nothing and may invent a state schema, diverging from the Phase 0 template every other phase relies on. Solution: Point step 1.4 at the real SSoT: Phase 0 `<state_file_template>` in `testgen-flow-project-config-loading.md` (or the parent's `<state_and_outputs>` section name), matching what the parent actually declares.
⚪ Low	Example Grounding	Problem: Frontmatter description was shortened to 'Phase 1 of Test Generation - Data collection ' (trailing space, and dropped the 'from Jira and Confluence' specificity present in BASE). The body still relies on config-resolved vendors, so the description no longer signals the concrete sources the phase handles. Reason: Description is the call-to-action surface; losing the source hint slightly weakens phase identification, though body content remains complete. Solution: Restore a concise source hint in the description (e.g. 'Data collection from issue tracker + documentation sources') and trim the trailing space.
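
The dangling `<state_file>` pointer in the High finding above is the kind of reference a simple existence check would catch. A sketch assuming sections are plain XML-style tags; `section_exists` is a hypothetical helper and the parent text is a placeholder, not the real `testgen-flow.md`:

```python
def section_exists(file_text: str, section: str) -> bool:
    """True when the text contains an opening tag for the named section."""
    return f"<{section}>" in file_text or f"<{section} " in file_text


# Placeholder for the rewritten parent workflow, which declares
# <state_and_outputs> and no longer declares <state_file>.
parent_text = "<state_and_outputs>\nSchema owned by Phase 0 state_file_template.\n</state_and_outputs>\n"

print(section_exists(parent_text, "state_file"))
print(section_exists(parent_text, "state_and_outputs"))
```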

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	2	⬇️ Slightly worse
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Cross-family deep link into skill internals at line 41 (`requirements-use/references/gap-analysis-catalogs.md`) violates the pa-rosetta/pa-hardening rule 'no cross-skill deep linking to private content' and 'references must be wrapped in commands or ACQUIRE'd', not named as bare filesystem paths. Reason: Naming a skill's private reference path from a consuming phase is the boundary violation the Rosetta isolation rules exist to prevent. Solution: Invoke the skill by logical name and remove the bare internal path; if the catalog must be cited, wrap it as an ACQUIRE owned by the skill, not the phase.
🔵 Medium	Reference Integrity	Problem: Step 3 of <run_analysis> (line 41) names another skill's private file path `requirements-use/references/gap-analysis-catalogs.md` directly. The phase reaches into the skill's internal implementation instead of just invoking USE SKILL `requirements-use` and letting the skill own which catalog it loads. Reason: Coupling a phase to a skill's private reference filename breaks skill folder isolation and will silently break if the skill reorganizes its references. Solution: Drop the explicit `references/gap-analysis-catalogs.md` path; reference the gap_analysis mode by name only and let the skill resolve its own internal references.
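
Bare paths into another skill's `references/` folder, as flagged in both findings above, can be surfaced with a pattern match. A sketch only: the regex assumes the `<skill>/references/<file>.md` layout seen in the quoted path, and `find_deep_links` is a hypothetical name:

```python
import re

# Matches "<skill-name>/references/<file>.md"; a bare "references/<file>.md"
# (a skill citing its own folder) has no skill prefix and is not matched.
DEEP_LINK_RE = re.compile(r"\b([a-z0-9-]+)/references/([A-Za-z0-9_.-]+\.md)\b")


def find_deep_links(text: str) -> list[tuple[str, str]]:
    """Return (skill, file) pairs for cross-skill reference paths found in text."""
    return DEEP_LINK_RE.findall(text)


# Hypothetical phase-step wording built around the path quoted in the finding.
phase_step = "Step 3: load `requirements-use/references/gap-analysis-catalogs.md` and run gap_analysis."

print(find_deep_links(phase_step))
print(find_deep_links("See references/confluence-binding.md"))
```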

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The primary HITL gate (step 3.2 line 108 `PAUSE — WAIT FOR USER INPUT`, plus <workflow_context> line 20) is expressed as inline gate prose and never routes through `USE SKILL hitl`. The sibling Phase 0 (project-config-loading step 0.6) routes its gate through `USE SKILL hitl`. pa-rosetta/pa-hardening require HITL/approval to live in the canonical `hitl` home, so the most important HITL gate of the whole flow is inconsistent with its own sibling. Reason: An inline-only approval gate diverges from the canonical HITL skill and from the sibling phase, so approval-handling behavior is inconsistent at the single most safety-relevant gate in the flow. Solution: Gate the Phase 3 answer-wait/approval via `USE SKILL hitl` the same way Phase 0 step 0.6 does, keeping the inline prose only as the on-load-failure fallback.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: The <failure_handling> block adds a large non-operational meta-justification 'Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn)' (lines 143-149) explaining WHY there is no fallback, plus rationale about the sibling test-case-generation phase. This is provenance/rationale prose, not an instruction the agent executes — exactly the non-operational meta-note pa-hardening says to remove. Reason: Non-operational rationale re-sent every turn inflates cost and cognitive load without changing agent behavior; pa-hardening flags exactly this class of meta-note. Solution: Keep only the operational rule ('skill is a hard dependency; on failure re-invoke once then block — no inline fallback'); delete the multi-bullet rationale and the sibling-comparison paragraph.
🔵 Medium	Rosetta	Problem: Cross-family deep linking into skill internals at lines 47 and 146 (`requirements-authoring/references/authoring-catalogs.md`, skill SKILL.md deploy path), plus the long non-operational tradeoff note (lines 143-149), both violate pa-rosetta/pa-hardening (no cross-skill deep linking; remove non-operational meta-notes / change-rationale). Reason: These are the two specific Rosetta authoring violations (skill-internal deep link + non-operational provenance) that the hardening reference calls out by name. Solution: Reference the skill by logical name only and strip the rationale paragraph, leaving the one-line operational block-on-failure rule.
🔵 Medium	Cognitive Budget	Problem: Step 4.3 restates the full phase-owned section contract, testgen-specific Executive Summary block, Traceability block, SMART exemplar, and coverage prompt (lines 45-118) while also delegating to the synthesis mode that owns the same schemas. The duplicated contract+rationale enlarges the per-turn surface area for a single document-build step. Reason: Carrying both the full contract and the rationale for the contract in one step pushes the phase toward the upper size band and competes for the agent's attention budget. Solution: Compress step 4.3 to the section table plus the two testgen-only deltas (Executive Summary, Traceability column); move worked SMART/coverage examples to a single short pointer to the skill catalog rather than inlining them.
🔵 Medium	Reference Integrity	Problem: Step 4.3 (line 47) and the failure_handling tradeoff (line 146) name another skill's private internals (`requirements-authoring/references/authoring-catalogs.md` and the skill's deploy path `instructions/<release>/core/skills/requirements-authoring/SKILL.md`) as bare paths. The phase reaches into skill-private files instead of invoking the skill by name and letting it own its references. Reason: Hard-coding a skill's internal reference filename and deploy path into a phase breaks skill isolation and will drift if the skill is reorganized. Solution: Reference `requirements-authoring` synthesis mode by logical name only; remove the `references/authoring-catalogs.md` and explicit deploy-path citations.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	3	⬇️ Slightly worse
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Safety Boundaries	Problem: This is the destructive phase (writes to an external TMS). Line 36 ASSERTS the phase owns 'idempotency (the destructive-write confirmation gate + dedup pre-scan)', but no such gate or pre-scan exists anywhere in steps 6.1-6.6. Step 6.3 only asks for a target location; step 6.5 exports every case directly with no pre-write confirmation and no duplicate check. The duplicate risk is only passively acknowledged as a pitfall (line 143 'Re-running export may create duplicates in TMS — document this behavior'). On a rerun the phase will silently re-create every case. Reason: The phase claims a destructive-write safety control it never implements; an unguarded rerun duplicates the entire suite in the external system with no confirmation. Solution: Add an explicit step before step 6.5: dedup pre-scan of the target location for already-exported TC IDs/titles, then a destructive-write confirmation gate (route via `USE SKILL hitl`) that shows the user the create/skip plan and requires explicit confirmation before any TMS write. Make the line-36 claim point at that real step.
🟡 High	Precision & Explicitness	Problem: Line 36 uses the precise terms 'destructive-write confirmation gate' and 'dedup pre-scan' as if they are defined controls, but neither term is defined or operationalized later in the file, so the modal claim is non-actionable. The agent is told the phase OWNS these controls but is never told how to perform them. Reason: Naming a control the agent cannot locate or execute is an explicitness gap that makes the safety claim unenforceable. Solution: Either add the concrete steps these terms name, or remove the terms; do not assert ownership of a control that has no procedure.
🟡 High	Workflow Completeness	Problem: The parent workflow marks Phase 6 `type="HITL"` requiring the user to 'confirm export', and <workflow_context> line 19 lists HITL as 'user must provide target location' — but the step sequence has no operational confirm-export gate. The 'confirm export' obligation from the parent and the destructive-write gate named at line 36 are both missing from the numbered steps 6.1-6.9. Reason: A multi-step destructive workflow that omits the parent-mandated confirmation step has an implicit (missing) step at exactly the irreversible action. Solution: Insert a numbered confirm-export step between get_target_location (6.3) and export (6.5) that pauses for explicit user approval of the export scope and target, mirroring the parent's HITL contract.
🟡 High	Rosetta	Problem: The parent workflow designates Phase 6 a HITL gate, yet this phase routes user interaction (target location ask, partial-export decision) through plain inline `Ask user` prompts and never `USE SKILL hitl`. pa-rosetta/pa-hardening require HITL approval to live in the `hitl` skill; sibling Phase 0 already does this, so the family is inconsistent for its two declared HITL phases. Reason: HITL handled outside the `hitl` skill bypasses the session-wide approval protocol and is inconsistent across the phase family, weakening the guarantee that the user gates the destructive export. Solution: Route the export confirmation and the partial-export user decision through `USE SKILL hitl` (canonical approval/escalation home), matching Phase 0 step 0.6; keep the inline prompts only as the skill-load-failure fallback.
🔵 Medium	Bloat Control	Problem: Line 36 is a non-operational ownership/meta declaration ('This phase OWNS the export contract … idempotency …; the skill EMITS … it never decides the contract') that describes responsibility boundaries rather than instructing an action, while the action it claims (gate + dedup) is absent. It is meta-prose, not a step. Reason: An ownership claim that substitutes for the missing procedure adds words and a false sense of coverage without changing behavior. Solution: Replace the ownership paragraph with the actual operational steps (dedup scan + confirm gate); keep at most a one-line note of which artifact is the export source.
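
The dedup pre-scan and confirmation gate requested in the findings above could be sketched roughly as follows. This is a minimal illustration, not the phase's actual procedure: the `ExportPlan` type, both function names, and the `TC-*` IDs are hypothetical, and the real gate would route through the `hitl` skill instead of a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPlan:
    create: list[str]  # case IDs not yet present in the target location
    skip: list[str]    # case IDs already exported (writing them would duplicate)


def build_export_plan(local_ids: list[str], existing_ids: set[str]) -> ExportPlan:
    """Dedup pre-scan: split local cases into create/skip against the target."""
    create = [tc for tc in local_ids if tc not in existing_ids]
    skip = [tc for tc in local_ids if tc in existing_ids]
    return ExportPlan(create=create, skip=skip)


def confirm_and_select(plan: ExportPlan, user_confirmed: bool) -> list[str]:
    """Confirmation gate: nothing is written unless the user explicitly approved."""
    if not user_confirmed:
        return []
    return plan.create


# Hypothetical rerun: TC-2 was exported earlier, so only TC-1 and TC-3 are new.
plan = build_export_plan(["TC-1", "TC-2", "TC-3"], existing_ids={"TC-2"})
to_write = confirm_and_select(plan, user_confirmed=True)
```

Building the plan before asking for approval lets the user see exactly which cases would be created and which skipped, which is what makes a rerun safe.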

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	2	⬇️ Slightly worse
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Precision & Explicitness	Problem: The PR's stated direction is de-hardcoding the vendor to a config-resolved TMS binding (step 5.3 line 84), yet residual hardcoded 'TestRail' remains in operative text: phase_steps line 25 'Generate test cases in TestRail format' and the user-facing message line 281 'Ready to proceed to Phase 6 (TestRail Export)?'. The same concept (the resolved TMS vendor) is named two ways, violating one-term-per-concept. Reason: A step that says 'TestRail format' and a user message that says 'TestRail Export' contradict the config-resolved-vendor model the same file establishes, and can mislead the agent on non-TestRail projects. Solution: Change line 25 to 'Generate test cases in the resolved TMS FORMAT' and line 281 to 'Phase 6 (Test Case Export)'; keep `testrail` only where it is an explicit example of a resolvable binding.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

github-actions · 2026-06-11T10:00:03Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟠 Very High	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/skills/coding/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/debugging/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/reverse-engineering/SKILL.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/skills/testing/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/skills/testing/references/implementation-examples.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-authoring/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`	2	1	1	0	❌ Fail
`instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	1	0	0	❌ Fail
`instructions/r2/core/skills/requirements-use/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/SKILL.md`	0	1	2	0	❌ Fail
`instructions/r2/core/skills/discovery/references/confluence-binding.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/jira-binding.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/testrail-binding.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/operation-manager/SKILL.md`	0	2	1	0	❌ Fail
`instructions/r2/core/skills/operation-manager/assets/om-schema.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/skills/scenarios-generation/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-correction.md`	0	2	1	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/coding/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/debugging/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/SKILL.md`	0	1	2	0	❌ Fail
`instructions/r3/core/skills/discovery/references/confluence-binding.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/jira-binding.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/testrail-binding.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/operation-manager/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r3/core/skills/orchestrator-contract/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-authoring/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	1	0	0	❌ Fail
`instructions/r3/core/skills/requirements-use/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/reverse-engineering/SKILL.md`	0	0	0	1	⚠️ Warning
`instructions/r3/core/skills/scenarios-generation/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/testing/references/implementation-examples.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	2	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/qa-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow.md`	0	0	0	0	✅ Pass
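
The Status column above appears to follow directly from the severity counts. The rule below is inferred from the rows of this table, not taken from the validator's source, so treat it as an assumption; the function name `status` is hypothetical.

```python
def status(very_high: int, high: int, medium: int, low: int) -> str:
    """Inferred rule: any Very High/High finding fails; Medium/Low only warns."""
    if very_high > 0 or high > 0:
        return "Fail"
    if medium > 0 or low > 0:
        return "Warning"
    return "Pass"


# Counts copied from three rows of the table above.
ra_requirement_unit = status(2, 1, 1, 0)   # ra-requirement-unit.xml
reverse_engineering = status(0, 0, 0, 1)   # reverse-engineering/SKILL.md
testgen_flow = status(0, 0, 0, 0)          # testgen-flow.md
```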

📋 Full per-file findings (Problem / Reason / Solution + Gates Comparison) → Workflow run Summary (PR comments are capped at 65,536 chars; details live on the Actions run).

github-actions · 2026-06-11T10:02:23Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟠 Very High	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/skills/debugging/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/skills/reverse-engineering/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/requirements-use/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/testgen-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/coding/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/operation-manager/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/orchestrator-contract/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/confluence-binding.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/jira-binding.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/discovery/references/testrail-binding.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/operation-manager/SKILL.md`	1	1	0	0	❌ Fail
`instructions/r2/core/skills/operation-manager/assets/om-schema.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/requirements-authoring/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`	1	2	1	0	❌ Fail
`instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	1	1	0	❌ Fail
`instructions/r2/core/skills/scenarios-generation/SKILL.md`	0	0	0	1	⚠️ Warning
`instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testing/SKILL.md`	0	0	3	3	⚠️ Warning
`instructions/r2/core/skills/testing/references/implementation-examples.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	0	1	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	2	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	2	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	0	1	1	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/debugging/SKILL.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/discovery/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/confluence-binding.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/jira-binding.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/discovery/references/testrail-binding.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/requirements-authoring/SKILL.md`	0	1	0	0	❌ Fail
`instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	1	1	0	❌ Fail
`instructions/r3/core/skills/requirements-use/SKILL.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/reverse-engineering/SKILL.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/scenarios-generation/SKILL.md`	0	0	0	1	⚠️ Warning
`instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/SKILL.md`	0	0	3	3	⚠️ Warning
`instructions/r3/core/skills/testing/references/implementation-examples.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	1	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	2	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	2	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	0	1	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/testgen-flow.md`	0	0	0	0	✅ Pass

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The added `<implementation_modes>` block layers two distinct sub-behaviors (standards-first reading discipline and an approved-apply HITL fix-application loop) onto the general coding skill. approved-apply is described as 'a domain-specific specialization of `hitl`' and embeds an approval-gate state machine (steps 1-6 with GATEs) that is closer to a workflow-phase responsibility than to the coding skill's single implementation responsibility. Reason: Adding a HITL fix-application loop widens the skill from 'implement code' toward 'coordinate an approval workflow', diluting single responsibility and increasing cognitive surface. Solution: Keep standards-first as a coding concern but consider relocating the approved-apply approval/gate orchestration to the owning workflow phase, leaving the skill to EMIT the proposed-change content only.
🔵 Medium	Rosetta	Problem: New `<implementation_modes>` approved-apply step says `USE SKILL debugging`; debugging's new triage block reciprocally says `USE SKILL coding`. This is a peer-domain skill pointer (mild reciprocal coupling), not a cross-cutting MUST-skill like sensitive-data/hitl. It does not break execution (any skill is loadable) but slightly bends the skills-avoid-peer-skill convention. Reason: Peer-skill pointers add mild coupling but do not stop the agent; low impact. Solution: Keep as a one-way pointer or move the cross-reference to frontmatter/keywords; avoid the reciprocal coding<->debugging pairing so neither skill body steers into the other.

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	3	⬇️ Slightly worse
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The added read-only `<test_execution_triage>` mode (parse report → categorize → page-source/HTTP analysis → cross-failure patterns → emit artifact) is a UI/API automated-test report-triage responsibility distinct from the skill's core 'find root cause before fixing' job, broadening the skill into AQA report analysis. Reason: Layering a report-triage mode onto debugging adds a second responsibility and audience (AQA execution reports), raising cognitive surface beyond the single debugging responsibility. Solution: Acceptable if intentional, but consider whether triage belongs in a dedicated AQA analysis skill/phase; at minimum keep the mode strictly scoped so the core debugging method stays primary.
🔵 Medium	Rosetta	Problem: New `<test_execution_triage>` block ends with 'GATE: read-only. Proposing or applying fixes is a separate correction phase — USE SKILL `coding`.' This makes the debugging skill actively invoke the coding skill. Combined with coding/SKILL.md's new `USE SKILL debugging`, the two peer domain skills now call each other — a forbidden skill-to-skill (and circular) dependency. Cross-cutting `USE SKILL sensitive-data` (line 26) is the accepted convention and is fine; the `coding` call is not. Reason: Peer-skill pointer adds mild coupling; does not stop execution (any skill is loadable). Low impact. Solution: Change the GATE to a non-imperative boundary statement (e.g. 'fixes are a separate correction phase owned by the calling workflow') rather than `USE SKILL coding`, leaving skill chaining to the workflow.
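
The reciprocal `USE SKILL` reference flagged above can be detected mechanically. The sketch below assumes the `USE SKILL` targets of each skill have already been extracted into a mapping; the function name and the mapping are hypothetical, with entries that mirror only what this finding describes.

```python
def reciprocal_pairs(uses: dict[str, set[str]]) -> set[frozenset[str]]:
    """Return every pair of skills that reference each other."""
    pairs: set[frozenset[str]] = set()
    for skill, targets in uses.items():
        for target in targets:
            # A pair is reciprocal when the target also points back at the skill.
            if target != skill and skill in uses.get(target, set()):
                pairs.add(frozenset({skill, target}))
    return pairs


# Hypothetical extraction result mirroring the finding:
# coding -> debugging, debugging -> coding and sensitive-data.
uses = {
    "coding": {"debugging"},
    "debugging": {"coding", "sensitive-data"},
    "sensitive-data": set(),
}
found = reciprocal_pairs(uses)
```

The one-way `sensitive-data` pointer is not reported, which matches the review's distinction between cross-cutting skills and peer skills that call each other.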

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	3	⬇️ Slightly worse
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/reverse-engineering/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: New `<analysis_modes>` adds two concrete modes (test-automation architecture analysis and API-contract extraction) onto the general code→spec reverse-engineering skill. The API-contract-extraction mode (locate Swagger/OpenAPI/route defs, emit per-endpoint parameters/schemas/auth/citations) is a fairly distinct AQA/API-discovery responsibility layered onto a skill whose core is 'recover intent / WHAT and WHY from code'. Reason: Two added concrete modes widen the skill's responsibility and audience beyond spec recovery, increasing cognitive surface even though each mode references the general method. Solution: Acceptable if these modes are intended specializations, but keep them clearly subordinate to the general method; if they grow, factor API-contract extraction into its own analysis skill/phase.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	3	⬇️ Slightly worse
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-use/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-export.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Step 6.1 resolves the vendor binding from `agents/testgen/{TICKET-KEY}/testgen-project-config.md` (per-ticket path), but Phase 0 (`testgen-flow-project-config-loading.md` step 0.3) saves the config to `agents/testgen/testgen-project-config.md` (project-wide, explicitly 'not per-ticket'). The path the export phase reads from will not exist. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Change the config path in step 6.1 to the project-wide `agents/testgen/testgen-project-config.md` to match the Phase 0 canonical location.
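
The path mismatch described above can be illustrated with a short sketch. The two path shapes come from the finding itself; the helper name and the ticket key `PROJ-123` are hypothetical, and the `written` set stands in for the file system state after Phase 0.

```python
from pathlib import PurePosixPath

# Project-wide location that Phase 0 step 0.3 saves to (per the finding).
PROJECT_CONFIG = PurePosixPath("agents/testgen/testgen-project-config.md")


def per_ticket_config(ticket_key: str) -> PurePosixPath:
    """Per-ticket location that step 6.1 currently reads from."""
    return PurePosixPath("agents/testgen") / ticket_key / "testgen-project-config.md"


# After Phase 0, only the project-wide file exists.
written = {PROJECT_CONFIG}

read_path = per_ticket_config("PROJ-123")  # hypothetical ticket key
found_per_ticket = read_path in written
found_project_wide = PROJECT_CONFIG in written
```

Under these assumptions the per-ticket lookup misses while the project-wide one succeeds, which is why the suggested fix is to read from the Phase 0 location.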

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	3	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Step 5.3 resolves the FORMAT vendor binding from `agents/testgen/{TICKET-KEY}/testgen-project-config.md` (per-ticket), but Phase 0 writes the config to the project-wide `agents/testgen/testgen-project-config.md`. Same path mismatch as the export phase. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Point step 5.3's config read at `agents/testgen/testgen-project-config.md` to align with the Phase 0 canonical path.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	3	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/coding/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: `<implementation_modes>` approved-apply step uses `USE SKILL debugging` while `debugging` reciprocally points back with `USE SKILL coding`, forming a peer-domain skill pairing (not a cross-cutting MUST-skill). Reason: Mild coupling between peer skills; does not break execution but bends the convention. Solution: Make the reference one-way or relocate it to frontmatter/keywords; avoid the reciprocal coding<->debugging coupling.

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/operation-manager/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: The expanded frontmatter line `model: claude-sonnet-4-6, gpt-5.5, gemini-3.1-pro` bakes specific vendor model identifiers directly into the skill. Rosetta is coding-agent-agnostic; hardcoding model names per-vendor is the kind of literal that config-key precedence is meant to avoid. Reason: Hardcoded vendor model names age quickly and conflict with the agent-agnostic principle, though impact is low since it is only an advisory frontmatter hint. Solution: Keep the model hint minimal or move model selection to config-driven guidance rather than enumerating three literal vendor model ids in frontmatter.

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/orchestrator-contract/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/discovery/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/discovery/references/confluence-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/discovery/references/jira-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: Unlike the confluence-binding, this binding has no explicit `Output sections` section enumerating the ordered blocks (per-field entries, Gaps, Redaction) the binding emits into the phase artifact. The field map plus per-field branch imply the shape, but the deterministic ordered output contract is left to the base SKILL. Reason: A point-of-use binding that omits its own ordered output contract forces the agent to infer block order, slightly reducing determinism across vendors. Solution: Add a short `Output sections` block (matching the confluence-binding's pattern) listing the ordered emitted blocks and that every section is present with `None.` for empties.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/discovery/references/testrail-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: Like the jira-binding, this file has no explicit `Output sections` block enumerating the ordered emitted blocks; the confluence-binding has one but jira and testrail do not, so the three sibling bindings are inconsistent in declaring their output ordering. Reason: Inconsistent output-contract declaration across the three bindings makes the per-vendor emitted shape rely on inference for two of three vendors. Solution: Add a short `Output sections` block listing the ordered blocks (case entry fields, Gaps, Redaction) and the present-with-`None.` rule, matching the confluence-binding.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/operation-manager/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Reference Integrity	Problem: Both `<core_concepts>` and `<resources>` instruct `ACQUIRE todo-tasks-fallback.md FROM KB` as the universal baseline fallback, but the file `todo-tasks-fallback.md` exists only under `instructions/r3/core/rules/`, not anywhere in `instructions/r2/`. In an r2-scoped agent the universal fallback path cannot be resolved. Reason: The fallback is presented as the agent-agnostic universal baseline; if the ACQUIRE cannot resolve in r2, agents without rosettify/MCP/Node have no working mechanism, breaking the skill's primary promise. Solution: Add `todo-tasks-fallback.md` under r2 rules (or point the ACQUIRE at the actual r2 rule path that provides the fallback). Do not reference an r3-only file from an r2 skill.
🟡 High	Rosetta	Problem: The skill name `operation-manager` is not in the canonical `docs/definitions/skills.md`, which lists `plan-manager` instead. pa-rosetta.md definitions policy requires using names from `docs/definitions/*.md` and not auto-adding out-of-list items. Additionally r2 now ships both `operation-manager` (new) and the legacy `plan-manager`; the new `dispatch-template` binds `operation-manager` while `docs/definitions/skills.md` still lists only `plan-manager`, leaving a duplicate-skill/definitions mismatch. Reason: An out-of-canon skill name means workflows/agents referencing the definition list by logical name cannot reliably resolve this skill, violating the Rosetta definitions policy. Solution: Either add `operation-manager` to `docs/definitions/skills.md` (and remove/retire `plan-manager` if this replaces it), or rename the skill to the canonical `plan-manager`.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	2	⬇️ Slightly worse
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/operation-manager/assets/om-schema.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/orchestrator-contract/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Success Criteria	Problem: The new template is a fill-in-the-blanks form with no 'done-when' check telling the orchestrator the template is correctly filled (e.g., no rule that every placeholder must be resolved before dispatch). Reason: Without a completeness check an orchestrator can dispatch a half-filled template, defeating the quality-gate the parent SKILL relies on. Solution: Add one line at the top stating the dispatch is valid only when no bracketed placeholder remains and Tasks/Scope/Output are non-empty.
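
The suggested done-when rule can be sketched as a small validator. This is an illustration under stated assumptions, not the template's actual format: it assumes placeholders are written in square brackets and that `Tasks`, `Scope`, and `Output` appear as `Field: value` lines. The function name `dispatch_problems` is hypothetical.

```python
import re

# Assumption: any "[...]" span on a single line is an unresolved placeholder.
# Templates that legitimately contain square brackets would need a stricter pattern.
PLACEHOLDER = re.compile(r"\[[^\[\]\n]+\]")

# Field names taken from the finding above.
REQUIRED_FIELDS = ("Tasks", "Scope", "Output")


def dispatch_problems(filled_template: str) -> list[str]:
    """Return the reasons a filled dispatch template is not ready to send."""
    problems = []
    for match in PLACEHOLDER.finditer(filled_template):
        problems.append(f"unresolved placeholder: {match.group(0)}")
    for field in REQUIRED_FIELDS:
        # A field is non-empty when "Field:" is followed by text on the same line.
        pattern = re.compile(rf"^{field}:[ \t]*\S", re.MULTILINE)
        if not pattern.search(filled_template):
            problems.append(f"empty or missing field: {field}")
    return problems
```

An empty result means the dispatch satisfies the proposed rule; anything else should block the dispatch.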

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/requirements-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: The section (line 138) still lists asset `ra-requirement-unit.md`, but the `<req>` unit template now lives in `assets/ra-requirement-unit.xml` (the file modified in this same PR) and in `references/authoring-catalogs.md`. The `.md` asset is the wrong extension and points at a non-existent file. Reason: An ACQUIRE on a wrong-extension asset path returns nothing, so the agent cannot load the canonical unit template when drafting. Solution: Change the asset reference in that section from `ra-requirement-unit.md` to `ra-requirement-unit.xml` (and likewise verify the ra-intent-capture / ra-validation-rubric / ra-change-log extensions).
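
A wrong-extension reference of this kind could be detected by comparing referenced asset names with the files that actually exist. The sketch below assumes both lists are already available as plain path strings; how they are gathered is left open, and `check_asset_refs` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def check_asset_refs(referenced: list[str], existing: list[str]) -> dict[str, list[str]]:
    """Map each unresolved reference to existing files that share its stem."""
    existing_names = {PurePosixPath(path).name for path in existing}
    unresolved = {}
    for ref in referenced:
        if PurePosixPath(ref).name in existing_names:
            continue  # the reference resolves by file name
        stem = PurePosixPath(ref).stem
        # Same stem with a different extension is the likely intended target.
        unresolved[ref] = sorted(
            path for path in existing if PurePosixPath(path).stem == stem
        )
    return unresolved
```

For the case above, `ra-requirement-unit.md` would be reported with `assets/ra-requirement-unit.xml` as its only candidate.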

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Conflict Resolution	Problem: The diff collapsed the two-field implementation schema. BASE had `NotStarted
🟡 High	Rosetta	Problem: The asset's canonical `<req>` template now diverges from the same `<req>` template kept in `requirements-authoring/SKILL.md` and `references/authoring-catalogs.md`: the asset collapses the implementation status enum + `<implementationNotes>` into one bracketed field with new vocab (Todo/Modify), while the catalog keeps the five-value enum + notes. Two contradictory canonical templates exist inside one skill family. Reason: DRY/SSoT violation: an agent filling a requirement unit gets conflicting schemas and emits inconsistent units. Solution: Pick one canonical home for the `<req>` schema (the asset) and have SKILL.md/authoring-catalogs.md point to it, or revert the asset to the enum+notes shape so all three agree.
🟡 High	Precision & Explicitness	Problem: The new collapsed `<implementation>` value mixes a status enum and free-text notes in one element with inline brackets, and uses different status words (`Todo`, `Modify`) than the rest of the skill (`Planned`, `ToBeModified`, `ToBeRemoved`). One concept now has two term sets. Reason: Mixed vocabulary and combined fields make machine parsing and human authoring ambiguous, lowering requirement-unit reliability. Solution: Use one status vocabulary across the skill and keep status separate from notes (do not pack enum + prose into one element).
🔵 Medium	Output Contract	Problem: By dropping `<implementationNotes>` the asset loses the explicit per-status guidance (Implemented: files affected; ToBeModified: what was dropped) that BASE carried, replacing it with a terser inline hint that no longer enumerates the per-status expectation. Reason: Authors using the asset alone now get weaker guidance on what to record per implementation status. Solution: Reinstate the per-status notes guidance (kept verbatim in the catalog) so the asset and catalog convey the same field contract.
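
To illustrate the "status separate from notes" recommendation, here is a rough sketch that validates a `<req>` unit with one status element and one notes element. The status set is an assumption pieced together from the words mentioned in these findings; the authoritative enum is whatever the skill's own template defines. `validate_req` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: status words collected from the findings above; the real
# five-value enum lives in the skill's template and may differ.
ALLOWED_STATUSES = {"NotStarted", "Planned", "Implemented", "ToBeModified", "ToBeRemoved"}


def validate_req(xml_text: str) -> list[str]:
    """Check that status is a bare enum value and notes live in their own element."""
    req = ET.fromstring(xml_text)
    problems = []
    status = (req.findtext("implementation") or "").strip()
    if status not in ALLOWED_STATUSES:
        problems.append(f"unknown implementation status: {status!r}")
    if req.find("implementationNotes") is None:
        problems.append("missing <implementationNotes> element")
    return problems
```

A collapsed value such as `[Todo: ...]` fails both checks, which is the ambiguity the findings describe.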

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	3	⬇️ Slightly worse
Conflict Resolution	2	⬇️ Slightly worse
Precision & Explicitness	3	⬇️ Slightly worse
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Conflict Resolution	Problem: The catalog's `<req>` template (lines 25-26) keeps the five-value `<implementation>` enum + `<implementationNotes>`, while the asset `ra-requirement-unit.xml` modified in this same PR collapsed those into a single `[Implemented
🔵 Medium	Reference Integrity	Problem: Line 3 asserts 'SMART / MUST-SHOULD-MAY / priority conventions are owned by SKILL.md — not restated here', but SKILL.md never mentions SMART (grep returns nothing). The pointer to an owner section that does not exist is a dangling cross-reference introduced by this new file. Reason: A reader who follows the pointer to find SMART guidance in SKILL.md finds nothing, undermining trust in the 'owned by SKILL.md' deferral pattern. Solution: Either drop the SMART claim from line 3 or add the SMART convention to SKILL.md so the ownership pointer resolves.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/scenarios-generation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Single Responsibility	Problem: `<when_to_use_skill>` (line 17) names the sibling skill: 'Use to DESIGN scenarios/specs; `testing` IMPLEMENTS them.' This is lateral sibling awareness of another skill by name in the body. Reason: pa-hardening forbids cross-skill awareness except in frontmatter/keywords; naming a sibling couples the two skills. Solution: Drop the explicit `testing` name from the body; the line 'runnable test code is `testing`, not this skill', which already conveys the design-vs-implement boundary, could be softened to 'implementation is a separate concern'.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/testing/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The modified SKILL more than doubled (3709 → 7972 chars). `<implementation_modes>` restates phase-SSoT framing ('The calling workflow PHASE is the SSoT ...') that is then repeated in implementation-examples.md ('The calling workflow PHASE owns the artifact paths ...'), a partial duplication across the resident skill and its reference. Reason: Resident-prompt growth and cross-file restatement add cognitive load against the progressive-disclosure goal the diff itself claims. Solution: Compress the repeated phase-SSoT sentence to a single canonical statement in `<implementation_modes>` and let the reference cite it rather than restating ownership of paths/taxonomy/contract.
🔵 Medium	Rosetta	Problem: Added `<implementation_modes>` general-method line 65 directs "USE SKILL `coding` standards-first mode" from inside a SKILL body: a skill invoking another skill, which crosses the 'Skills can't call skills' boundary. Reason: Skill-to-skill invocation is a boundary violation; reliable skill loading is the caller's (phase/subagent) job, not a sibling skill's. Solution: Phrase as a non-invoking reference to repo conventions (the `coding` skill already appears as a passive `<resources>` entry) instead of an imperative USE SKILL inside the procedure.
🔵 Medium	Single Responsibility	Problem: The diff adds a large `<implementation_modes>` block (lines 61-84) with three modes (UI / API / Selector) plus a `frontmatter description` that still only advertises 'thorough, isolated, idempotent tests with 80% coverage'. The skill now also owns page-object selector identification and TMS-id-bearing API spec implementation, widening it beyond the original unit/scenario testing job. Reason: The added modes expand scope but the call-to-action description was not updated, so selection by description may under-trigger the new capability. Solution: Extend the frontmatter description to signal the impl/selector modes (still <30 tokens), so discovery matches the broadened responsibility added by the diff.
⚪ Low	Instruction Ordering	Problem: The added `<implementation_modes>` sits between `<core_concepts>` and `<validation_checklist>`; its hard GATE/stop rules (API mode step 1, selector read-only) are embedded mid-procedure rather than surfaced as top-level hard constraints, slightly weakening the constraints-first ordering the base file had. Reason: Hard constraints buried inside step lists are more likely to be deprioritized by the agent. Solution: Leave structure but ensure the stop/GATE conditions are visually marked (already partly done with 'GATE:'); optionally hoist a one-line 'hard gates' pointer into `<core_concepts>`.
⚪ Low	Conflict Resolution	Problem: Priority order appears in two places with the same defaults but different phrasing: implementation-examples API rules say 'A spec's priority field overrides this default', while SKILL `<implementation_modes>` defers everything to the PHASE as SSoT. No explicit statement of which wins (phase taxonomy vs spec priority field) when they differ. Reason: Two priority sources without a stated tie-breaker can yield inconsistent ordering decisions across runs. Solution: Add one clause stating precedence (phase-supplied taxonomy/cap overrides the reference's default priority order) so the two are not read as competing.
⚪ Low	Cognitive Budget	Problem: API impl mode step 1 (line 74) bundles a 4-part GATE (approved-specs + recorded approval + API-contract artifact + discoverable patterns) plus the stop-rule into one dense line; combined with three modes the resident section pushes the ~5-step working-memory cap. Reason: Dense multi-clause GATE lines are easier for the agent to partially skip; minor reliability risk. Solution: No content change required, but if compressed (see Bloat issue) the GATE conditions read more clearly as an enumerated list.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/testing/references/implementation-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-code-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Input contract makes `project_description.md` (repo root) the primary framework/standards source, but the parent `aqa-flow.md` Phase 3 row passes only `CONTEXT.md`+`ARCHITECTURE.md`+`IMPLEMENTATION.md` and never mentions `project_description.md`. The Input GATE accepts either, so it resolves, but the primary input named in the phase is not the input the workflow advertises it will supply. Reason: Slight mismatch between phase input naming and parent dispatch could make a phase-only reader expect a file the orchestrator did not pass. Solution: Add one line noting `project_description.md` is an AQA-target convention (also used by `qa-flow`) and that the parent workflow's repo-doc trio satisfies the GATE alternative; align wording so the named primary matches what Phase 3 receives.
⚪ Low	Bloat Control	Problem: Two parenthetical SSoT meta-notes in `<workflow_context>` (`single SSoT — referenced by other sections` and `single SSoT — referenced by other sections as "the read-only scope"`) restate the same DRY-anchor idea twice within four lines. Reason: Minor redundancy; does not affect behavior but adds reading cost on a dense context block. Solution: Keep one SSoT annotation and drop the second restatement; the anchor names already make the reference obvious.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: `<workflow_context>` packs vendor-resolution key-precedence lists for two vendor families, in-scope signal rules, fallback rules, guardrails-rule semantics, zero-doc pointer, and ACQUIRE-success definition into one dense block before the numbered steps. This is a large search space for the first thing the phase agent reads. Reason: Front-loading all vendor-resolution detail raises cognitive load and risks the agent skimming the precedence rules it must apply later. Solution: Move the config-key precedence tables into the per-vendor steps (1.2 / 1.3) where they are used, leaving only the scope summary in `<workflow_context>`.
🔵 Medium	Rosetta	Problem: Frontmatter description still reads `Data Collection from TestRail and Confluence` (hardcoded vendors), while the rewritten body deliberately config-resolves the TMS/Documentation vendors and warns `vendors are NOT hardcoded`. The description contradicts the body's vendor-agnostic design. Reason: A coding agent selecting the phase by description sees hardcoded vendors that the body explicitly forbids, creating a portability/SSoT inconsistency. Solution: Change the description to vendor-neutral wording (e.g. `Data Collection from configured TMS and documentation sources`), keeping defaults out of the call-to-action.
⚪ Low	Conflict Resolution	Problem: `<zero_doc_protocol>` is physically nested inside `<gather_confluence step="1.3">` but is referenced by `<gather_testrail step="1.2">`, `<workflow_context>`, and `<confirm_inputs>`. Its scope reads as Confluence-local even though it is a phase-wide rule. Reason: A reader scanning step 1.2 may not realize the zero-doc rule it must apply is defined two steps later inside a sibling block. Solution: Hoist `<zero_doc_protocol>` to phase level (a sibling of the step blocks) so its cross-step authority is structurally clear.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Frontmatter description is far over the <30-token density target: it runs to a full multi-clause sentence (`...Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) — USER INTERACTION REQUIRED`), embedding mechanism detail that belongs in the body, not the call-to-action. Reason: Frontmatter must be small and dense for selection; the embedded mechanism inflates token cost without aiding phase selection. Solution: Compress the description to a dense call-to-action (e.g. `Phase 2 of AQA — clarify gaps with the user and transcribe typed assertions; USER INTERACTION REQUIRED`); drop the parenthetical mechanism.
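
A rough way to check a description against the <30-token density target is sketched below. It assumes whitespace-separated words are an acceptable stand-in for tokens, which is only an approximation of any real tokenizer; both function names are hypothetical.

```python
def approx_token_count(description: str) -> int:
    """Approximate the token count by counting whitespace-separated words.

    Assumption: a real tokenizer would usually report a somewhat higher number.
    """
    return len(description.split())


def within_density_target(description: str, limit: int = 30) -> bool:
    """True when the approximate token count is strictly below the limit."""
    return approx_token_count(description) < limit
```

Because the word count tends to undercount real tokens, a description that fails this check is very likely over the target, while one that passes narrowly may still need trimming.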

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-selector-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: At 14.4K chars the phase carries a full `<endpoint_contract_template>` block (lines 99-134) AND a complete `<redaction_contract>` catalog (lines 180-191) AND a full `<validation_checklist>` (lines 211-222) inline. The redaction catalog (5 numbered redaction-target classes plus a grep list) is point-of-use reference material that pa-hardening `<audit_survival_checks>` says belongs in `references/`, not inline in a phase. Reason: Inline catalogs inflate the per-phase cognitive search space and duplicate redaction logic that the `sensitive-data` skill already owns. Solution: Consider extracting the `<redaction_contract>` catalog and the worked endpoint example (lines 136-177) to a references/ file ACQUIRE'd at point of use, keeping only the GATE/process lines inline.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: Step 1.2b.2 instructs `ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB` and 1.2b.4 says `execute all numbered steps inside <execute_documentation_mcp>`. Both this file and the acquired file carry `baseSchema: docs/schemas/phase.md`. A phase directly acquiring and driving the numbered steps of another phase-schema file is in tension with the boundary rule 'Phases can't call phases' (briefing line 23, pa-hardening line 15). It is framed as a reusable 'subflow' fragment rather than a USE FLOW call, but the phase-schema on the child plus parent-driven step execution makes the boundary ambiguous. Reason: Phase-to-phase step execution risks the lateral-awareness boundary; a clearer schema or routing keeps the contract clean. Solution: Consider giving the subflow file a distinct non-phase schema (e.g. a reference/fragment schema) or routing the branch through the parent workflow so a phase is not executing another phase's numbered steps.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Input Contract	Problem: There is no explicit prep-completion + load-context dependency bullet in this fragment, unlike the sibling phase files which state prerequisites. pa-hardening `<audit_survival_checks>` requires the prep/load-context dependency in workflows and in any consumer of prep output; the fragment consumes `qa-project-config.md` and Phase 0 output (lines 16-17) yet only states the config dependency, not the prep/load-context completion gate. Reason: Without the stated prep dependency a directly-ACQUIRE'd fragment could run against unloaded context. Solution: Add a one-line prerequisite noting Rosetta prep + load-context completion (or an explicit pointer that the parent phase already guarantees it).
🔵 Medium	Rosetta	Problem: The file declares `baseSchema: docs/schemas/phase.md` (line 6) but is not a standalone phase: it is an ACQUIRE'd fragment driven step-by-step by `qa-flow-data-collection` step 1.2b, it is not listed as a phase in the parent `qa-flow.md` `<workflow_phases>` (which enumerates only phases 0-7), and its `<description_and_purpose>` says 'Parent phase: qa-flow-data-collection ACQUIREs this fragment'. A phase-schema file that is really a sub-fragment of another phase blurs the phase boundary and the schema-purity expectation. Reason: Tension with the phases-can't-call-phases boundary, but the fragment is ACQUIRE-driven with a full skip-path and deterministic 4-branch output contract, so the agent still reaches a defined terminal state. Lower behavioral impact than a true phase call. Solution: Consider using a fragment/reference schema rather than `phase.md`, or registering it as a real distinct phase, so its schema matches its actual role.
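
The schema/role mismatch described above could be caught mechanically. This is a sketch under stated assumptions: the declared `baseSchema` values and the list of phases registered in the parent workflow are passed in as already-parsed data, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List

PHASE_SCHEMA = "docs/schemas/phase.md"


def unregistered_phase_files(
    base_schemas: Dict[str, str],
    registered_phases: Iterable[str],
) -> List[str]:
    """List files that declare the phase schema but are not registered as phases.

    base_schemas maps file name -> declared baseSchema (parsing is out of scope).
    registered_phases holds the file names enumerated by the parent workflow.
    """
    registered = set(registered_phases)
    return sorted(
        name
        for name, schema in base_schemas.items()
        if schema == PHASE_SCHEMA and name not in registered
    )
```

A non-empty result points at files such as an ACQUIRE'd fragment that either need a different schema or need to be registered as a real phase.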

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: File is 20,427 chars / ~277 lines: over the 20K-char soft budget and close to the ~300-line one, and the largest of the 5 phase files. The `<config_contract>` 12-row key table, the full Step-input user-prompt template, and the Project config template together carry heavy detail that the engineer must hold while also tracking the redaction rules in `<safety_boundaries>`. Reason: A single phase running near the size ceiling raises compaction risk and the chance the agent drops a config key or a redaction step under load. Solution: Move the verbatim `## Step-input user-prompt template` and `## Project config template` blocks into a referenced point-of-use file (e.g. a references/ asset ACQUIRE'd at step 0.1) so the phase inline keeps only the contract table and decision lines.
🔵 Medium	Bloat Control	Problem: The required-key information is stated three times: once in the `<config_contract>` table, once in the `<validation_checklist>` ('Every required key from <config_contract> is present'), and again field-by-field inside the `## Project config template` markdown. The N/A-reason convention is also restated in the table cells, the Empty-field rule, and the template placeholders. Reason: Triplicated key schema is harder to keep in sync; if one copy is edited later the others silently drift. Solution: Keep the `<config_contract>` table as the single key authority and replace the per-field placeholder repetition in the Project config template with a pointer ('fields + N/A rules per <config_contract>').
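
The size budget cited in the Cognitive Budget finding can be checked with a few lines. The sketch below assumes the soft budget means 300 lines and 20,000 characters (the report says "~300-line/20K", so the exact thresholds are an assumption); the function name is hypothetical.

```python
from typing import List


def exceeded_budgets(
    text: str,
    max_lines: int = 300,
    max_chars: int = 20_000,
) -> List[str]:
    """Return which soft budgets ("lines", "chars") the text exceeds."""
    exceeded: List[str] = []
    if len(text.splitlines()) > max_lines:
        exceeded.append("lines")
    if len(text) > max_chars:
        exceeded.append("chars")
    return exceeded
```

Reporting the two dimensions separately makes it clear when a file, like the one flagged here, is within the line count but over the character count.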

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: `<skip_rules>` declares the always-in-force carve-out as 'Per-phase HITL gates (Phases 3-7 marked type="HITL")', but Phase 0 is declared `type="HITL-CONDITIONAL"` and carries a real HITL gate ('ASK USER FOR PROJECT INFO if config does not already exist'). The carve-out enumeration 3-7 omits the Phase 0 conditional gate. Reason: The verification-failure unilateral-start override lets the agent skip Phases 0-2; the carve-out list that protects HITL gates should unambiguously include Phase 0's conditional ask so config collection is never silently bypassed. Solution: Adjust the carve-out wording to cover the Phase 0 HITL-CONDITIONAL gate (e.g. 'Phases 0,3-7 carrying type=HITL / HITL-CONDITIONAL gates'), or state that the conditional gate is equally non-suppressible.
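
One way to keep the carve-out from drifting is to derive the protected phases from their declared types rather than enumerating phase numbers by hand. This is a sketch only: it assumes both `HITL` and `HITL-CONDITIONAL` are non-suppressible, and the `AUTO` type used for Phases 1-2 in the test data is a hypothetical placeholder.

```python
from typing import Dict, List

# Assumption: both gate types must survive any skip or override rule.
NON_SUPPRESSIBLE_TYPES = frozenset({"HITL", "HITL-CONDITIONAL"})


def protected_phases(phase_types: Dict[int, str]) -> List[int]:
    """Return the phase numbers whose gates must never be skipped."""
    return sorted(
        number
        for number, phase_type in phase_types.items()
        if phase_type in NON_SUPPRESSIBLE_TYPES
    )
```

Applied to the types described in the finding, this yields Phases 0 and 3-7, which is the set the suggested wording would protect.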

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The new `<pitfalls>` and `<common_issues>` blocks overlap heavily: 'Confluence search may miss child pages — always perform child-page traversal' (pitfalls) and 'Confluence search finds parent but misses child pages → Always perform the child-page traversal' (common_issues) restate the same guidance, and both partly duplicate the `confluence` binding the phase delegates to. Reason: Redundant lines add cognitive load and risk drift between the phase and the binding that now owns the behavior. Solution: Remove the duplicated child-page / truncation / URL-format lines from one of the two blocks since the `discovery` confluence-binding already owns that discipline.
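
Overlap of this kind between two blocks can be flagged with a simple word-set similarity. The sketch below uses Jaccard similarity over lowercase alphanumeric tokens; the 0.5 threshold and the function names are assumptions, not part of the review tooling.

```python
import re
from typing import List, Set, Tuple


def _tokens(line: str) -> Set[str]:
    """Lowercase alphanumeric word set; punctuation and arrows are dropped."""
    return set(re.findall(r"[a-z0-9]+", line.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two lines' word sets (0.0 when both are empty)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_pairs(
    block_a: List[str],
    block_b: List[str],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Return line pairs across the two blocks whose similarity meets the threshold."""
    return [(a, b) for a in block_a for b in block_b if jaccard(a, b) >= threshold]
```

Pairs returned by `overlapping_pairs` are candidates for consolidation; a human still decides which block, or which binding, should own the line.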

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The new `<state_file_template>` `## Phase Details` example only shows a `### Phase 1` block (with `[Add sections for each completed phase]`), but the file is created in Phase 0 where Phase 0's own details row would be expected first; the template never shows a `### Phase 0` entry even though step 0.6 marks Phase 0 complete. Reason: A reader following the template may omit the Phase 0 detail block, leaving the state file's first completed phase undocumented. Solution: Add a brief `### Phase 0` example row to the `## Phase Details` block in `<state_file_template>`, or note that Phase 0 details are recorded via the completion-status checkbox only.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-question-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: The step 3.4 `<create_answers_document>` HITL gate ends at step 3.5's `Ask: "Ready to proceed to Phase 4..."`, but unlike the sibling Phase 0/Phase 1/Phase 2 files in this same PR, the advance to Phase 4 is not wrapped in an explicit STOP-and-wait / `hitl` skill marker at step 3.5; the `<workflow_context>` HITL GATE states the mandatory wait only for the answer step, not for the Phase 4 advance ask. Reason: The other phase gates in this PR enforce the wait; without the same marker here, the final ask could be treated as informational and Phase 4 auto-started on silence. Solution: Add an explicit stop/wait clause (or `USE SKILL hitl`) to sub-step 4 of step 3.5, mirroring the Phase 0 step 0.6 and Phase 1 step 1.4 gates, so the proceed-to-Phase-4 ask is mechanically enforced.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/debugging/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: The added read-only `<test_execution_triage>` mode (parse report → categorize → page-source/HTTP analysis → cross-failure patterns → emit artifact) is a UI/API automated-test report-triage responsibility distinct from the skill's core 'find root cause before fixing' job, broadening the skill into AQA report analysis. Reason: Layering a report-triage mode onto debugging adds a second responsibility and audience (AQA execution reports), raising cognitive surface beyond the single debugging responsibility. Solution: Acceptable if intentional, but consider whether triage belongs in a dedicated AQA analysis skill/phase; at minimum keep the mode strictly scoped so the core debugging method stays primary.
🔵 Medium	Rosetta	Problem: New `<test_execution_triage>` block ends with 'GATE: read-only. Proposing or applying fixes is a separate correction phase — USE SKILL `coding`.' This makes the debugging skill actively invoke the coding skill. Combined with coding/SKILL.md's new `USE SKILL debugging`, the two peer domain skills now call each other — a forbidden skill-to-skill (and circular) dependency. Cross-cutting `USE SKILL sensitive-data` (line 26) is the accepted convention and is fine; the `coding` call is not. Reason: Peer-skill pointer adds mild coupling; does not stop execution (any skill is loadable). Low impact. Solution: Change the GATE to a non-imperative boundary statement (e.g. 'fixes are a separate correction phase owned by the calling workflow') rather than `USE SKILL coding`, leaving skill chaining to the workflow.
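
The mutual `coding` / `debugging` invocation flagged above is the kind of problem a small graph check could surface before review. A minimal sketch, assuming a hypothetical mapping from each skill name to the skills its body invokes via `USE SKILL`; the `find_cycle` helper and the sample graph are illustrative and not part of Rosetta:

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of skill names, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for target in graph.get(node, []):
            state = color.get(target, WHITE)
            if state == GREY:
                # Back edge: the cycle is the current path from `target` onward.
                return stack[stack.index(target):] + [target]
            if state == WHITE:
                found = visit(target)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in list(graph):
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Hypothetical invocation graph mirroring the finding above.
skills = {
    "coding": ["debugging"],
    "debugging": ["coding", "sensitive-data"],
    "sensitive-data": [],
}
cycle = find_cycle(skills)
```

With the sample graph, `cycle` names the `coding` and `debugging` loop; removing the `coding` edge from `debugging` makes the function return `None`.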

📊 Gates Comparison

Gate	Score	Comparison
Single Responsibility	3	⬇️ Slightly worse
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/discovery/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/discovery/references/confluence-binding.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/discovery/references/jira-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: Unlike the confluence-binding, this binding has no explicit `Output sections` section enumerating the ordered blocks (per-field entries, Gaps, Redaction) the binding emits into the phase artifact. The field map plus per-field branch imply the shape, but the deterministic ordered output contract is left to the base SKILL. Reason: A point-of-use binding that omits its own ordered output contract forces the agent to infer block order, slightly reducing determinism across vendors. Solution: Add a short `Output sections` block (matching the confluence-binding's pattern) listing the ordered emitted blocks and that every section is present with `None.` for empties.
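
To illustrate what a deterministic `Output sections` contract buys, here is a minimal sketch that always emits the same ordered blocks and writes `None.` for empty ones. The section names follow the blocks listed in the finding; the heading syntax, the `Fields` label, and the `render_sections` helper are assumptions made for illustration:

```python
from typing import Dict, List, Sequence

# Ordered blocks named in the finding; the exact heading text is an assumption.
SECTION_ORDER: Sequence[str] = ("Fields", "Gaps", "Redaction")


def render_sections(content: Dict[str, List[str]]) -> str:
    """Emit every section in a fixed order, writing `None.` for empty ones."""
    lines: List[str] = []
    for name in SECTION_ORDER:
        lines.append(f"## {name}")
        items = content.get(name) or []
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            lines.append("None.")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


# Sample input: one populated section, one empty, one missing entirely.
artifact = render_sections({"Fields": ["summary: Login fails on SSO"], "Gaps": []})
```

Because the order comes from a single constant, the emitted shape does not depend on which fields a given vendor happens to return.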

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/discovery/references/testrail-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: Like the jira-binding, this file has no explicit `Output sections` block enumerating the ordered emitted blocks; the confluence-binding has one but jira and testrail do not, so the three sibling bindings are inconsistent in declaring their output ordering. Reason: Inconsistent output-contract declaration across the three bindings makes the per-vendor emitted shape rely on inference for two of three vendors. Solution: Add a short `Output sections` block listing the ordered blocks (case entry fields, Gaps, Redaction) and the present-with-`None.` rule, matching the confluence-binding.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/requirements-authoring/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: The section (line 138) still lists asset `ra-requirement-unit.md`, but the `<req>` unit template now lives in `assets/ra-requirement-unit.xml` (the file modified in this same PR) and in `references/authoring-catalogs.md`. The `.md` asset has the wrong extension and points at a non-existent file. Reason: An ACQUIRE on a wrong-extension asset path returns nothing, so the agent cannot load the canonical unit template when drafting. Solution: Change the asset reference from `ra-requirement-unit.md` to `ra-requirement-unit.xml` (and likewise verify ra-intent-capture / ra-validation-rubric / ra-change-log extensions).
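
A wrong-extension reference like this one could be caught by comparing the names a skill mentions against the files that actually exist. A minimal sketch, assuming assets are referenced by bare file name with an `ra-` prefix; the regular expression, the `missing_assets` helper, and the sample text (including the `ra-change-log.md` extension) are hypothetical:

```python
import re
from typing import Iterable, List

# Assumption: assets are referenced by bare file name with an `ra-` prefix.
ASSET_REF = re.compile(r"\bra-[a-z0-9-]+\.[a-z0-9]+\b")


def missing_assets(skill_text: str, existing_files: Iterable[str]) -> List[str]:
    """Return referenced asset names that are absent from the existing files."""
    existing = set(existing_files)
    referenced = sorted(set(ASSET_REF.findall(skill_text)))
    return [name for name in referenced if name not in existing]


# Hypothetical skill body and asset directory listing.
skill_text = "ACQUIRE ra-requirement-unit.md FROM KB\nACQUIRE ra-change-log.md FROM KB"
on_disk = ["ra-requirement-unit.xml", "ra-change-log.md"]
dangling = missing_assets(skill_text, on_disk)
```

In a real check the `on_disk` list would come from listing the skill's `assets/` directory, for example with `pathlib.Path.iterdir()`.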

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Conflict Resolution	Problem: The catalog's `<req>` template (lines 25-26) keeps the five-value `<implementation>` enum + `<implementationNotes>`, while the asset `ra-requirement-unit.xml` modified in this same PR collapsed those into a single `[Implemented
🔵 Medium	Reference Integrity	Problem: Line 3 asserts 'SMART / MUST-SHOULD-MAY / priority conventions are owned by SKILL.md — not restated here', but SKILL.md never mentions SMART (grep returns nothing). The pointer to an owner section that does not exist is a dangling cross-reference introduced by this new file. Reason: A reader who follows the pointer to find SMART guidance in SKILL.md finds nothing, undermining trust in the 'owned by SKILL.md' deferral pattern. Solution: Either drop the SMART claim from line 3 or add the SMART convention to SKILL.md so the ownership pointer resolves.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-use/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/reverse-engineering/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Single Responsibility	Problem: New `<analysis_modes>` adds two concrete modes (test-automation architecture analysis and API-contract extraction) onto the general code→spec reverse-engineering skill. The API-contract-extraction mode (locate Swagger/OpenAPI/route defs, emit per-endpoint parameters/schemas/auth/citations) is a fairly distinct AQA/API-discovery responsibility layered onto a skill whose core is 'recover intent / WHAT and WHY from code'. Reason: Two added concrete modes widen the skill's responsibility and audience beyond spec recovery, increasing cognitive surface even though each mode references the general method. Solution: Acceptable if these modes are intended specializations, but keep them clearly subordinate to the general method; if they grow, factor API-contract extraction into its own analysis skill/phase.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	3	⬇️ Slightly worse
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/scenarios-generation/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
⚪ Low	Single Responsibility	Problem: `<when_to_use_skill>` (line 17) names the sibling skill: 'Use to DESIGN scenarios/specs; `testing` IMPLEMENTS them.' This is lateral sibling awareness of another skill by name in the body. Reason: pa-hardening forbids cross-skill awareness except in frontmatter/keywords; naming a sibling couples the two skills. Solution: Drop the explicit `testing` name from the body; the design-vs-implement boundary is already conveyed, and 'runnable test code is `testing`, not this skill' could be softened to 'implementation is a separate concern'.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/skills/testing/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The modified SKILL more than doubled (3709 → 7972 chars). `<implementation_modes>` restates phase-SSoT framing ('The calling workflow PHASE is the SSoT ...') that is then repeated in implementation-examples.md ('The calling workflow PHASE owns the artifact paths ...'), partial duplication across the resident skill and its reference. Reason: Resident-prompt growth and cross-file restatement add cognitive load against the progressive-disclosure goal the diff itself claims. Solution: Compress the repeated phase-SSoT sentence to a single canonical statement in `<implementation_modes>` and let the reference cite it rather than restating ownership of paths/taxonomy/contract.
🔵 Medium	Rosetta	Problem: Added `<implementation_modes>` general-method line 65 directs USE SKILL `coding` standards-first mode from inside a SKILL body: a skill invoking another skill, which crosses the 'Skills can't call skills' boundary. Reason: Skill-to-skill invocation is a boundary violation; reliable skill loading is the caller's (phase/subagent) job, not a sibling skill's. Solution: Phrase as a non-invoking reference to repo conventions (the `coding` skill already appears as a passive `<resources>` entry) instead of an imperative USE SKILL inside the procedure.
🔵 Medium	Single Responsibility	Problem: The diff adds a large `<implementation_modes>` block (lines 61-84) with three modes (UI / API / Selector), while the frontmatter `description` still only advertises 'thorough, isolated, idempotent tests with 80% coverage'. The skill now also owns page-object selector identification and TMS-id-bearing API spec implementation, widening it beyond the original unit/scenario testing job. Reason: The added modes expand scope but the call-to-action description was not updated, so selection by description may under-trigger the new capability. Solution: Extend the frontmatter description to signal the impl/selector modes (still <30 tokens), so discovery matches the broadened responsibility added by the diff.
⚪ Low	Instruction Ordering	Problem: The added `<implementation_modes>` sits between `<core_concepts>` and `<validation_checklist>`; its hard GATE/stop rules (API mode step 1, selector read-only) are embedded mid-procedure rather than surfaced as top-level hard constraints, slightly weakening the constraints-first ordering the base file had. Reason: Hard constraints buried inside step lists are more likely to be deprioritized by the agent. Solution: Leave structure but ensure the stop/GATE conditions are visually marked (already partly done with 'GATE:'); optionally hoist a one-line 'hard gates' pointer into `<core_concepts>`.
⚪ Low	Conflict Resolution	Problem: Priority order appears in two places with the same defaults but different phrasing: implementation-examples API rules say 'A spec's priority field overrides this default', while SKILL `<implementation_modes>` defers everything to the PHASE as SSoT. No explicit statement of which wins (phase taxonomy vs spec priority field) when they differ. Reason: Two priority sources without a stated tie-breaker can yield inconsistent ordering decisions across runs. Solution: Add one clause stating precedence (phase-supplied taxonomy/cap overrides the reference's default priority order) so the two are not read as competing.
⚪ Low	Cognitive Budget	Problem: API impl mode step 1 (line 74) bundles a 4-part GATE (approved-specs + recorded approval + API-contract artifact + discoverable patterns) plus the stop-rule into one dense line; combined with three modes the resident section pushes the ~5-step working-memory cap. Reason: Dense multi-clause GATE lines are easier for the agent to partially skip; minor reliability risk. Solution: No content change required, but if compressed (see Bloat issue) the GATE conditions read more clearly as an enumerated list.
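
The four-part GATE from the Cognitive Budget finding reads more clearly as an enumerated list. A minimal sketch of that idea, assuming the four conditions are tracked as booleans; the key names and the `unmet_conditions` helper are illustrative, not taken from the skill:

```python
from typing import Dict, List

# The four conditions named in the finding; the key names are illustrative.
GATE_CONDITIONS = (
    "approved_specs",
    "recorded_approval",
    "api_contract_artifact",
    "discoverable_patterns",
)


def unmet_conditions(state: Dict[str, bool]) -> List[str]:
    """List gate conditions that are missing or false, in declaration order."""
    return [name for name in GATE_CONDITIONS if not state.get(name, False)]


# Hypothetical state: two conditions satisfied, one false, one never recorded.
state = {"approved_specs": True, "recorded_approval": True, "api_contract_artifact": False}
blocked_by = unmet_conditions(state)
```

An empty result means the gate passes; anything else is the explicit list of reasons to stop, which is harder to partially skip than a single dense sentence.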

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/testing/references/implementation-examples.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Input contract makes `project_description.md` (repo root) the primary framework/standards source, but the parent `aqa-flow.md` Phase 3 row passes only `CONTEXT.md`+`ARCHITECTURE.md`+`IMPLEMENTATION.md` and never mentions `project_description.md`. The Input GATE accepts either, so it resolves, but the primary input named in the phase is not the input the workflow advertises it will supply. Reason: Slight mismatch between phase input naming and parent dispatch could make a phase-only reader expect a file the orchestrator did not pass. Solution: Add one line noting `project_description.md` is an AQA-target convention (also used by `qa-flow`) and that the parent workflow's repo-doc trio satisfies the GATE alternative; align wording so the named primary matches what Phase 3 receives.
⚪ Low	Bloat Control	Problem: Two parenthetical SSoT meta-notes in `<workflow_context>` (`single SSoT — referenced by other sections` and `single SSoT — referenced by other sections as "the read-only scope"`) restate the same DRY-anchor idea twice within four lines. Reason: Minor redundancy; does not affect behavior but adds reading cost on a dense context block. Solution: Keep one SSoT annotation and drop the second restatement; the anchor names already make the reference obvious.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: `<workflow_context>` packs vendor-resolution key-precedence lists for two vendor families, in-scope signal rules, fallback rules, guardrails-rule semantics, zero-doc pointer, and ACQUIRE-success definition into one dense block before the numbered steps. This is a large search space for the first thing the phase agent reads. Reason: Front-loading all vendor-resolution detail raises cognitive load and risks the agent skimming the precedence rules it must apply later. Solution: Move the config-key precedence tables into the per-vendor steps (1.2 / 1.3) where they are used, leaving only the scope summary in `<workflow_context>`.
🔵 Medium	Rosetta	Problem: Frontmatter description still reads `Data Collection from TestRail and Confluence` (hardcoded vendors), while the rewritten body deliberately config-resolves the TMS/Documentation vendors and warns `vendors are NOT hardcoded`. The description contradicts the body's vendor-agnostic design. Reason: A coding agent selecting the phase by description sees hardcoded vendors that the body explicitly forbids, creating a portability/SSoT inconsistency. Solution: Change the description to vendor-neutral wording (e.g. `Data Collection from configured TMS and documentation sources`), keeping defaults out of the call-to-action.
⚪ Low	Conflict Resolution	Problem: `<zero_doc_protocol>` is physically nested inside `<gather_confluence step="1.3">` but is referenced by `<gather_testrail step="1.2">`, `<workflow_context>`, and `<confirm_inputs>`. Its scope reads as Confluence-local even though it is a phase-wide rule. Reason: A reader scanning step 1.2 may not realize the zero-doc rule it must apply is defined two steps later inside a sibling block. Solution: Hoist `<zero_doc_protocol>` to phase level (a sibling of the step blocks) so its cross-step authority is structurally clear.
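
The config-resolved vendor selection that the body describes can be sketched as a precedence lookup with a default. This is a sketch under stated assumptions: the key names in `TMS_KEYS`, the sample vendor values, and the `resolve_vendor` helper are hypothetical, since the actual precedence lists live in the phase file:

```python
from typing import Mapping, Sequence

# Hypothetical key names, listed from highest to lowest precedence.
TMS_KEYS: Sequence[str] = ("tms.vendor", "test_management.vendor")


def resolve_vendor(config: Mapping[str, str], keys: Sequence[str], default: str) -> str:
    """Return the first non-empty configured vendor, else the default."""
    for key in keys:
        value = config.get(key, "").strip()
        if value:
            return value.lower()
    return default


# Both keys set: the higher-precedence key wins.
configured = resolve_vendor(
    {"test_management.vendor": "xray", "tms.vendor": " Zephyr "}, TMS_KEYS, "testrail"
)
# Nothing set: fall back to the default vendor.
fallback = resolve_vendor({}, TMS_KEYS, "testrail")
```

Keeping the default inside the resolver rather than in the description is what lets the call-to-action stay vendor-neutral.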

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Frontmatter description is far over the <30-token density target: it runs to a full multi-clause sentence (`...Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) — USER INTERACTION REQUIRED`), embedding mechanism detail that belongs in the body, not the call-to-action. Reason: Frontmatter must be small and dense for selection; the embedded mechanism inflates token cost without aiding phase selection. Solution: Compress the description to a dense call-to-action (e.g. `Phase 2 of AQA — clarify gaps with the user and transcribe typed assertions; USER INTERACTION REQUIRED`); drop the parenthetical mechanism.
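
A description budget like the <30-token target can be checked mechanically. A minimal sketch, assuming whitespace-separated words as a rough stand-in for tokens (the real tokenizer is not specified in the review); the helper names and the sample description are illustrative:

```python
def approx_token_count(text: str) -> int:
    """Rough token estimate: the number of whitespace-separated words."""
    return len(text.split())


def within_budget(description: str, limit: int = 30) -> bool:
    """True when the estimate is strictly below the limit."""
    return approx_token_count(description) < limit


# Illustrative compressed description, not the file's actual frontmatter.
short = "Clarify gaps with the user and transcribe typed assertions; USER INTERACTION REQUIRED"
count = approx_token_count(short)
ok = within_budget(short)
```

A word count undercounts for most model tokenizers, so a stricter limit may be appropriate if this were used as a CI check.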

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: At 14.4K chars the phase carries a full `<endpoint_contract_template>` block (lines 99-134) AND a complete `<redaction_contract>` catalog (lines 180-191) AND a full `<validation_checklist>` (lines 211-222) inline. The redaction catalog (5 numbered redaction-target classes plus a grep list) is point-of-use reference material that pa-hardening `<audit_survival_checks>` says belongs in `references/`, not inline in a phase. Reason: Inline catalogs inflate the per-phase cognitive search space and duplicate redaction logic that the `sensitive-data` skill already owns. Solution: Consider extracting the `<redaction_contract>` catalog and the worked endpoint example (lines 136-177) to a references/ file ACQUIRE'd at point of use, keeping only the GATE/process lines inline.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: Step 1.2b.2 instructs `ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB` and 1.2b.4 says `execute all numbered steps inside <execute_documentation_mcp>`. Both this file and the acquired file carry `baseSchema: docs/schemas/phase.md`. A phase directly acquiring and driving the numbered steps of another phase-schema file is in tension with the boundary rule 'Phases can't call phases' (briefing line 23, pa-hardening line 15). It is framed as a reusable 'subflow' fragment rather than a USE FLOW call, but the phase-schema on the child plus parent-driven step execution makes the boundary ambiguous. Reason: Phase-to-phase step execution risks the lateral-awareness boundary; a clearer schema or routing keeps the contract clean. Solution: Consider giving the subflow file a distinct non-phase schema (e.g. a reference/fragment schema) or routing the branch through the parent workflow so a phase is not executing another phase's numbered steps.
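
The boundary concern above can be sketched as a small static check: find every file that declares the phase schema and ACQUIREs another file that also declares it. This is an illustration under stated assumptions, not the validator's actual logic: files are keyed by bare filename, the frontmatter line has the form `baseSchema: <path>`, and `phase_to_phase_acquires` is a hypothetical helper name.

```python
import re

PHASE_SCHEMA = "docs/schemas/phase.md"
# Assumption: acquisitions are written as "ACQUIRE <file> FROM KB".
ACQUIRE_RE = re.compile(r"ACQUIRE\s+(\S+)\s+FROM\s+KB")
SCHEMA_RE = re.compile(r"^baseSchema:\s*(\S+)\s*$", re.MULTILINE)


def is_phase(text: str) -> bool:
    """True when the file declares the phase schema as its baseSchema."""
    match = SCHEMA_RE.search(text)
    return bool(match) and match.group(1) == PHASE_SCHEMA


def phase_to_phase_acquires(files: dict[str, str]) -> list[tuple[str, str]]:
    """List (parent, child) pairs where a phase file ACQUIREs another phase file."""
    phases = {name for name, text in files.items() if is_phase(text)}
    hits = []
    for name in sorted(phases):
        for target in ACQUIRE_RE.findall(files[name]):
            if target in phases and target != name:
                hits.append((name, target))
    return hits
```

Giving the subflow file a non-phase schema, as the Solution suggests, would make a check like this return an empty list for the pair in question.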

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Input Contract	Problem: There is no explicit prep-completion + load-context dependency bullet in this fragment, unlike the sibling phase files which state prerequisites. pa-hardening `<audit_survival_checks>` requires the prep/load-context dependency in workflows and in any consumer of prep output; the fragment consumes `qa-project-config.md` and Phase 0 output (lines 16-17) yet only states the config dependency, not the prep/load-context completion gate. Reason: Without the stated prep dependency a directly-ACQUIRE'd fragment could run against unloaded context. Solution: Add a one-line prerequisite noting Rosetta prep + load-context completion (or an explicit pointer that the parent phase already guarantees it).
🔵 Medium	Rosetta	Problem: The file declares `baseSchema: docs/schemas/phase.md` (line 6) but is not a standalone phase: it is an ACQUIRE'd fragment driven step-by-step by `qa-flow-data-collection` step 1.2b, it is not listed as a phase in the parent `qa-flow.md` `<workflow_phases>` (which enumerates only phases 0-7), and its `<description_and_purpose>` says 'Parent phase: qa-flow-data-collection ACQUIREs this fragment'. A phase-schema file that is really a sub-fragment of another phase blurs the phase boundary and the schema-purity expectation. Reason: Tension with the phases-can't-call-phases boundary, but the fragment is ACQUIRE-driven with a full skip-path and deterministic 4-branch output contract, so the agent still reaches a defined terminal state. Lower behavioral impact than a true phase call. Solution: Consider using a fragment/reference schema rather than `phase.md`, or registering it as a real distinct phase, so its schema matches its actual role.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Cognitive Budget	Problem: File is 20,427 chars / ~277 lines: over the 20K-char limit of the ~300-line/20K soft budget and close to its line limit, and the largest of the 5 phase files. The `<config_contract>` 12-row key table, the full Step-input user-prompt template, and the Project config template together carry heavy detail that the engineer must hold while also tracking the redaction rules in `<safety_boundaries>`. Reason: A single phase running near the size ceiling raises compaction risk and the chance the agent drops a config key or a redaction step under load. Solution: Move the verbatim `## Step-input user-prompt template` and `## Project config template` blocks into a referenced point-of-use file (e.g. a references/ asset ACQUIRE'd at step 0.1) so the phase keeps only the contract table and decision lines inline.
🔵 Medium	Bloat Control	Problem: The required-key information is stated three times: once in the `<config_contract>` table, once in the `<validation_checklist>` ('Every required key from <config_contract> is present'), and again field-by-field inside the `## Project config template` markdown. The N/A-reason convention is also restated in the table cells, the Empty-field rule, and the template placeholders. Reason: Triplicated key schema is harder to keep in sync; if one copy is edited later the others silently drift. Solution: Keep the `<config_contract>` table as the single key authority and replace the per-field placeholder repetition in the Project config template with a pointer ('fields + N/A rules per <config_contract>').
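
For illustration, the soft budget quoted in the Cognitive Budget finding can be expressed as a two-limit check. The limits (300 lines, 20,000 chars) come from the finding; `BudgetReport` and `check_budget` are hypothetical names, and treating either limit as sufficient to flag the file is an assumption about how the gate is applied.

```python
from dataclasses import dataclass

MAX_CHARS = 20_000  # "20K" from the finding
MAX_LINES = 300     # "~300-line" from the finding


@dataclass
class BudgetReport:
    chars: int
    lines: int
    over_chars: bool
    over_lines: bool

    @property
    def over_budget(self) -> bool:
        # Assumption: exceeding either limit is enough to flag the file.
        return self.over_chars or self.over_lines


def check_budget(text: str, max_chars: int = MAX_CHARS, max_lines: int = MAX_LINES) -> BudgetReport:
    """Measure a phase file against the soft character and line limits."""
    chars = len(text)
    lines = len(text.splitlines())
    return BudgetReport(chars, lines, chars > max_chars, lines > max_lines)
```

A file shaped like the one in the finding (about 277 lines, just over 20K chars) trips the character limit while staying under the line limit.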

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: `<skip_rules>` declares the always-in-force carve-out as 'Per-phase HITL gates (Phases 3-7 marked type="HITL")', but Phase 0 is declared `type="HITL-CONDITIONAL"` and carries a real HITL gate ('ASK USER FOR PROJECT INFO if config does not already exist'). The carve-out enumeration 3-7 omits the Phase 0 conditional gate. Reason: The verification-failure unilateral-start override lets the agent skip Phases 0-2; the carve-out list that protects HITL gates should unambiguously include Phase 0's conditional ask so config collection is never silently bypassed. Solution: Adjust the carve-out wording to cover the Phase 0 HITL-CONDITIONAL gate (e.g. 'Phases 0,3-7 carrying type=HITL / HITL-CONDITIONAL gates'), or state that the conditional gate is equally non-suppressible.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The new `<pitfalls>` and `<common_issues>` blocks overlap heavily: 'Confluence search may miss child pages — always perform child-page traversal' (pitfalls) and 'Confluence search finds parent but misses child pages → Always perform the child-page traversal' (common_issues) restate the same guidance, and both partly duplicate the `confluence` binding the phase delegates to. Reason: Redundant lines add cognitive load and risk drift between the phase and the binding that now owns the behavior. Solution: Remove the duplicated child-page / truncation / URL-format lines from one of the two blocks since the `discovery` confluence-binding already owns that discipline.
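
One way to surface the kind of overlap described above is a token-overlap comparison between the two blocks. This is a rough sketch, not the reviewer's method: the Jaccard measure, the 0.5 threshold, and the helper names `tokens` and `overlapping_lines` are all assumptions chosen for illustration.

```python
import re


def tokens(line: str) -> set[str]:
    """Lowercased alphanumeric tokens of a line."""
    return set(re.findall(r"[a-z0-9]+", line.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tokens divided by all distinct tokens (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_lines(block_a: list[str], block_b: list[str], threshold: float = 0.5):
    """Return (line_a, line_b) pairs whose token overlap meets the threshold."""
    pairs = []
    for line_a in block_a:
        for line_b in block_b:
            if jaccard(tokens(line_a), tokens(line_b)) >= threshold:
                pairs.append((line_a, line_b))
    return pairs
```

Pairs flagged this way still need a human decision about which block keeps the guidance, or whether the binding that owns the behavior should be the only place it is stated.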

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Output Contract	Problem: The new `<state_file_template>` `## Phase Details` example only shows a `### Phase 1` block (with `[Add sections for each completed phase]`), but the file is created in Phase 0 where Phase 0's own details row would be expected first; the template never shows a `### Phase 0` entry even though step 0.6 marks Phase 0 complete. Reason: A reader following the template may omit the Phase 0 detail block, leaving the state file's first completed phase undocumented. Solution: Add a brief `### Phase 0` example row to the `## Phase Details` block in `<state_file_template>`, or note that Phase 0 details are recorded via the completion-status checkbox only.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Rosetta	Problem: Step 3.4 `<create_answers_document>` HITL gate ends at step 3.5's `Ask: "Ready to proceed to Phase 4..."` but, unlike the sibling Phase 0/Phase 1/Phase 2 files in this same PR, the advance to Phase 4 is not wrapped in an explicit STOP-and-wait / `hitl` skill marker at step 3.5; the mandatory wait is only stated in `<workflow_context>` HITL GATE for the answer step, not for the Phase 4 advance ask. Reason: Consistency with the other phase gates in this PR; without it the final ask could be treated as informational and Phase 4 auto-started on silence. Solution: Add an explicit stop/wait clause (or `USE SKILL hitl`) to sub-step 4 of step 3.5, mirroring the Phase 0 step 0.6 and Phase 1 step 1.4 gates, so the proceed-to-Phase-4 ask is mechanically enforced.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Step 6.1 resolves the vendor binding from `agents/testgen/{TICKET-KEY}/testgen-project-config.md` (per-ticket path), but Phase 0 (`testgen-flow-project-config-loading.md` step 0.3) saves the config to `agents/testgen/testgen-project-config.md` (project-wide, explicitly 'not per-ticket'). The path the export phase reads from will not exist. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Change the config path in step 6.1 to the project-wide `agents/testgen/testgen-project-config.md` to match the Phase 0 canonical location.
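
The path mismatch above lends itself to a simple consistency scan: treat the Phase 0 location as canonical and flag any phase text that references the config file under a different path. A minimal sketch, assuming phase contents are available as strings keyed by filename; `CANONICAL_CONFIG` is taken from the finding, while `mismatched_config_paths` is a hypothetical helper.

```python
import re

# Project-wide location that Phase 0 writes to, per the finding.
CANONICAL_CONFIG = "agents/testgen/testgen-project-config.md"
# Any reference to the config file somewhere under agents/testgen/.
CONFIG_PATH_RE = re.compile(r"agents/testgen/\S*testgen-project-config\.md")


def mismatched_config_paths(phase_texts: dict[str, str]) -> dict[str, list[str]]:
    """Map each phase file to the non-canonical config paths it references."""
    mismatches = {}
    for name, text in phase_texts.items():
        bad = sorted({p for p in CONFIG_PATH_RE.findall(text) if p != CANONICAL_CONFIG})
        if bad:
            mismatches[name] = bad
    return mismatches
```

Run across the testgen phase files, a scan like this would report every phase that reads a per-ticket path in one pass, instead of one finding per file.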

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	3	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Reference Integrity	Problem: Step 5.3 resolves the FORMAT vendor binding from `agents/testgen/{TICKET-KEY}/testgen-project-config.md` (per-ticket), but Phase 0 writes the config to the project-wide `agents/testgen/testgen-project-config.md`. Same path mismatch as the export phase. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Point step 5.3's config read at `agents/testgen/testgen-project-config.md` to align with the Phase 0 canonical path.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	3	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

github-actions · 2026-06-11T11:01:04Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/coding/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r3/core/skills/coding/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/debugging/SKILL.md`	1	1	0	❌ Fail
`instructions/r3/core/skills/debugging/SKILL.md`	1	1	0	❌ Fail
`instructions/r2/core/skills/testing/SKILL.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/SKILL.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/testing/references/implementation-examples.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/references/implementation-examples.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/reverse-engineering/SKILL.md`	0	3	2	⚠️ Warning
`instructions/r3/core/skills/reverse-engineering/SKILL.md`	0	3	2	⚠️ Warning
`instructions/r3/core/skills/operation-manager/SKILL.md`	0	1	0	⚠️ Warning
`instructions/r2/core/skills/discovery/SKILL.md`	0	2	0	⚠️ Warning
`instructions/r3/core/skills/discovery/SKILL.md`	0	2	0	⚠️ Warning
`instructions/r2/core/skills/discovery/references/confluence-binding.md`	0	1	1	⚠️ Warning
`instructions/r3/core/skills/discovery/references/confluence-binding.md`	0	1	1	⚠️ Warning
`instructions/r2/core/skills/discovery/references/jira-binding.md`	0	1	1	⚠️ Warning
`instructions/r3/core/skills/discovery/references/jira-binding.md`	0	1	1	⚠️ Warning
`instructions/r2/core/skills/discovery/references/testrail-binding.md`	0	1	1	⚠️ Warning
`instructions/r3/core/skills/discovery/references/testrail-binding.md`	0	1	1	⚠️ Warning
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	2	1	⚠️ Warning
`instructions/r3/core/skills/orchestrator-contract/SKILL.md`	0	2	1	⚠️ Warning
`instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md`	0	1	1	⚠️ Warning
`instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md`	0	1	1	⚠️ Warning
`instructions/r2/core/skills/requirements-authoring/SKILL.md`	0	0	1	⚠️ Warning
`instructions/r3/core/skills/requirements-authoring/SKILL.md`	0	0	1	⚠️ Warning
`instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-use/SKILL.md`	0	1	1	⚠️ Warning
`instructions/r3/core/skills/requirements-use/SKILL.md`	0	1	1	⚠️ Warning
`instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/SKILL.md`	0	0	1	⚠️ Warning
`instructions/r3/core/skills/scenarios-generation/SKILL.md`	0	0	1	⚠️ Warning
`instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	1	2	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	1	2	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	1	1	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	1	1	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	1	0	1	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	1	0	1	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/qa-flow.md`	0	1	1	⚠️ Warning
`instructions/r3/core/workflows/qa-flow.md`	0	1	1	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	2	1	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	2	1	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	1	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	1	1	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-correction.md`	1	0	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-test-correction.md`	1	0	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	2	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	2	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow.md`	0	0	1	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow.md`	0	0	1	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	0	1	1	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	1	1	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	1	2	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	1	2	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	1	2	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	1	2	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	1	0	⚠️ Warning

📋 Full per-file findings (Problem / Reason / Solution + Gates Comparison) → Workflow run Summary (PR comments are capped at 65,536 chars; details live on the Actions run).
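
The 65,536-character cap noted above is why only the summary table is posted here. A minimal Python sketch of how a report body could be trimmed to fit under that cap follows; `fit_comment`, the notice text, and the choice to cut at a line boundary are assumptions for illustration, not the validator's actual implementation.

```python
COMMENT_CAP = 65_536  # PR comment cap quoted in the report above


def fit_comment(
    body: str,
    notice: str = "\n\n(truncated; full findings are in the workflow run summary)",
    cap: int = COMMENT_CAP,
) -> str:
    """Return body unchanged if it fits, else cut it and append the notice."""
    if len(body) <= cap:
        return body
    budget = cap - len(notice)
    if budget <= 0:
        raise ValueError("cap is too small to hold the truncation notice")
    head = body[:budget]
    # Prefer cutting at the last complete line so a table row is not split.
    newline = head.rfind("\n")
    if newline > 0:
        head = head[:newline]
    return head + notice
```

Cutting at a newline keeps the last visible table row intact, at the cost of using slightly less than the full budget.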

github-actions · 2026-06-11T11:03:38Z

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

File	🔴 Critical	🟠 Very High	🟡 High	🔵 Medium	⚪ Low	Status
`instructions/r2/core/workflows/testgen-flow.md`	0	0	0	3	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	1	2	1	❌ Fail
`instructions/r2/core/workflows/testgen-flow-test-case-export.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-data-collection.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow.md`	0	0	0	2	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-test-implementation.md`	0	0	1	1	1	❌ Fail
`instructions/r2/core/workflows/aqa-flow-code-analysis.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/aqa-flow-test-correction.md`	0	0	1	1	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-selector-implementation.md`	0	0	1	1	0	❌ Fail
`instructions/r2/core/workflows/aqa-flow-data-collection.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-project-config-loading.md`	-	-	-	-	-	❌ Error
`instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-correction.md`	0	0	1	0	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-data-collection.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-test-implementation.md`	0	0	1	1	0	❌ Fail
`instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	3	0	⚠️ Warning
`instructions/r2/core/skills/coding/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/debugging/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-authoring/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-use/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/reverse-engineering/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/orchestrator-contract/SKILL.md`	0	1	2	0	0	❌ Fail
`instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md`	0	0	2	0	0	❌ Fail
`instructions/r2/core/skills/scenarios-generation/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testing/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/testing/references/implementation-examples.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/discovery/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r2/core/skills/discovery/references/jira-binding.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/discovery/references/confluence-binding.md`	0	0	0	1	0	⚠️ Warning
`instructions/r2/core/skills/discovery/references/testrail-binding.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-test-case-generation.md`	0	0	0	3	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-test-case-export.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-question-generation.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-data-collection.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/testgen-flow-project-config-loading.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/aqa-flow.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-implementation.md`	0	0	1	1	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-code-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-identification.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-test-correction.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-selector-implementation.md`	0	0	1	0	0	❌ Fail
`instructions/r3/core/workflows/aqa-flow-data-collection.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-project-config-loading.md`	-	-	-	-	-	❌ Error
`instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`	0	0	0	3	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-test-case-specification.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-test-correction.md`	0	0	1	1	0	❌ Fail
`instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-data-collection.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-test-implementation.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`	0	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/coding/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/debugging/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/operation-manager/SKILL.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/requirements-authoring/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-use/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/reverse-engineering/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/orchestrator-contract/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/testing/references/implementation-examples.md`	0	0	0	1	0	⚠️ Warning
`instructions/r3/core/skills/discovery/SKILL.md`	0	0	0	0	0	✅ Pass
`instructions/r3/core/skills/discovery/references/confluence-binding.md`	0	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/discovery/references/jira-binding.md`	0	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/discovery/references/testrail-binding.md`	0	0	0	2	0	⚠️ Warning
`instructions/r3/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`	0	0	0	0	0	✅ Pass
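
The Status column in the table above appears to follow from the severity counts: rows with any Critical, Very High, or High finding are marked Fail, rows with only Medium or Low findings are marked Warning, rows with all zeros are marked Pass, and rows showing `-` are marked Error. The Python sketch below encodes that inferred rule; `derive_status` and `parse_row` are hypothetical names, and the thresholds are read off the rows rather than taken from the validator's source.

```python
from typing import Optional, Sequence, Tuple


def derive_status(counts: Optional[Sequence[int]]) -> str:
    """counts = (critical, very_high, high, medium, low); None when the row shows '-'."""
    if counts is None:
        return "Error"
    critical, very_high, high, medium, low = counts
    if critical or very_high or high:
        return "Fail"
    if medium or low:
        return "Warning"
    return "Pass"


def parse_row(row: str) -> Tuple[str, Optional[Tuple[int, ...]], str]:
    """Split one tab-separated summary row into (file, counts, status cell)."""
    cells = row.split("\t")
    file_name = cells[0].strip("`")
    raw_counts, status = cells[1:-1], cells[-1]
    if all(cell == "-" for cell in raw_counts):
        return file_name, None, status
    return file_name, tuple(int(cell) for cell in raw_counts), status
```

Running `parse_row` over each row and comparing the status cell with `derive_status(counts)` is a quick way to confirm that the table is internally consistent.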

📄 `instructions/r2/core/workflows/testgen-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: The header block before the first phase now carries skip-gate logic, phase-load-failure handling, transition precedence, self-check criteria, a 7-item per-phase failure-routing list, and model tiers, well over the ~5-directive comfort window an agent processes reliably, all loaded up front in addition to the seven phase blocks. Reason: Front-loading routing detail that only matters when a specific phase fails enlarges the cognitive search space at the moment the agent is choosing the next phase. Solution: Push the per-phase failure-routing list and model-tier table into the respective phase blocks so the top-of-file cognitive load is the phase sequence plus orchestration, not also a failure-routing index.
🔵 Medium	Bloat Control	Problem: The new `<workflow_phases>` preamble packs a dense multi-clause skip-gate rule, a 'Per-phase failure cases — owned by phase files' pointer list, and model-tier definitions into the header before phase 0 even starts. The single skip-rule bullet ('Skip gates: only with explicit user instruction, or when testgen-state.md marks ... otherwise resume from the earliest incomplete phase. The explicit user instruction skip NEVER applies to the Phase 3 / Phase 6 HITL gates — those are rule 2 of <orchestration_and_escalation> ...') restates precedence that <orchestration_and_escalation> already owns. Reason: The same HITL-never-overridden rule is stated in the preamble bullet and again in <orchestration_and_escalation> priority (2), adding redundancy that the hardening reference flags as compressible without value loss. Solution: Move the full skip-gate conditional into <orchestration_and_escalation> (which already defines the priority hierarchy) and leave a one-line pointer in the preamble, to avoid restating the HITL-override precedence in two places.
🔵 Medium	Example Grounding	Problem: The PR deletes the concrete 'Initial Prompt Formats' examples (Format 1/2/3 with literal Jira+Confluence URL samples) and the Confluence CQL example (`type=page AND space=PROJ AND text ~ 'feature'`). NEW only keeps a single inline trigger example `Analyze requirements for PROJ-123` and delegates the rest with 'input formats are enumerated in testgen-flow-project-config-loading.md step 0.1' and 'CQL search example ... the discovery skill'. Reason: BASE grounded the entry points with copy-pasteable examples; NEW relies on pointers, so grounding now depends entirely on the target files containing equivalent examples. Solution: Confirm the deleted CQL example and the three input-format samples actually exist verbatim in the cited Phase 0 step 0.1 / discovery skill; if any are absent there, the workflow lost grounded examples it used to carry. Keep at least the trigger example it retained.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The inline TestRail-compatible worked example and the concrete BEFORE/AFTER merged-case example were deleted in favor of a reference to the scenarios-generation FORMAT binding plus a generic vendor-neutral fallback <tc_schema>. The fallback path (used exactly when the skill/binding is unavailable) no longer demonstrates the parameterized merged-role example. Reason: When scenarios-generation is unavailable or returns an incompatible shape, merge behavior is less grounded than BASE, risking malformed test cases. Solution: Retain one minimal concrete merged-case example inline in the fallback <tc_schema> path so the agent keeps a grounded shape when the format binding cannot be loaded.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Bloat Control	Problem: The new <failure_handling> ends with a 'Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn)' block plus a closing paragraph explaining why the sibling testgen-flow-test-case-generation.md keeps a <tc_schema> fallback 'for a different reason'. This is rationale/justification meta-commentary (a 'Deployment guarantee' bullet citing the on-disk SKILL.md path, a 'Section contract is phase-owned' bullet) rather than operational instruction the agent must execute. Reason: pa-hardening / pa-patterns ai-issues require removing non-operational clarifications (rationale, origin labels, explanatory meta-notes); the tradeoff block restates a decision already enforced by the earlier rule and adds no executable behavior. Solution: Reduce to the single operational rule already stated earlier in the same block ('No inline per-entry fallback shape exists ... the phase blocks when the skill is unavailable; do NOT fabricate'). Drop the 'Conscious tradeoff' justification and the cross-sibling comparison, which are non-operational provenance/rationale notes.
🔵 Medium	Cognitive Budget	Problem: The phase is short (4 steps) but the <create_requirements_document> 'Section contract' table plus the testgen additions plus the SMART exemplar plus the multi-paragraph <failure_handling> tradeoff make the failure/justification prose disproportionate to the actual 4-step procedure. Reason: Surface area grows from explanatory prose, not from procedure; pa-hardening targets compact phases where directives, not rationale, dominate. Solution: Trim the justification prose (see Bloat issue) so the executable procedure dominates the file's cognitive surface rather than the meta-rationale.
🔵 Medium	Rosetta	Problem: The same block carries sibling-awareness meta-commentary: it names and reasons about another phase file ('the sibling testgen-flow-test-case-generation.md retains an inline <tc_schema> fallback for a different reason ...') and explains that sibling's internals. Reason: pa-hardening enforces no lateral/sibling awareness beyond keyword/frontmatter cues; explaining a sibling phase's design rationale exposes another phase's internals, which the boundary rules disallow. Solution: State this phase's own rule (skill is a hard dependency, block on failure) without describing or comparing against the sibling phase's fallback design.
⚪ Low	Example Grounding	Problem: BASE carried concrete worked exemplars for US, FR, and NFR (e.g. 'FR-1: Password Validation ... Minimum 8 characters'); NEW keeps only one NFR SMART exemplar inline and delegates full US/FR/NFR worked examples to requirements-authoring/references/authoring-catalogs.md. Reason: Low severity because the single retained NFR exemplar is high quality and the catalogs reference was verified to exist; flagged only to confirm the deleted US/FR examples are covered downstream. Solution: Confirm authoring-catalogs.md actually contains US and FR worked examples equivalent to the deleted ones; if so this is acceptable delegation, otherwise restore a compact US/FR exemplar.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	3	⬇️ Slightly worse
Dependency Management	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/testgen-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/testgen-flow-project-config-loading.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Cognitive Budget	Problem: Several added bullets pack multiple decisions into one dense sentence, e.g. the Phase-6 recommended-skills line "`coding`, `testing` (test-implementation is done inline by this phase via `coding` + `testing`)" and the Blocking-infeasibility bullet that chains four escape options inside a single bullet with `·` separators. The orchestrator must parse nested clauses to extract the actual branch. Reason: AI reliably handles ~5 atomic steps; multi-clause bullets raise the chance a branch is skipped. Solution: Decompose the longest combined bullets (blocking-infeasibility options, per-phase skill notes) into short sub-bullets so each carries one decision, per the prompt-authoring guidance to decompose directives and keep lines short.
🔵 Medium	Reference Integrity	Problem: The new `<orchestration_and_escalation>` and `<state_file>` sections push the entire skip-refusal rule, the state-file template, and the `## Verification-Failure Overrides` audit-trail row onto external owners: "its phase-execution loop owns the skip-without-agreement / falsified-skip refusal rule ... This workflow does NOT restate that logic" and "template owned by the data-collection phase, `aqa-flow-data-collection.md`". The workflow's correct behavior on a skipped phase is now entirely non-resolvable from this file; if `orchestrator-contract` or `aqa-flow-data-collection.md` does not define exactly that rule/row, the escalation contract silently breaks. Reason: Cross-file ownership with zero local fallback makes a safety-critical rule (refusing falsified phase skips) depend on a reference resolving correctly at runtime. Solution: Keep the delegation but add a one-line fallback assertion in this file (e.g. the minimal skip-refusal behavior and the required state rows) so the workflow degrades safely if the referenced owners drift, and confirm `aqa-flow-data-collection.md` actually defines `<state_file_template>` and `## Verification-Failure Overrides`.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.
🔵 Medium	Example Grounding	Problem: The rewrite deleted all 10 concrete worked examples that grounded the abstract tasks (BASE Tasks 1-10 showed real test structure, setup, assertions, cleanup, e.g. `test('should display correct welcome message after login', async ({ page }) => { ... }` and the full Phase-6 test-plan markdown block). NEW keeps only one abstract state-file example (`tests/e2e/checkout/refund.spec.ts`). The actual authoring instruction "Author the test using page-object methods only (no raw selectors in test code), proper waits, project assertion style" now has no positive example of what compliant output looks like. Reason: The deleted examples satisfied the Example Grounding gate; removing them lowers grounding for the core authoring directive (per spec, deleted gate-satisfying content scores comparison<3). Solution: Re-add one short, framework-neutral positive example of a compliant test skeleton (page-object call + assertion + wait) so the authoring contract has a concrete anchor without re-introducing hardcoded Playwright. Keep it minimal to preserve the bloat win.
⚪ Low	Reference Integrity	Problem: Behavior is delegated to skill modes that exist only as cross-references: "`testing` — UI impl mode" and "`coding` (standards-first mode)". The phase OWNS the contract but the actual authoring mechanics live in those skill modes; if `testing`/`coding` do not expose those named modes the phase cannot author anything. Reason: Named-mode references must resolve for the phase's USE SKILL steps to function. Solution: Confirm `testing` and `coding` SKILL.md define the referenced modes (UI impl mode / standards-first mode); if mode names are aspirational, soften to a capability description rather than a named mode.
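
The stray `</output>` finding above recurs across several files, so a small balance check may help catch it before review. The following is a minimal sketch, not the reviewer's actual tooling: the function name `unmatched_tags` and the regular expressions are assumptions, and it is a naive counter rather than a real XML parser. It skips inline code spans so that tags merely mentioned in prose are not counted.

```python
import re
from collections import Counter

# Hypothetical helper: matches <tag>, </tag>, and <tag/> with optional attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")
# Inline code spans such as `<output>` are prose mentions, not structure.
CODE_SPAN_RE = re.compile(r"`[^`\n]*`")


def unmatched_tags(text: str) -> dict[str, int]:
    """Return {tag: opens - closes} for every tag whose counts differ."""
    balance: Counter[str] = Counter()
    stripped = CODE_SPAN_RE.sub("", text)
    for closing, name, self_closing in TAG_RE.findall(stripped):
        if self_closing:
            continue  # <tag/> opens and closes itself
        balance[name] += -1 if closing else 1
    return {name: diff for name, diff in balance.items() if diff != 0}


if __name__ == "__main__":
    sample = "<flow>\n<step>do</step>\n</flow>\n</output>\n"
    print(unmatched_tags(sample))
```

A negative count means a close tag without an opener, which is the shape of the defect reported here. Placeholders written outside code spans (for example `<test-name>` inside a path) would show up as false positives and need manual review.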

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	3	⬇️ Slightly worse
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-code-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The BASE referenced the input path as `agents/user-app/project_description.md`; the NEW input_contract table relocates it to `project_description.md` (repo root) with no migration note, while the parent `aqa-flow.md` and sibling phases reference `CONTEXT.md`/`ARCHITECTURE.md`/`IMPLEMENTATION.md` as the authoritative docs. `project_description.md` is not a Rosetta predefined target file (per pa-rosetta.md the canonical docs are CONTEXT/ARCHITECTURE/IMPLEMENTATION). The contract leans on a non-canonical filename whose location silently changed. Reason: An input path that is non-canonical and silently relocated risks the GATE check ("project description OR one authoritative repo doc exists") passing/failing inconsistently across phases. Solution: Either justify `project_description.md` as an AQA-domain artifact and define where it is created, or fold its role into the canonical `CONTEXT.md`/`ARCHITECTURE.md`/`IMPLEMENTATION.md` already listed in the table, so the input contract uses resolvable Rosetta-canonical references.
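
To illustrate why a silently relocated input path matters for the GATE check ("project description OR one authoritative repo doc exists"), here is a minimal sketch. The function name `gate_inputs` and the idea of passing a list of repo-relative paths are assumptions made for the illustration; the actual phase may evaluate the gate differently.

```python
# Canonical docs named in the review; the tuple itself is an illustrative constant.
CANONICAL_DOCS = ("CONTEXT.md", "ARCHITECTURE.md", "IMPLEMENTATION.md")


def gate_inputs(repo_files, project_description="project_description.md"):
    """Return the accepted input docs that exist, in precedence order.

    An empty list means the gate fails. `repo_files` is assumed to hold
    repo-relative paths.
    """
    present = set(repo_files)
    accepted = (project_description, *CANONICAL_DOCS)
    return [name for name in accepted if name in present]


if __name__ == "__main__":
    # The description still lives at the BASE location, so the NEW path misses it.
    print(gate_inputs(["agents/user-app/project_description.md"]))
    print(gate_inputs(["agents/user-app/project_description.md", "CONTEXT.md"]))
```

Under these assumptions, a repository that still keeps the description at the BASE location passes only if a canonical doc is also present, which is the inconsistency the finding warns about.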

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.
🔵 Medium	Reference Integrity	Problem: The NEW input/in-scope contract references `agents/plans/aqa-<test-name>-failure-analysis.md` as the Phase 7 failure-analysis artifact (both in `<workflow_context>` and the `<correction_contract>` binding). The parent `aqa-flow.md` Phase 7 entry states the Phase 7 output only as "failure analysis with root causes and fix recommendations" without naming that exact path, and the BASE correction file read the analysis "from test plan". If Phase 7 (`aqa-flow-test-report-analysis.md`) does not write to exactly `aqa-<test-name>-failure-analysis.md`, step 8.1's `coding` binding (proposed-change source) points at a non-existent file. Reason: A cross-phase input path that only one side declares can break the Phase 8 apply step when the artifact is absent or named differently. Solution: Verify `aqa-flow-test-report-analysis.md` writes the failure analysis to the same `agents/plans/aqa-<test-name>-failure-analysis.md` path, or align both files to one agreed artifact name so the Phase 7 to Phase 8 handoff path is identical on both ends.
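
The Phase 7 to Phase 8 handoff concern can be checked mechanically by comparing the artifact paths each file mentions. This is a minimal sketch under stated assumptions: the helper names and the `agents/plans/...md` pattern are illustrative, and the producer text in the usage example is hypothetical rather than quoted from `aqa-flow-test-report-analysis.md`.

```python
import re

# Assumed pattern: any markdown artifact under agents/plans/, placeholders allowed.
ARTIFACT_RE = re.compile(r"agents/plans/[\w<>./-]+\.md")


def artifact_paths(text: str) -> set[str]:
    """Collect every agents/plans/*.md path mentioned in a prompt file."""
    return set(ARTIFACT_RE.findall(text))


def handoff_mismatch(producer_text: str, consumer_text: str) -> set[str]:
    """Paths the consumer phase reads that the producer phase never mentions."""
    return artifact_paths(consumer_text) - artifact_paths(producer_text)


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    producer = "Write the report to `agents/plans/aqa-<test-name>-analysis.md`."
    consumer = "Read `agents/plans/aqa-<test-name>-failure-analysis.md` before applying fixes."
    print(handoff_mismatch(producer, consumer))
```

A non-empty result means the consuming phase points at a file the producing phase does not declare, so both ends need to agree on one artifact name.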

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	3	⬇️ Slightly worse
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The NEW version deletes all concrete per-failure markdown templates and the categorized-failure example block that BASE carried (e.g. the BASE `### Failure: [Test Name]` block with Error Type / Page Source File / Selector Used / Actual Element Structure fields, and the Performance Analysis template). NEW replaces them with an abstract field list in `<failure_analysis_contract>` ("Failure name / Error type / Root cause / Evidence label / Evidence rationale / Recommendation") and no filled example of an analyzed failure entry. Reason: The six-field contract is new and the Evidence label / rationale fields are error-prone; with no canonical example the agent must invent the shape, increasing inconsistency across failures despite the contract being machine-checked by the validation checklist. Solution: Add one short filled-in example failure entry (a selector error with a cited page-source line and an Evidence label) to ground the six-field contract, matching the worked-example style used in the sibling requirements-clarification phase.
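
One way to ground the six-field contract is to pair a filled entry with a completeness check. The sketch below is illustrative only: the field names come from the finding above, but the `missing_fields` helper, the `- Field: value` line format, and every value in the sample entry are assumptions, and the evidence label is deliberately left as a placeholder because the phase's label vocabulary is not shown in this review.

```python
# Field names as listed in the finding; the tuple is an illustrative constant.
REQUIRED_FIELDS = (
    "Failure name",
    "Error type",
    "Root cause",
    "Evidence label",
    "Evidence rationale",
    "Recommendation",
)


def missing_fields(entry: str) -> list[str]:
    """Return the contract fields absent from a 'Field: value' style entry."""
    present = set()
    for line in entry.splitlines():
        if ":" in line:
            label = line.split(":", 1)[0].strip().strip("-* ").lower()
            present.add(label)
    return [field for field in REQUIRED_FIELDS if field.lower() not in present]


# Hypothetical entry; all values are placeholders, not taken from the PR.
example_entry = """\
- Failure name: login button not found
- Error type: selector error
- Root cause: page object targets an id that the page source no longer contains
- Evidence label: <label from the phase's own vocabulary>
- Recommendation: update the page-object selector to the current attribute
"""

if __name__ == "__main__":
    print(missing_fields(example_entry))
```

The sample entry omits the evidence rationale on purpose, so the check reports that single field as missing.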

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: NEW contains a stray dangling `</output>` tag at line 86, after the closing `</aqa_flow_selector_implementation>` root tag. The prompt body is wrapped in `<aqa_flow_selector_implementation>...</aqa_flow_selector_implementation>` but there is no matching opening `<output>`, so the trailing `</output>` is an orphaned tag. Reason: An unmatched closing tag breaks the XML-style structural framing the rest of the AQA phases rely on; it can confuse tag-aware parsing/compaction and signals a copy-paste error, undermining the otherwise clean section boundaries. Solution: Delete the stray `</output>` line at the end of the file so the document terminates cleanly at the `</aqa_flow_selector_implementation>` close tag.
🔵 Medium	Reference Integrity	Problem: The stray `</output>` tag at line 86 references a sectioning element (`<output>`) that is never opened anywhere in the file, so the reference does not resolve. Reason: A closing tag with no opener is a dangling reference within the prompt's own structure; even though it does not point to an external file, it is an unresolved structural token introduced by this diff. Solution: Remove the orphaned tag (same fix as the Structural Coherence issue); confirm no `<output>` open tag was intended.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	3	⬇️ Slightly worse
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The added `<workflow_context>` vendor-resolution block enumerates three-or-four-deep config-key fallback chains twice in prose (e.g. "first non-empty key (stop at first hit): `tms_mcp_collection_skill`, `tms_collection_skill`, `test_case_management.mcp_collection_skill`" and the parallel documentation-vendor list `documentation_mcp_collection_skill`, `documentation.mcp_collection_skill`, `mcp_documentation_collection_skill`, `confluence_mcp_collection_skill`), and the same resolution is then restated again inside `<gather_testrail>` step 1 and `<gather_confluence>` `<acquire_skills>` step 1. Reason: The duplicated multi-key fallback prose inflates a Phase 1 collector phase and competes for attention with the actual collection steps, a redundancy the hardening reference flags (DRY / remove duplication). Solution: State each vendor's config-key fallback chain once in `<workflow_context>` and have the step bodies reference it by name (e.g. 'resolve TMS vendor binding per <workflow_context>') instead of repeating the in-scope signal description.
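
The "first non-empty key (stop at first hit)" rule is easier to keep in one place when it is read as a single lookup. The following is a minimal sketch, assuming the config is a nested mapping and that a dotted key such as `test_case_management.mcp_collection_skill` denotes a nested path; both assumptions, as well as the helper names, are mine and are not confirmed by the PR.

```python
from collections.abc import Mapping
from typing import Any, Optional, Sequence

# Key order copied from the finding above; the constant name is hypothetical.
TMS_KEYS = (
    "tms_mcp_collection_skill",
    "tms_collection_skill",
    "test_case_management.mcp_collection_skill",
)


def lookup(config: Mapping[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key through nested mappings; None if any part is absent."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def first_non_empty(config: Mapping[str, Any], keys: Sequence[str]) -> Optional[str]:
    """Return the first non-blank string value, stopping at the first hit."""
    for key in keys:
        value = lookup(config, key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


if __name__ == "__main__":
    config = {
        "tms_mcp_collection_skill": "",
        "test_case_management": {"mcp_collection_skill": "testrail"},
    }
    print(first_non_empty(config, TMS_KEYS))
```

With the chain expressed once like this, the step bodies only need to name which key list they resolve, which is the single-source shape the solution asks for.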

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The workflow-phases and skip-rules blocks repeat the same ownership-attribution boilerplate many times — e.g. "is owned by the `orchestrator-contract` skill (per `<references>`)", "not restated here", "(Generic verify-before-advance is owned by `orchestrator-contract`.)", "Gate-execution mechanics ... are owned by `USE SKILL hitl` — defer to it; not restated here.". The same defer-to-skill clarification recurs in `<phase_template>`, `<skip_rules>`, the phase-output-gate bullet, and `<failure_handling>`. Reason: pa-hardening core_principles flag DRY and 'Avoid filler text / Remove non-operational clarifications'; the repeated parenthetical attributions add cognitive load without adding behavior. Solution: State the ownership boundary once (e.g., a single line: cadence + gate mechanics owned by orchestrator-contract/hitl) and drop the per-block re-statements; rely on the reader to carry it forward.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-project-config-loading.md`

Error: Prompt too large for reliable evaluation: instructions/r2/core/workflows/qa-flow-project-config-loading.md

📄 `instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: Phase 2 (api-spec-analysis) auto-advances to Phase 3 with no HITL gate and is not marked type="HITL", unlike the analogous gated testgen/aqa data phases. The only guard is a weak 'file present, non-placeholder' check, so the agent can proceed on a thin/incorrect api-analysis.md. Reason: Without a confirmation gate the agent silently builds downstream test cases on unreviewed API analysis, breaking parity with the gated sibling flows and weakening HITL coverage. Solution: Either add a lightweight verify-before-advance confirmation after Phase 2 (and Phase 1), or document in qa-flow.md why API-spec extraction is intentionally trusted to auto-advance while sibling data phases are gated.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	✅ Much better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/workflows/qa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.
🔵 Medium	Rosetta	Problem: `<stop_for_execution>` embeds a full anti-bypass HITL policy directly in the phase: "User instruction to bypass this gate must be refused with citation of this rule... the gate is mechanical and cannot be overridden by instruction alone." pa-hardening states user involvement / HITL is canonically owned by `bootstrap-hitl-questioning.md` and a phase should point to the canonical HITL home via a `type=` marker, "never a parallel mechanism." Reason: A second, self-contained HITL mechanism risks drift from the canonical HITL authority and duplicates approval-governance logic the family already centralizes. Solution: Keep the STOP/WAIT gate but reference the canonical `hitl` skill for the refusal/override-vocabulary semantics instead of restating a self-contained bypass-refusal policy inside the phase, matching how the sibling Phase 6 and parent qa-flow delegate gate mechanics to `hitl`.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Decision Branching	Problem: Config advertises documentation_type values google-drive/local, but only the Confluence backend has a resolvable discovery binding and this subflow only maps 'Confluence backend -> binding confluence'. A google-drive/local config has no retrieval path and silently degrades to SKIPPED_NO_CONFIG while the user believes docs are wired. Reason: Silent degradation hides a misconfiguration: the agent skips documentation the user expected to be ingested, reducing requirement coverage without warning. Solution: Either constrain documentation_type's enum to backends that have a binding, or explicitly warn 'documentation_type has no retrieval binding — docs will be skipped' so the unsupported value is surfaced.
🔵 Medium	Bloat Control	Problem: Config-key handling is stated twice: once narratively in `<workflow_context>` ("resolve the documentation vendor binding from whichever of these fields exists first") and again procedurally in `<resolve>` step 1 ("the first non-empty config key per `<workflow_context>` precedence list"), with the Confluence-backend mapping repeated in both places. Reason: Duplicated resolution logic across two sections can drift on edit and inflates the prompt without adding decision value. Solution: Keep the precedence list as the single source in `<workflow_context>` and let `<resolve>` reference it by pointer only (per pa-hardening SSoT rule: mark canonical home once, elsewhere a `→` pointer), removing the duplicated Confluence-mapping clause.
🔵 Medium	Cognitive Budget	Problem: The second `<workflow_context>` bullet ("Config keys (read literally...)") packs vendor-binding precedence (four key names with stop-at-first-hit), the always-discovery mapping rule, and a long open-ended in-scope-signal enumeration (`documentation_type`, `type`, `confluence_base_url`, `confluence_space`, `documentation_base_url`, `documentation_mcp_server`, "or any field your `qa-project-config` template documents") into a single run-on bullet. Reason: Bundling multiple independent decision inputs into one dense sentence raises the per-step cognitive search space and risks the agent missing the stop-at-first-hit precedence or the absent-means-absent caveat. Solution: Split the resolution precedence list, the discovery-mapping rule, and the in-scope signal set into separate sub-bullets or a small table so each decision input is atomic; the `<resolve>` steps already reference them, so a structured form would not add length.
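
The Decision Branching finding above asks for unsupported `documentation_type` values to be surfaced instead of silently skipped. A minimal sketch of that branch follows; it simplifies the real resolution (which reads skill-binding keys) down to the type field alone, and the `BINDINGS` table, the function name, and the warning text are assumptions for illustration.

```python
# Only the Confluence mapping is named in the review; other entries would be additions.
BINDINGS = {"confluence": "confluence"}


def resolve_documentation_binding(config: dict) -> tuple:
    """Return (binding, status) for the configured documentation backend."""
    doc_type = str(config.get("documentation_type", "")).strip().lower()
    if not doc_type:
        # Nothing configured: skipping is the expected outcome.
        return None, "SKIPPED_NO_CONFIG"
    binding = BINDINGS.get(doc_type)
    if binding is None:
        # Configured but unsupported: warn instead of degrading silently.
        return None, (
            f"WARNING: documentation_type '{doc_type}' has no retrieval "
            "binding; docs will be skipped"
        )
    return binding, "OK"


if __name__ == "__main__":
    for cfg in ({}, {"documentation_type": "google-drive"}, {"documentation_type": "Confluence"}):
        print(resolve_documentation_binding(cfg))
```

The point of the sketch is the separation of the two skip reasons: an absent config and an unsupported value no longer collapse into the same status.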

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/coding/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	4	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-use/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/reverse-engineering/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/orchestrator-contract/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🟠 Very High	Reference Integrity	Problem: The added `<core_concepts>` states "WORKFLOW LOADING is a separate canonical concern owned by `load-workflow`" and `<resources>` lists "skill `load-workflow` — canonical workflow loading", and `<prerequisites>` requires "OPERATION_MANAGER is active". No `load-workflow` or `operation-manager` skill exists anywhere under instructions/r2 (verified: not in instructions/r2/core/skills/ and no `name: load-workflow`/`name: operation-manager` frontmatter in r2). These are dangling canonical references. Reason: Per pa-rosetta.md, Rosetta prompts must reference prompts by logical name from the canonical `docs/definitions/*.md` lists and a missing name requires an explicit user question. An agent following the `<prerequisites>` gate ("OPERATION_MANAGER is active") or attempting `USE SKILL load-workflow` will fail to resolve them, breaking the dispatch/phase-drive chain. Solution: Point these references to skills that actually exist in r2 (e.g. the established `plan-manager`/`planning` for drive-loop concerns, or define and add the `load-workflow`/`operation-manager` skills to the canonical skills list before referencing them as authorities), or inline the loading/operation-manager responsibility instead of delegating to a non-existent owner.
🟡 High	Rosetta	Problem: Same dangling `load-workflow`/`operation-manager` references violate the Rosetta definitions policy ('Use names from docs/definitions/*.md', 'Missing name: ask explicit user question', 'Do not auto-add out-of-list items'). Reason: pa-rosetta.md mandates referencing only canonical Rosetta prompt names; inventing authority owners that do not exist degrades the Rosetta gate. Solution: Reconcile referenced skill names against the canonical Rosetta skills definitions before merge.
🟡 High	Dependency Management	Problem: `<prerequisites>` adds a hard gate "OPERATION_MANAGER is active" and `<core_concepts>`/`<resources>` make the skill depend on `load-workflow`, but neither dependency is provided or resolvable in r2. The skill now cannot satisfy its own stated prerequisites. Reason: pa-hardening.md requires no gaps/ambiguity and logical consistency within a prompt and its DIRECT dependencies; an unresolvable hard prerequisite makes the contract impossible to honor. Solution: Either declare these as optional/soft dependencies with a fallback when the owning skill is absent, or wire them to existing canonical skills so the dependency graph closes.
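
The dangling-reference findings above could be caught mechanically before merge. The following is a minimal sketch, not the project's actual validator: it assumes a skill "resolves" when `<skills_root>/<name>/SKILL.md` exists and that references are written as `USE SKILL` followed by backticked names. The function names and the regex are illustrative assumptions.

```python
import re
import tempfile
from pathlib import Path

# Assumption: references look like "USE SKILL `a`, `b`" (one or more backticked names).
USE_SKILL = re.compile(r"USE SKILL((?:[\s,]*`[a-z0-9-]+`)+)")
NAME = re.compile(r"`([a-z0-9-]+)`")


def referenced_skills(prompt_text: str) -> list[str]:
    """Collect every skill name that follows a USE SKILL directive."""
    names = set()
    for match in USE_SKILL.finditer(prompt_text):
        names.update(NAME.findall(match.group(1)))
    return sorted(names)


def dangling_skills(prompt_text: str, skills_root: Path) -> list[str]:
    """Return referenced skills with no SKILL.md under skills_root (assumed layout)."""
    return [
        name
        for name in referenced_skills(prompt_text)
        if not (skills_root / name / "SKILL.md").is_file()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Simulate a release tree where only subagent-contract exists.
        (root / "subagent-contract").mkdir()
        (root / "subagent-contract" / "SKILL.md").write_text("name: subagent-contract\n")
        template = "MUST USE SKILL `subagent-contract`, `operation-manager`."
        print(dangling_skills(template, root))
```

Run per release tree (r2 and r3 separately), a check of this shape would flag a skill that exists in only one of them.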

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Reference Integrity	2	⬇️ Slightly worse
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	3	⬇️ Slightly worse
Rosetta	3	⬇️ Slightly worse

📄 `instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Rosetta	Problem: The template hard-codes `MUST USE SKILL ... operation-manager` into every dispatch prompt, but the `operation-manager` skill does not exist anywhere under instructions/r2 (it exists only in r3). Every subagent spawned via this template is told to load a missing skill. Reason: pa-rosetta requires references to resolve within the release. A baked-in missing-skill load makes every r2 subagent dispatch start with a failed ACQUIRE, undermining the dispatch chain. Solution: Remove `operation-manager` from the r2 dispatch template (or add the skill to r2). Keep references limited to skills that exist in r2.
🟡 High	Reference Integrity	Problem: The new template hard-codes "MUST USE SKILL `subagent-contract`, `operation-manager`." into every dispatch prompt, but `operation-manager` is not a skill that exists in instructions/r2 (only `subagent-contract` resolves). Every subagent dispatched with this template will be told to load a non-existent skill. Reason: pa-rosetta.md requires referencing only canonical Rosetta prompt names; a MUST directive to load a missing skill propagates a broken instruction to every spawned subagent. Solution: Reference only resolvable canonical skills (subagent-contract) and remove or replace `operation-manager` with the actual skill name once it exists in the canonical skills list.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/scenarios-generation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: The vendor MCP tool names are hardcoded as bare identifiers throughout the process, e.g. call `mcp_testrail_get_project(project_id)` (step 1) and `mcp_testrail_add_case(section_id, title, priority_id, type_id, refs, custom_steps_separated)` (step 8). pa-rosetta.md requires Rosetta prompts to be coding-agent-agnostic and pa-hardening.md says no hardcoded tool names; here the concrete TestRail MCP signatures are baked into the binding file. Reason: Vendor names live in a config-resolved binding file by design and the fork table makes them swappable, so this is a minor, intentional containment rather than a portability regression. Solution: This is acceptable as the lowest layer (a vendor-specific binding explicitly named testrail-export.md whose whole purpose is to hold the TestRail specifics, and the SKILL keeps the vendor abstraction), and the 'Swapping to another TMS vendor' table parameterizes every tool name for forks. Keep, but ensure the SKILL/PHASE never reaches these names except through the resolved EXPORT binding, so vendor-agnosticism is preserved at the skill boundary.
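
The containment described in this finding can be pictured as a lookup table that is the only place vendor tool names appear. This is a hedged sketch: the two TestRail tool names come from the finding above, while the table layout, the abstract operation names (`get_project`, `add_case`), and `tool_for` are assumptions made for illustration.

```python
# Assumed shape of an EXPORT binding: abstract operation -> vendor tool name.
EXPORT_BINDINGS = {
    "testrail": {
        "get_project": "mcp_testrail_get_project",
        "add_case": "mcp_testrail_add_case",
    },
}


def tool_for(vendor: str, operation: str) -> str:
    """Resolve a vendor tool name; callers never spell vendor names themselves."""
    try:
        return EXPORT_BINDINGS[vendor][operation]
    except KeyError:
        raise LookupError(
            f"no EXPORT binding for vendor={vendor!r} operation={operation!r}"
        ) from None


if __name__ == "__main__":
    print(tool_for("testrail", "add_case"))
```

Swapping to another TMS vendor then means adding one more entry to the table rather than editing the skill or phase text.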

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/testing/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better

📄 `instructions/r2/core/skills/testing/references/implementation-examples.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Load-bearing rules (ATC traceability, assertion-priority, exact UI/selector/API code shapes) now live ONLY in this lazy-loaded reference, and the aqa/qa phase files that consume the testing skill do not name implementation-examples.md directly (chain: phase -> testing SKILL.md -> reference). Reason: An agent that under-loads the lazy reference loses the exact output shape and the fragile-selector approval rule, producing ungrounded test code. Solution: Keep the load instruction imperative and restate the 1-2 truly load-bearing invariants (ATC traceability, no-silent-fragile-selector) as a short inline guard in testing/SKILL.md so they survive if the reference is not loaded.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r2/core/skills/discovery/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r2/core/skills/discovery/references/jira-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: The file ends with a dangling, unmatched closing XML tag `</content>` (last line) with no corresponding opening `<content>` tag anywhere in the file; the file is otherwise pure markdown (`#` headers, tables, fenced blocks). Reason: An orphan closing tag is a structural artifact that gets loaded verbatim into agent context on ACQUIRE; it can confuse XML-aware parsing and signals a copy/paste leftover, undermining the clean section structure the binding otherwise has. Solution: Remove the stray closing `</content>` line so the file is consistently markdown with no orphan XML tag.
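
An orphan closing tag of this kind can be found with a simple open/close count per tag name. The sketch below is one possible check under stated assumptions: it counts tags with a regex, does not parse nesting, and does not skip fenced code blocks, so it is a lint heuristic rather than an XML validator. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches <name ...>, </name>, and <name ... />; captures the slashes and the name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unmatched_closers(text: str) -> dict[str, int]:
    """Return, per tag name, how many closing tags exceed the opening tags."""
    opens, closes = Counter(), Counter()
    for closing_slash, name, self_closing_slash in TAG.findall(text):
        if self_closing_slash:
            continue  # <tag/> neither opens nor closes a pair
        (closes if closing_slash else opens)[name] += 1
    return {n: closes[n] - opens[n] for n in closes if closes[n] > opens[n]}


if __name__ == "__main__":
    binding = "# Jira binding\n\n| Field | Source |\n\n</content>\n"
    print(unmatched_closers(binding))
```

Because placeholders such as `<vendor>` count as openers, the check only reports surplus closers, which is the failure mode flagged here.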

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/discovery/references/confluence-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: The file ends with a dangling, unmatched closing XML tag `</content>` (last line) with no opening `<content>` tag; the body is otherwise pure markdown. Reason: The stray tag is loaded into agent context verbatim on ACQUIRE and is a copy/paste leftover that breaks the otherwise-clean markdown structure and may mislead XML-aware parsing. Solution: Delete the orphan `</content>` final line.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r2/core/skills/discovery/references/testrail-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: The file ends with a dangling, unmatched closing XML tag `</content>` (last line) with no opening `<content>` tag; the file is otherwise pure markdown. Reason: The orphan tag is ACQUIRE'd into agent context as-is, is a copy/paste leftover, and breaks the clean markdown structure the binding otherwise maintains. Solution: Remove the orphan `</content>` final line.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	✅ Much better
Single Responsibility	5	✅ Much better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	4	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The inline TestRail-compatible worked example and the concrete BEFORE/AFTER merged-case example were deleted in favor of a reference to the scenarios-generation FORMAT binding plus a generic vendor-neutral fallback <tc_schema>. The fallback path (used exactly when the skill/binding is unavailable) no longer demonstrates the parameterized merged-role example. Reason: When scenarios-generation is unavailable or returns an incompatible shape, merge behavior is less grounded than BASE, risking malformed test cases. Solution: Retain one minimal concrete merged-case example inline in the fallback <tc_schema> path so the agent keeps a grounded shape when the format binding cannot be loaded.
🔵 Medium	Reference Integrity	Problem: Step 5.3 (added) instructs loading `references/<vendor>-format.md` via the resolved vendor binding, but the `scenarios-generation` skill ships only `testrail-format.md` / `testrail-export.md`. For any resolved vendor other than `testrail`, the referenced `<vendor>-format.md` does not exist, so the ACQUIRE would return zero documents. Reason: A reference that resolves only for one vendor while the prompt implies an open vendor set is a latent dangling-reference; the inline `<tc_schema>` fallback mitigates breakage but the path is still mis-advertised. Solution: Note that only the `testrail` vendor binding currently has reference assets, or constrain the resolvable vendor set to those with shipped reference files, so the parameterized reference always resolves.
🔵 Medium	Precision & Explicitness	Problem: After the refactor parameterizes the test format to a config-resolved vendor binding (`scenarios-generation` with `references/<vendor>-format.md`, `the resolved FORMAT-binding case format`), several changed lines still hardcode the term "TestRail format": `<phase_steps>` line 3 `Generate test cases in TestRail format` and the `<create_test_document>` placeholder `[TC entries in the resolved FORMAT-binding case format]` coexisting with the residual title. One concept (the case format) is now named two ways, weakening the one-term-per-concept discipline the rest of the file establishes. Reason: Mixed naming for the same concept can make an agent treat 'TestRail' as a hard requirement even when the config resolves a different TMS vendor, contradicting the file's own config-resolution rule. Solution: Make the format term consistent with the vendor-binding abstraction the file otherwise uses (refer to the resolved FORMAT binding rather than naming TestRail) in the `<phase_steps>` step-3 line so the parameterization reads uniformly.
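
The Reference Integrity finding above amounts to a resolve-or-fall-back decision. The following sketch shows that decision under assumptions: the `references/<vendor>-format.md` path pattern is taken from the finding, while the function name, the directory layout in the demo, and the vendor name `othertms` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_format_binding(skill_dir: Path, vendor: str) -> Optional[Path]:
    """Return the vendor FORMAT reference if it ships with the skill, else None."""
    candidate = skill_dir / "references" / f"{vendor}-format.md"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "scenarios-generation"
        (skill_dir / "references").mkdir(parents=True)
        (skill_dir / "references" / "testrail-format.md").write_text("# format\n")

        for vendor in ("testrail", "othertms"):  # "othertms" is a made-up vendor
            binding = resolve_format_binding(skill_dir, vendor)
            if binding is None:
                # No shipped reference: use the inline vendor-neutral schema instead.
                print(vendor, "-> inline <tc_schema> fallback")
            else:
                print(vendor, "->", binding.name)
```

Making the `None` branch explicit is what keeps the parameterized path from being mis-advertised for vendors without shipped reference files.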

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	4	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The added `<failure_handling>` block carries a multi-paragraph meta-justification section, "Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn):", with three bulleted sub-points (skill-is-hard-dependency, deployment guarantee, section-contract-is-phase-owned) plus a closing paragraph contrasting the sibling `testgen-flow-test-case-generation.md`. This is rationale/provenance prose explaining why a design decision was made rather than an operational instruction the agent must execute, which pa-hardening flags as removable ('Remove non-operational clarifications (history, rationale, ...), provenance, or explanatory meta-notes'). Reason: Per-turn re-sent rationale consumes cognitive budget and context window without altering execution, and the same operational outcome is already stated in the 'Skill execution failure' bullet above it. Solution: Reduce the tradeoff explanation to the one operational rule the agent needs (skill failure blocks the phase; no inline fallback exists; re-invoke once then halt) and drop the design-rationale paragraphs that do not change agent behavior.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	3	⬆️ Slightly better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-test-case-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	4	✅ Much better
Cognitive Budget	4	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-question-generation.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: The BASE file carried concrete detection examples inline (e.g. value-mismatch: 'Priority: Jira says "High", Confluence says "Low priority"'; logic-conflict: '"Must be fast" AND "Must show detailed calculations"'). The NEW file deletes all of these and delegates them to the `requirements-use` gap_analysis mode's catalogs, keeping only one vague-vs-specific example row in the document-contract. A reader of this phase alone now sees the taxonomy names without grounded probes. Reason: The lost examples are recoverable via the resolvable `requirements-use` reference, so this is a minor standalone-readability loss, not a behavioral regression. Solution: Acceptable as delegation since `requirements-use/references/gap-analysis-catalogs.md` resolves and owns the catalogs; if any standalone usability is desired, keep one short illustrative probe per category as an inline pointer to the catalog. No prompt rewrite needed if delegation is the intended boundary.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/testgen-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Example Grounding	Problem: BASE contained concrete procedural detail now removed and delegated: the CQL template (`type=page AND space={PROJECT_KEY} AND (text ~ "{term1}" ...)`), pseudo-call signatures, and the explicit child-page traversal loop. NEW delegates all of this to `discovery`'s `confluence-binding.md` ('owns URL parsing, direct-URL-vs-search precedence, child-page traversal...'). A reader of this phase alone no longer sees the search/traversal mechanics. Reason: The deleted procedural examples are recoverable via the resolvable `discovery` confluence/jira bindings, and the hardcoded `mcp_Jira_MCP_*` names were intentionally removed to satisfy Dependency Management (config-resolved vendors), which is a net improvement. Solution: Acceptable delegation: `discovery/references/confluence-binding.md` and `jira-binding.md` both resolve, and `<pitfalls>` still names child-page traversal as a MUST. No rewrite needed; the mechanics correctly live in the binding the phase invokes.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/testgen-flow-project-config-loading.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: The `<state_file_template>` `## Phase Details` example shows only `### Phase 1` (with 'Add sections for each completed phase' hook) even though this is Phase 0 and the validation_checklist requires 'testgen-state.md created with Phase 0 marked complete'. The canonical template the file itself authors illustrates a downstream phase rather than the Phase 0 detail row it must write here, a small self-consistency gap in the owned template. Reason: The template otherwise resolves correctly and is referenced by sibling phases; the example-vs-required-output mismatch is cosmetic and unlikely to break execution, hence low severity. Solution: Show a `### Phase 0` example detail row in the `<state_file_template>` `## Phase Details` block (or relabel the placeholder generically) so the template demonstrates the row this phase actually appends. No behavioral logic change required.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	✅ Much better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.
🔵 Medium	Example Grounding	Problem: The rewrite deletes ALL concrete code examples that grounded the abstract instructions in BASE (e.g. the full `test('should display correct welcome message after login', async ({ page }) => {...})` setup/action/assertion snippets and the import-structure template). NEW gives one state-file markdown example but zero example of an authored test or of a `### Uncovered Assertions` entry, even though it mandates 'every Phase 2 assertion implemented OR recorded'. Reason: Example Grounding gate requires abstract instructions be grounded with concrete examples; the contract rules ('no raw selectors', 'Uncovered Assertions reason format') are now stated only abstractly. This is an intentional portability/bloat trade-off, hence net-positive elsewhere, but the grounding gate specifically regressed. Solution: Add one small, framework-neutral worked example of an Uncovered-Assertions entry and/or a minimal page-object-method-based test snippet illustrating the 'no raw selectors in test code' rule, without re-baking Playwright specifics.
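
As an illustration of the kind of framework-neutral grounding the Solution asks for, here is a minimal sketch of the 'no raw selectors in test code' rule. Everything in it is hypothetical: `FakeDriver` stands in for whatever browser driver a project uses, and the `LoginPage` class, its selectors, and the welcome text are invented for the example rather than taken from the PR.

```python
class FakeDriver:
    """Stand-in for a real browser driver; records actions instead of performing them."""

    def __init__(self):
        self.actions = []
        self._text = {"[data-test=welcome]": "Welcome, demo"}

    def fill(self, selector, value):
        self.actions.append(("fill", selector, value))

    def click(self, selector):
        self.actions.append(("click", selector))

    def read(self, selector):
        return self._text[selector]


class LoginPage:
    """Page object: the only place where selectors are allowed to appear."""

    _USERNAME = "[data-test=username]"
    _PASSWORD = "[data-test=password]"
    _SUBMIT = "[data-test=submit]"
    _WELCOME = "[data-test=welcome]"

    def __init__(self, driver):
        self._driver = driver

    def login(self, username, password):
        self._driver.fill(self._USERNAME, username)
        self._driver.fill(self._PASSWORD, password)
        self._driver.click(self._SUBMIT)

    def welcome_message(self):
        return self._driver.read(self._WELCOME)


def test_welcome_message_after_login():
    # Setup, action, assertion: the test speaks only in page-object methods.
    page = LoginPage(FakeDriver())
    page.login("demo", "secret")
    assert page.welcome_message() == "Welcome, demo"


if __name__ == "__main__":
    test_welcome_message_after_login()
    print("ok")
```

The test body contains no selector strings, so a selector change touches only `LoginPage`, which is the property the rule is meant to protect.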

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	3	⬇️ Slightly worse
Example Grounding	3	⬇️ Slightly worse
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-code-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-selector-identification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.
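
The open/close count mentioned in this finding can be reproduced with a small script. The sketch below is one possible check, not the tool the review used: it counts opening and closing tags by name with a regular expression and reports any name that is closed more often than it is opened. It does not check nesting order and does not skip tags quoted inside code spans, so treat its output as a hint.

```python
import re
from collections import Counter

# Matches <name ...>, </name> and <name ... />; group 1 is the closing slash,
# group 2 the tag name, group 3 the self-closing slash.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unmatched_closers(text: str) -> dict[str, int]:
    """Return {tag_name: surplus} for tags closed more often than opened."""
    opens: Counter = Counter()
    closes: Counter = Counter()
    for slash, name, self_closing in TAG.findall(text):
        if self_closing:
            continue  # <tag/> opens and closes itself
        if slash:
            closes[name] += 1
        else:
            opens[name] += 1
    return {name: n - opens[name] for name, n in closes.items() if n > opens[name]}


if __name__ == "__main__":
    sample = (
        "<aqa_flow_test_correction>\n"
        "<step>run</step>\n"
        "</aqa_flow_test_correction>\n"
        "</output>\n"
    )
    print(unmatched_closers(sample))
```

The same function would also flag the trailing `</content>` reported for the vendor binding references.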

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	3	⬇️ Slightly worse
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	✅ Much better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	✅ Much better

📄 `instructions/r3/core/workflows/aqa-flow-test-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-selector-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The PR introduces a stray, unmatched closing tag `</output>` as the final line of the file (line 86), AFTER the root element close `</aqa_flow_selector_implementation>` (line 84). The tag has no opening counterpart anywhere in the file. It is NOT present in the BASE version and is unique to this file among the four (the other three sibling phase files have no `</output>` tag). Reason: A dangling XML close tag with no opener corrupts the structural integrity of the prompt. When the phase is ACQUIRE'd and injected into an agent's context, the orphan tag can confuse XML-aware parsing, mislead the agent about where the phase body ends, and is exactly the kind of unmatched-tag defect the audit was told to look for at the end of this file. Solution: Delete the trailing `</output>` line so the file ends cleanly at the root close `</aqa_flow_selector_implementation>`. Most likely a transcription/paste artifact from a tool-output wrapper that leaked into the saved file.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	✅ Much better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	3	⬇️ Slightly worse
Example Grounding	5	✅ Much better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-data-collection.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	✅ Much better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	✅ Much better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/aqa-flow-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	✅ Much better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	✅ Much better
Example Grounding	5	✅ Much better
Safety Boundaries	4	⬆️ Slightly better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: Several blocks carry meta-narration about ownership boundaries rather than operational instruction, e.g. ``(Source-system + tool enumeration owned by the frontmatter `description` field — not restated here.)`` and ``The ACQUIRE / execute / state-update cadence is the `orchestrator-contract` skill's contract, not restated per-phase.`` These provenance/ownership annotations repeat across `<workflow_phases>`, `<phase_template>`, `<skip_rules>`, and `<state_file>`. Reason: pa-hardening.md flags non-operational clarifications and provenance/meta-notes for removal; the repeated ownership prose inflates the prompt without changing agent behavior. Solution: Compress the repeated 'owned by X, not restated here' disclaimers into a single one-line ownership note at the top of the file instead of repeating the pattern in each block.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-project-config-loading.md`

Error: Prompt too large for reliable evaluation: instructions/r3/core/workflows/qa-flow-project-config-loading.md

📄 `instructions/r3/core/workflows/qa-flow-api-spec-analysis.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Workflow Completeness	Problem: Phase 2 (api-spec-analysis) auto-advances to Phase 3 with no HITL gate and is not marked type="HITL", unlike the analogous gated testgen/aqa data phases. The only guard is a weak 'file present, non-placeholder' check, so the agent can proceed on a thin/incorrect api-analysis.md. Reason: Without a confirmation gate the agent silently builds downstream test cases on unreviewed API analysis, breaking parity with the gated sibling flows and weakening HITL coverage. Solution: Either add a lightweight verify-before-advance confirmation after Phase 2 (and Phase 1), or document in qa-flow.md why API-spec extraction is intentionally trusted to auto-advance while sibling data phases are gated.
🔵 Medium	Bloat Control	Problem: The redaction guidance is largely restated in two places: `<redaction_contract>` defines targets and a re-scan grep list, and `<validation_checklist>` re-asserts `Redaction scan ran per <redaction_contract>` while the per-endpoint `Notes / Discrepancies` field also says 'record each applied redaction here.' The same redaction obligation is expressed three times. Reason: pa-hardening.md DRY/compress guidance: repeating the same obligation across contract, template, and checklist adds words without adding control. Solution: State the redaction obligation once in `<redaction_contract>` and have the checklist and template fields reference it by name without re-describing what to record.
🔵 Medium	Cognitive Budget	Problem: The phase carries two large inline worked examples in one file — the full `<endpoint_contract_template>` blank template AND a complete 'Worked entry' (`GET /api/v1/orders/{orderId}` with 4 response rows, citations, discrepancies) — plus the `<redaction_contract>` with a full re-scan grep list. At 14.4K chars this single phase loads a heavy template-plus-example surface for the discoverer subagent to hold while extracting contracts. Reason: pa-hardening.md sets a <300-line ideal / 500 acceptable size target and warns AI feels overloaded past ~5 directives; duplicating a full template as both blank and filled doubles the cognitive surface. Solution: Keep the blank `<endpoint_contract_template>` and trim the worked entry to the minimal discriminating fields (one response row + the discrepancy note), since the discrepancy is the only thing the example uniquely teaches.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-case-specification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-correction.md`

⚠️ Issues Found

Severity	Gate	Details
🟡 High	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.
🔵 Medium	Bloat Control	Problem: The iteration-cap + escalation rule is stated three times: `<correction_contract>` ('cap in-phase apply retries at 3 cycles per failing change... record `Phase 7 blocked: in-phase apply retry cap reached`'), `<present_for_approval>` step 3a (re-prompt cap), and `<apply_changes>` step 4 ('Max retries: cap step 7.3 in-phase retries at 3 cycles... record `Phase 7 blocked: in-phase apply retry cap reached`'). Same cap, threshold, and state string duplicated verbatim. Reason: pa-hardening.md DRY/compress: the identical cap and `Phase 7 blocked` string appearing in three blocks is redundancy that can drift if one copy is edited. Solution: Define the 3-cycle apply-retry cap and its exact state-note string once in `<correction_contract>` and reference it from `<apply_changes>` step 4 instead of restating it.
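
As an analogy for the "define once, reference elsewhere" fix, the sketch below keeps the cap and the state-note string in one place and has the retry loop read them. It assumes "3 cycles" means three attempts in total per failing change, which the finding does not spell out; `apply_with_cap` and `apply_change` are hypothetical names.

```python
from typing import Callable, Optional

# Single source of truth for the cap and the exact state-note string.
MAX_APPLY_RETRIES = 3  # assumption: three attempts in total per failing change
BLOCKED_NOTE = "Phase 7 blocked: in-phase apply retry cap reached"


def apply_with_cap(apply_change: Callable[[], bool]) -> Optional[str]:
    """Try apply_change up to the cap; return the state note if every attempt fails."""
    for _ in range(MAX_APPLY_RETRIES):
        if apply_change():
            return None  # change applied, nothing to record
    return BLOCKED_NOTE
```

With one definition, editing the threshold or the note cannot leave a stale copy in another block, which is the drift risk the finding describes.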

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Cognitive Budget	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-data-collection.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Bloat Control	Problem: The 'phase OWNS the contract, skills EMIT into it' ownership statement is restated three times across the file: `<workflow_context>` ('This phase OWNS the raw-data aggregation contract... EMIT into the sections this phase asserts'), `<raw_data_contract>` ('discovery and reverse-engineering emit into these, they do not define them'), and `<phase_steps>`. The emit/own framing repeats without adding new control. Reason: pa-hardening.md DRY/compress and 'remove non-operational clarifications': the repeated ownership meta-framing is provenance prose, not an actionable directive. Solution: State the own/emit relationship once in `<raw_data_contract>` and drop the duplicated framing from `<workflow_context>` and `<phase_steps>`.

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Cognitive Budget	4	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/workflows/qa-flow-test-implementation.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Structural Coherence	Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Decision Branching	Problem: Config advertises documentation_type values google-drive/local, but only the Confluence backend has a resolvable discovery binding and this subflow only maps 'Confluence backend -> binding confluence'. A google-drive/local config has no retrieval path and silently degrades to SKIPPED_NO_CONFIG while the user believes docs are wired. Reason: Silent degradation hides a misconfiguration: the agent skips documentation the user expected to be ingested, reducing requirement coverage without warning. Solution: Either constrain documentation_type's enum to backends that have a binding, or explicitly warn 'documentation_type has no retrieval binding — docs will be skipped' so the unsupported value is surfaced.
🔵 Medium	Bloat Control	Problem: The single `<workflow_context>` `Config keys` bullet (line 17) packs vendor-binding precedence (4 keys), the discovery-skill mapping, AND a 7+ item in-scope-signal enumeration into one dense run-on sentence spanning ~7 lines. Reason: pa-hardening targets short phrases and progressive layering; a single multi-clause bullet raises cognitive load and obscures the two distinct decisions (which vendor vs is-it-in-scope). Solution: Split the config-key resolution from the in-scope-signal detection into two short bullets/sub-lists so each carries one decision; no content need be lost.
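
For the Decision Branching finding above, the second option (surface the unsupported value instead of skipping silently) could look roughly like this. It is a sketch under the assumption that Confluence is the only backend with a binding, as the finding states; the function name and return shape are hypothetical.

```python
from typing import Optional, Tuple

# documentation_type -> discovery binding. Only Confluence has one, per the finding.
BINDINGS = {"confluence": "confluence"}


def resolve_documentation_binding(
    documentation_type: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (binding, warning); at most one of the two is set."""
    if not documentation_type:
        # Nothing configured: skipping without a warning is the expected path.
        return None, None
    binding = BINDINGS.get(documentation_type.strip().lower())
    if binding is None:
        # Configured but unsupported: say so instead of degrading silently.
        return None, (
            f"documentation_type '{documentation_type}' has no retrieval binding; "
            "docs will be skipped"
        )
    return binding, None
```

A `google-drive` or `local` value then produces a visible warning rather than an unexplained `SKIPPED_NO_CONFIG`.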

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/coding/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/debugging/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/operation-manager/SKILL.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: Line 10 changes the frontmatter to `model: claude-sonnet-4-6, gpt-5.5, gemini-3.1-pro` — a comma-joined list of vendor model ids in a field the skill schema treats as a single model id, and these literal ids are baked into the prompt rather than parameterized. Reason: pa-rosetta requires agent-agnostic prompts and pa-hardening forbids hardcoded tool/vendor names; pinning three specific vendor model ids in frontmatter risks contract breakage and reduces portability, even though the intent (broaden model support) is sound. Solution: Confirm the skill-schema `model:` field accepts a list; if it expects a scalar, express multi-agent support another way (e.g., a documented capability note) rather than a comma list of hardcoded vendor model ids.
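
One way to catch this in CI is a small frontmatter lint. The sketch below assumes the `model:` field is meant to hold a single id, which is exactly the point the Solution asks to confirm, and it parses only flat `key: value` frontmatter lines rather than full YAML. Both function names are hypothetical.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat `key: value` pairs between the leading '---' fences (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and not key.startswith((" ", "\t", "#")):
            fields[key.strip()] = value.strip()
    return fields


def model_field_problems(text: str) -> list[str]:
    """Flag a `model:` value that holds several comma-joined ids."""
    value = parse_frontmatter(text).get("model", "")
    ids = [part.strip() for part in value.split(",") if part.strip()]
    if len(ids) > 1:
        # Assumption: the schema expects a scalar model id.
        return [f"model: holds {len(ids)} comma-joined ids ({', '.join(ids)}); expected a single id"]
    return []
```

If the schema turns out to accept a list, the check would need to be inverted or dropped.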

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	5	✅ Much better
Conflict Resolution	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Example Grounding	5	✅ Much better
Dependency Management	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-authoring/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	4	⬆️ Slightly better
Instruction Ordering	4	⬆️ Slightly better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	4	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	4	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	4	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Self-Validation	5	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	✅ Much better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/requirements-use/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Input Contract	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	4	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	4	⬆️ Slightly better

📄 `instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/reverse-engineering/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Conflict Resolution	4	⬆️ Slightly better
Decision Branching	5	✅ Much better
Workflow Completeness	4	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Example Grounding	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	5	✅ Much better
Self-Validation	4	⬆️ Slightly better
Bloat Control	5	✅ Much better
Cognitive Budget	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/orchestrator-contract/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Single Responsibility	4	⬆️ Slightly better
Input Contract	4	⬆️ Slightly better
Success Criteria	5	✅ Much better
Conflict Resolution	5	✅ Much better
Decision Branching	5	✅ Much better
Workflow Completeness	5	✅ Much better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Safety Boundaries	5	✅ Much better
Failure Handling	5	✅ Much better
Epistemic Honesty	4	⬆️ Slightly better
Self-Validation	5	✅ Much better
Bloat Control	4	⬆️ Slightly better
Cognitive Budget	5	✅ Much better
Dependency Management	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/scenarios-generation/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/scenarios-generation/references/testrail-export.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/scenarios-generation/references/testrail-format.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/testing/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Goal Specification	5	⬆️ Slightly better
Input Contract	5	⬆️ Slightly better
Output Contract	4	⬆️ Slightly better
Success Criteria	5	⬆️ Slightly better
Decision Branching	5	⬆️ Slightly better
Workflow Completeness	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Reference Integrity	5	⬆️ Slightly better
Example Grounding	4	⬆️ Slightly better
Safety Boundaries	5	⬆️ Slightly better
Failure Handling	5	⬆️ Slightly better
Epistemic Honesty	5	⬆️ Slightly better
Self-Validation	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

📄 `instructions/r3/core/skills/testing/references/implementation-examples.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Reference Integrity	Problem: Load-bearing rules (ATC traceability, assertion-priority, exact UI/selector/API code shapes) now live ONLY in this lazy-loaded reference, and the aqa/qa phase files that consume the testing skill do not name implementation-examples.md directly (chain: phase -> testing SKILL.md -> reference). Reason: An agent that under-loads the lazy reference loses the exact output shape and the fragile-selector approval rule, producing ungrounded test code. Solution: Keep the load instruction imperative and restate the 1-2 truly load-bearing invariants (ATC traceability, no-silent-fragile-selector) as a short inline guard in testing/SKILL.md so they survive if the reference is not loaded.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/discovery/SKILL.md`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/discovery/references/confluence-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: MCP call names are hardcoded inline (`confluence_get_page`, `confluence_get_page_children`, `confluence_search`, and write-guard names `confluence_create_page`/`confluence_update_page`/`confluence_add_comment`) rather than parameterized, against the Rosetta agent-agnostic / no-hardcoded-tool-names principle. Reason: pa-hardening and pa-rosetta require coding-agent-agnostic prompts with no baked tool names; a binding that names exact MCP functions ties the skill to one specific MCP server implementation. Solution: Keep the literal names as illustrative call shapes (acceptable for a vendor binding) but frame them as the expected MCP capability per the resolved binding rather than as the only valid tool identifiers, so a differently-named Confluence MCP still resolves.
🔵 Medium	Structural Coherence	Problem: The file ends with a stray, unmatched closing tag `</content>` on its last line, with no opening `<content>` anywhere in the document. The file is otherwise plain markdown (it opens with `# Vendor binding: Confluence`), so this dangling XML tag is a copy/paste or template-extraction artifact. Reason: When this reference is lazy-loaded into agent context the literal `</content>` renders as visible junk text and can confuse XML-aware parsing of the surrounding skill, signalling a malformed asset. Solution: Delete the trailing `</content>` line so the markdown reference ends cleanly on its last validation bullet.
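
For the Dependency Management finding on this binding, "expected capability rather than the only valid identifier" can be pictured as a lookup with overridable defaults. In the sketch below the tool names are the ones quoted in the finding; the capability keys (`get_page`, `get_page_children`, `search`), the `overrides` parameter, and `resolve_tools` itself are hypothetical.

```python
from typing import Dict, Optional, Set

# Capability -> default tool name. The tool names are those quoted in the finding.
DEFAULT_CONFLUENCE_TOOLS: Dict[str, str] = {
    "get_page": "confluence_get_page",
    "get_page_children": "confluence_get_page_children",
    "search": "confluence_search",
}


def resolve_tools(
    available: Set[str], overrides: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Map each capability to a tool the connected MCP server actually exposes."""
    mapping = {**DEFAULT_CONFLUENCE_TOOLS, **(overrides or {})}
    missing = sorted(cap for cap, tool in mapping.items() if tool not in available)
    if missing:
        raise LookupError(f"no tool available for capabilities: {', '.join(missing)}")
    return mapping
```

A differently named Confluence MCP then resolves by supplying `overrides`, while the default names stay as the documented call shapes. The same pattern would apply to the Jira and TestRail bindings.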

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/discovery/references/jira-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: Exact MCP tool identifiers are baked in (`jira_get_issue`, `jira_search_fields`, and write-guard names `jira_create_issue`/`jira_update_issue`/`jira_transition_issue`/`jira_add_comment`) rather than parameterized capabilities, against the no-hardcoded-tool-names principle. Reason: pa-rosetta/pa-hardening require agent-agnostic prompts; hardcoded function names couple the binding to one MCP server's exact API. Solution: Present the names as the expected Jira MCP call shapes for the resolved binding rather than as the sole valid identifiers, so an alternately-named Jira MCP still maps.
🔵 Medium	Structural Coherence	Problem: The file ends with a stray, unmatched `</content>` closing tag on its last line; there is no opening `<content>` tag anywhere and the document is plain markdown opening with `# Vendor binding: Jira`. It is a leftover extraction/template artifact. Reason: The dangling tag is emitted verbatim into agent context when the binding is lazy-loaded, producing junk output and a malformed-asset signal. Solution: Remove the trailing `</content>` line so the file ends on its last Read-only validation bullet.

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/discovery/references/testrail-binding.md`

⚠️ Issues Found

Severity	Gate	Details
🔵 Medium	Dependency Management	Problem: MCP identifiers are hardcoded (`get_case`, `get_case_fields`, and write-guard names `update_case`/`add_case`/`delete_case`) rather than parameterized, against the agent-agnostic / no-hardcoded-tool-names principle. Reason: pa-rosetta/pa-hardening require coding-agent-agnostic prompts; exact tool names couple the binding to one MCP implementation. Solution: Frame the names as the expected TestRail MCP call shapes for the resolved binding rather than the only valid identifiers.
🔵 Medium	Structural Coherence	Problem: The file ends with a stray, unmatched `</content>` closing tag on its last line; no opening `<content>` exists and the document is plain markdown opening with `# Vendor binding: TestRail`. It is a leftover extraction/template artifact (identical defect to the Jira and Confluence bindings). Reason: The dangling tag is rendered verbatim when the binding is lazy-loaded into context and signals a malformed asset; the repeated occurrence across all three bindings confirms a systematic copy/extraction error. Solution: Delete the trailing `</content>` line so the file ends on its last Read-only validation bullet.
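
Because the same dangling tag appears in all three binding files, the fix can be applied mechanically. The sketch below is one possible way to do that, not part of the PR: the helper name `strip_dangling_content_tag` is hypothetical, and it assumes that any `<content` opener anywhere in the file means the closing tag might be legitimately paired and should be left alone.

```python
def strip_dangling_content_tag(text: str) -> str:
    """Remove a trailing `</content>` line when the file has no `<content` opener."""
    if "<content" in text:
        # An opener exists somewhere, so the closing tag may be paired.
        return text
    lines = text.rstrip("\n").split("\n")
    if lines and lines[-1].strip() == "</content>":
        lines.pop()
        # Also drop blank lines that were left in front of the stray tag.
        while lines and not lines[-1].strip():
            lines.pop()
        return "\n".join(lines) + "\n"
    return text
```

Applied to a file that opens with `# Vendor binding: Jira` and ends with `</content>`, this returns the same markdown ending on its last bullet. Files without the stray tag, or with a matching opener, are returned unchanged.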

📊 Gates Comparison

Gate	Score	Comparison

📄 `instructions/r3/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`

✅ No Issues Found

📊 Gates Comparison

Gate	Score	Comparison
Output Contract	5	⬆️ Slightly better
Precision & Explicitness	5	⬆️ Slightly better
Structural Coherence	5	⬆️ Slightly better
Rosetta	5	⬆️ Slightly better

sveto added 4 commits June 5, 2026 09:37

create fresh branch, without content that bleeded into the diff via m…

110dd1d

…erge-rebase

reorganizing scattered skills

5032b34

add reorganized skills/templates + manual-tests/

e2dad64

Merge remote-tracking branch 'origin/main' into qa-aqa-testgen-skills

880edfd

sveto marked this pull request as ready for review June 11, 2026 08:40

sveto requested review from ElizaVetaFomka, YevheniiaLementova, isolomatov-gd, kkhristenko51 and omaiesh as code owners June 11, 2026 08:41

This was referenced Jun 11, 2026

Cutout skills from flows #90

Closed

Improve aqa & testgen #89

Closed

github-actions Bot added the enhancement New feature or request label Jun 11, 2026

sveto changed the title ~~Qa aqa testgen skills~~ add qa-flow, consolidate QA/AQA/testgen skills, add manual tests Jun 11, 2026

qa/aqa/testgen: bugfix per github review

650ed97

bugfix per github review

2365ab6

sveto wants to merge 6 commits into main from qa-aqa-testgen-skills

github-actions Bot commented Jun 11, 2026

📋 Prompt Quality Validation Report

❌ Validation Failed

Summary by File

📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/coding/SKILL.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/debugging/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/orchestrator-contract/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/reverse-engineering/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/operation-manager/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/operation-manager/assets/om-schema.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/discovery/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/discovery/references/confluence-binding.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/discovery/references/jira-binding.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/discovery/references/testrail-binding.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/requirements-use/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/requirements-authoring/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/scenarios-generation/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/scenarios-generation/references/testrail-export.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/scenarios-generation/references/testrail-format.md

✅ No Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/testing/SKILL.md

⚠️ Issues Found

📊 Gates Comparison

📄 instructions/r2/core/skills/testing/references/implementation-examples.md

sveto commented Jun 11, 2026 (edited)

📄 `instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md`

📄 `instructions/r2/core/skills/coding/SKILL.md`

📄 `instructions/r2/core/skills/debugging/SKILL.md`

📄 `instructions/r2/core/skills/orchestrator-contract/SKILL.md`

📄 `instructions/r2/core/skills/reverse-engineering/SKILL.md`

📄 `instructions/r2/core/skills/operation-manager/SKILL.md`

📄 `instructions/r2/core/skills/operation-manager/assets/om-schema.md`

📄 `instructions/r2/core/skills/discovery/SKILL.md`

📄 `instructions/r2/core/skills/discovery/references/confluence-binding.md`

📄 `instructions/r2/core/skills/discovery/references/jira-binding.md`

📄 `instructions/r2/core/skills/discovery/references/testrail-binding.md`

📄 `instructions/r2/core/skills/requirements-use/SKILL.md`

📄 `instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md`

📄 `instructions/r2/core/skills/requirements-authoring/SKILL.md`

📄 `instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md`

📄 `instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml`

📄 `instructions/r2/core/skills/scenarios-generation/SKILL.md`

📄 `instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md`

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-export.md`

📄 `instructions/r2/core/skills/scenarios-generation/references/testrail-format.md`

📄 `instructions/r2/core/skills/testing/SKILL.md`

📄 `instructions/r2/core/skills/testing/references/implementation-examples.md`

📄 `instructions/r2/core/workflows/aqa-flow.md`

📄 `instructions/r2/core/workflows/aqa-flow-code-analysis.md`

📄 `instructions/r2/core/workflows/aqa-flow-data-collection.md`

📄 `instructions/r2/core/workflows/aqa-flow-requirements-clarification.md`

📄 `instructions/r2/core/workflows/aqa-flow-selector-identification.md`

📄 `instructions/r2/core/workflows/aqa-flow-selector-implementation.md`

📄 `instructions/r2/core/workflows/aqa-flow-test-correction.md`

📄 `instructions/r2/core/workflows/aqa-flow-test-implementation.md`

📄 `instructions/r2/core/workflows/aqa-flow-test-report-analysis.md`

📄 `instructions/r2/core/workflows/qa-flow.md`

📄 `instructions/r2/core/workflows/qa-flow-api-spec-analysis.md`

📄 `instructions/r2/core/workflows/qa-flow-data-collection.md`

📄 `instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md`

📄 `instructions/r2/core/workflows/qa-flow-project-config-loading.md`

📄 `instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md`

📄 `instructions/r2/core/workflows/qa-flow-test-case-specification.md`

📄 `instructions/r2/core/workflows/qa-flow-test-implementation.md`

📄 `instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md`

📄 `instructions/r2/core/workflows/qa-flow-test-correction.md`

📄 `instructions/r2/core/workflows/testgen-flow.md`

📄 `instructions/r2/core/workflows/testgen-flow-project-config-loading.md`