add qa-flow, consolidate QA/AQA/testgen skills, add manual tests #110
sveto wants to merge 6 commits into
Rosetta Triage Review
Summary: This PR consolidates and significantly refactors the QA, AQA, and Testgen AI agent workflows, introducing a brand-new end-to-end
Findings:
- [CRITICAL]
- [HIGH] Multiple files: Frontmatter description exceeds 30-token cap
- [HIGH]
- [HIGH]
- [HIGH]
- [MEDIUM] DRY: Duplicated validation rules in
- [MEDIUM]
- [MEDIUM]
- [POSITIVE] CI improvement in
- [POSITIVE] Manual test docs

Suggestions:
Automated triage by Rosetta agent
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📋 Full per-file findings (Problem / Reason / Solution + Gates Comparison) → Workflow run Summary (PR comments are capped at 65,536 chars; details live on the Actions run).
📄
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The added `<audit_survival_checks>` block is dense pointer prose (e.g. 'Dual structure → the phase asserts the contract, the skill emits it', 'Vendors: config-key precedence, not literal tags') with no concrete example of a pass-vs-fail case for any check. These checks are abstract and the reviewer applying them has nothing to pattern-match against. Reason: Abstract review heuristics without grounding get applied inconsistently across auditors. Solution: Add one short concrete pass/fail example for the most error-prone checks (e.g. an N-sections mismatch, or a literal-tag vendor reference), or point to an existing worked example elsewhere in the references. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The added `<when_to_use_skill>` line points the reader to the triage mode 'in `<process>`', but the new section is named `<test_execution_triage>` and the skill has no `<process>` block. The mental hook lands on a section that does not exist. Reason: A pointer to a non-existent tag makes the agent search for content it cannot find, weakening reliable routing into the new mode. Solution: Change 'use the test-execution triage mode in `<process>`' to point to `<test_execution_triage>` (the actual section tag). |
| 🔵 Medium | Single Responsibility | Problem: The added `<test_execution_triage>` block introduces a second, distinct responsibility (read-only triage of an automated-test execution report with its own taxonomy, capture analysis, and cross-failure pattern detection) onto a skill whose core job is root-cause debugging of a single issue. The `<when_to_use_skill>` and frontmatter now advertise two jobs. Reason: Two responsibilities in one skill raise the chance an agent applies the wrong mode or loads triage machinery when only simple debugging is needed. Solution: Acceptable as a bounded mode if kept thin; if the triage mode grows further it should move to its own skill. For now ensure the triage mode stays a pointer-style specialization (it correctly references `<core_concepts>` for evidence labels and redaction) and does not accumulate independent process depth. |
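
The dangling `<process>` pointer described in the Reference Integrity finding is the kind of issue a small script could flag. The following is a minimal sketch, not part of the Rosetta validator: it assumes a skill body uses XML-style section tags and treats any tag that is mentioned but never closed as a dangling reference. The function name `dangling_tag_refs` and the sample text are hypothetical.

```python
import re

# Matches <tag> and </tag> where the name is lowercase snake_case.
TAG = re.compile(r"<(/?)([a-z_][a-z0-9_]*)>")


def dangling_tag_refs(skill_text: str) -> set[str]:
    """Return tag names that are mentioned but never closed as a section."""
    mentioned: set[str] = set()
    defined: set[str] = set()
    for closing, name in TAG.findall(skill_text):
        mentioned.add(name)
        if closing:  # a closing tag means the section really exists
            defined.add(name)
    return mentioned - defined


# Hypothetical skill excerpt that reproduces the reported mismatch.
sample = """
<when_to_use_skill>
For report triage, use the test-execution triage mode in <process>.
</when_to_use_skill>
<test_execution_triage>
Read-only triage of an execution report.
</test_execution_triage>
"""

print(sorted(dangling_tag_refs(sample)))  # ['process']
```

A check like this only catches pointers written as literal tags; references by prose name would need a separate rule.
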
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/orchestrator-contract/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The added `<prerequisites>` bullet 'OPERATION_MANAGER is active' and the dispatch-template line 'MUST USE SKILL subagent-contract, operation-manager' bind this skill to operation-manager, which is not in the canonical docs/definitions/skills.md (only plan-manager is). pa-rosetta requires using names from docs/definitions/*.md and not auto-adding out-of-list items. Reason: Referencing an out-of-list skill as a hard prerequisite makes the contract depend on a name the KB does not officially recognize. Solution: Add operation-manager to the canonical skills definitions (or reference the already-canonical plan-manager), so the binding resolves against the canonical list. |
| 🔵 Medium | Single Responsibility | Problem: The added items 22–28 fold a full phase-by-phase workflow drive-loop (just-in-time ACQUIRE, state-file updates, phase-skip confirmation, downstream prerequisite verification) into the orchestrator-contract skill, which previously owned only delegation/dispatch/review. The skill now carries both the delegation contract and the multi-phase execution loop. Reason: Mixing the dispatch contract with the phase-execution engine increases the skill's responsibility count and the chance of the two concerns drifting. Solution: Keep the drive-loop here only if it stays pointer-thin; otherwise the phase-chaining loop belongs with the (missing) load-workflow authority it keeps deferring to. At minimum resolve the load-workflow reference so ownership of loading vs driving is unambiguous. |
| 🔵 Medium | Reference Integrity | Problem: The diff adds three references to a load-workflow skill as a canonical authority (core_concepts: 'WORKFLOW LOADING is a separate canonical concern owned by load-workflow'; process #28; resources block). No load-workflow skill exists in r2 (no skill folder) and it is absent from the canonical docs/definitions/skills.md list. The skill delegates its entire workflow-loading concern to a target that cannot be acquired. Reason: An agent told that loading is owned by load-workflow will try to use a skill that does not resolve, breaking the delegation chain at the top of every multi-phase run. Solution: Either add load-workflow to the canonical skills list and create the skill, or point these references to the actual loading authority (e.g. the bootstrap prep steps / load-context) that exists in r2. Do not leave a dangling canonical reference. |
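
Both the Rosetta and Reference Integrity findings above reduce to one check: every skill name a prompt binds to must appear in the canonical list. A minimal sketch of that check follows. The canonical set here is an assumed excerpt (the findings confirm only that plan-manager is listed and that operation-manager and load-workflow are not), and `unresolved_skills` is a hypothetical helper name.

```python
def unresolved_skills(referenced: list[str], canonical: set[str]) -> list[str]:
    """Return referenced skill names missing from the canonical list, in first-seen order."""
    seen: set[str] = set()
    missing: list[str] = []
    for name in referenced:
        if name not in canonical and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing


# Assumed excerpt of docs/definitions/skills.md; subagent-contract is included
# here for illustration only and is not confirmed by the findings.
canonical = {"plan-manager", "subagent-contract"}

# Skill names the orchestrator-contract diff refers to, per the findings.
referenced = ["subagent-contract", "operation-manager", "load-workflow", "load-workflow"]

print(unresolved_skills(referenced, canonical))
# ['operation-manager', 'load-workflow']
```
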
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/reverse-engineering/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The added `<analysis_modes>` block introduces two concrete, domain-specific modes (test-automation architecture analysis, API-contract extraction) into a skill whose general purpose is code→spec reverse engineering. These modes carry their own GATEs, source-priority lists, and per-endpoint templates, broadening the skill beyond its single distillation responsibility. Reason: Each added concrete mode widens the skill's job count and the surface an agent must scan to apply the right one. Solution: Acceptable while the modes stay thin specializations that EMIT into the phase-owned artifact (they currently do, and defer artifact shape/path to the phase). Watch for further mode accretion; if more modes are added, extract them to a dedicated test/API-analysis skill. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/operation-manager/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Reference Integrity | Problem: core_concepts and resources reference ACQUIRE todo-tasks-fallback.md FROM KB as the built-in fallback when MCP and CLI both fail. No todo-tasks-fallback.md exists under instructions/r2 (it exists only in r3) and it is not in the canonical r2 docs/definitions/rules.md. The whole fallback path is unreachable in r2. Reason: When the CLI fails the agent is told to ACQUIRE a file that does not resolve in r2, leaving it with no working fallback at the moment it most needs one. Solution: Add todo-tasks-fallback.md to r2 (and to r2 rules definitions) or point the fallback at an existing r2 rule/asset. Do not ship a fallback whose target cannot be acquired in this release. |
| 🟡 High | Precision & Explicitness | Problem: core_concepts states the CLI as `npx rosettify@latest <command> <subcommand> <plan_file>` with the `<command>` slot left as a placeholder, but every concrete invocation in `<process>` and `<validation_checklist>` uses the literal command plan (e.g. plan next, plan update_status, plan query). The generic `<command>` placeholder is never bound to plan, so the one term for the command concept is presented two ways. Reason: A placeholder command that is never bound forces the agent to infer the command name, risking malformed invocations. Solution: State the command literally once (`npx rosettify@latest plan <subcommand> <plan_file>`) as plan-manager does, or explicitly define that `<command>` is always plan for this skill. |
| 🟡 High | Rosetta | Problem: This new skill operation-manager duplicates the existing canonical plan-manager skill (identical role, identical description, near-identical core_concepts and process) but is not in docs/definitions/skills.md (only plan-manager is listed). Two near-identical plan-management skills coexist in r2 and orchestrator-contract now hard-binds to the out-of-list one. pa-rosetta forbids auto-adding out-of-list items and DRY forbids the duplication. Reason: Two competing skills with the same job split callers and guarantee drift; an out-of-list skill is not recognized by the canonical KB. Solution: Decide one canonical plan/operation manager: either add operation-manager to the canonical skills list and deprecate/remove plan-manager, or fold the new CLI/template changes back into plan-manager. Do not keep both. |
| 🔵 Medium | Reference Integrity | Problem: Resources lists `USE FLOW adhoc-flow`. A skill pointing outward to a workflow is reverse/sibling awareness; per pa-hardening boundaries a skill should not know which flow runs it. plan-manager does not carry this pointer. Reason: Skill→workflow awareness violates the prompt-boundary contract and couples the skill to a specific flow. Solution: Remove the `USE FLOW adhoc-flow` line from the skill; flow selection is the orchestrator/bootstrap concern, not the skill's. |
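
The Precision & Explicitness finding asks for the command slot to be bound literally to `plan`. One way to picture that binding is a helper that builds the argument vector and never leaves the command as a placeholder. This is a sketch under assumptions: the subcommand names are only the three quoted in the finding, the real CLI may accept others or require extra arguments, and the plan file path is hypothetical. The helper builds the arguments and does not run anything.

```python
# Subcommands named in the finding; the real CLI may accept others (assumption).
PLAN_SUBCOMMANDS = ("next", "update_status", "query")


def plan_argv(subcommand: str, plan_file: str) -> list[str]:
    """Build argv with the command slot bound literally to "plan"."""
    if subcommand not in PLAN_SUBCOMMANDS:
        raise ValueError(f"unknown plan subcommand: {subcommand!r}")
    return ["npx", "rosettify@latest", "plan", subcommand, plan_file]


# "plans/example.json" is a hypothetical path used only for illustration.
print(" ".join(plan_argv("next", "plans/example.json")))
# prints: npx rosettify@latest plan next plans/example.json
```
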
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Reference Integrity | 2 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/operation-manager/assets/om-schema.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/discovery/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The frontmatter description (line 3) is two full sentences (~55 tokens): 'Rosetta skill to gather source artifacts ... Use to collect issues/tickets, test cases, and documentation pages for downstream requirements, test design, or debugging phases.' pa-hardening requires the frontmatter description to be a call-to-action and extremely dense (<30 tokens). Reason: Frontmatter is loaded into every agent's context for skill selection; an over-long description wastes the always-resident budget and violates the Rosetta frontmatter density rule. Solution: Compress the description to a single dense call-to-action under 30 tokens, e.g. 'Collect + normalize + redact source-of-record artifacts (Jira/Confluence/TestRail via MCP) into the phase-defined raw-context artifact.' |
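
A quick pre-commit style check can catch descriptions that are obviously over the density cap. The sketch below is an approximation only: it counts whitespace-separated words as a crude proxy for tokens, and a real tokenizer will usually report a higher number, so a pass here does not guarantee a pass in the validator. The names `approx_tokens` and `within_budget` are hypothetical; the description string is the one proposed in the finding.

```python
def approx_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words (real tokenizers usually count more)."""
    return len(text.split())


def within_budget(description: str, limit: int = 30) -> bool:
    """True when the approximate count is strictly below the limit."""
    return approx_tokens(description) < limit


proposed = (
    "Collect + normalize + redact source-of-record artifacts "
    "(Jira/Confluence/TestRail via MCP) into the phase-defined raw-context artifact."
)

print(approx_tokens(proposed), within_budget(proposed))  # 15 True
```
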
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/skills/discovery/references/confluence-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/discovery/references/jira-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/discovery/references/testrail-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-use/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Goal Specification | Problem: The PR adds a whole second responsibility, the `<gap_analysis>` analysis-only mode (multi-source contradiction/gap/ambiguity classification over Jira/Confluence/TestRail/API-spec/test-plan data), but neither the frontmatter description ("Consume approved requirements to drive planning, implementation, and validation...") nor `<when_to_use_skill>` ("implementing from approved requirements, planning work from requirement IDs, or auditing requirement-to-delivery traceability") mentions gap analysis. Skill routing is driven by description/when-to-use, so the new mode is undiscoverable by the dispatching agent. Reason: An entry mode that the trigger metadata never names will not be loaded for the cases it exists to serve. Solution: Add the gap_analysis mode to `<when_to_use_skill>` and to the frontmatter description (e.g. add a clause about analyzing collected multi-source data for gaps/contradictions/ambiguities) so the mode is selectable. |
| 🟡 High | Single Responsibility | Problem: The PR bolts a whole new `<gap_analysis>` analysis-only mode (lines 93-105) onto a skill whose stated job (frontmatter + `<role>` line 24: 'using requirements as execution contract') is consuming approved requirements to drive planning/implementation/validation. Multi-source contradiction/gap/ambiguity detection across Jira/Confluence/TestRail/API-spec/test-plans is a distinct responsibility from requirement-to-delivery traceability, pushing the skill from 1-2 jobs toward 3. Reason: A skill description that advertises only requirement usage but silently contains a second analysis mode harms skill selection and violates the SRP/single-responsibility expectation; agents may load it for the wrong job or miss the mode entirely. Solution: Either keep the two responsibilities deliberately fused and state the dual-mode scope explicitly in the frontmatter/`<role>`/`<when_to_use_skill>` so callers understand both modes, or extract `<gap_analysis>` into its own analysis skill that the phase invokes separately. |
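
The discoverability problem in the Goal Specification finding can be illustrated with a simple keyword check: a mode is selectable only if the trigger metadata mentions something that points at it. The sketch below assumes routing looks only at the description and when-to-use text. The trigger strings are the ones quoted in the finding; the keyword lists, the `traceability` mode name, and the helper `undiscoverable_modes` are hypothetical.

```python
def undiscoverable_modes(trigger_text: str, modes: dict[str, list[str]]) -> list[str]:
    """Return modes for which no keyword appears in the trigger metadata."""
    lowered = trigger_text.lower()
    return [
        mode
        for mode, keywords in modes.items()
        if not any(keyword.lower() in lowered for keyword in keywords)
    ]


# Frontmatter description and when-to-use text as quoted in the finding.
trigger_text = (
    "Consume approved requirements to drive planning, implementation, and validation. "
    "implementing from approved requirements, planning work from requirement IDs, "
    "or auditing requirement-to-delivery traceability"
)

# Keyword lists are illustrative assumptions, not taken from the skill.
modes = {
    "traceability": ["traceability"],
    "gap_analysis": ["gap", "contradiction", "ambiguit"],
}

print(undiscoverable_modes(trigger_text, modes))  # ['gap_analysis']
```
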
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 3 | ⬇️ Slightly worse |
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/requirements-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The PR adds a `<synthesis>` mode (synthesize multi-source Jira/Confluence/TestRail/answers/gap-analysis data into one structured requirements document) on top of the existing authoring/updating/reviewing responsibility. The frontmatter description was NOT updated to name synthesis (it still reads only "Author, update, and validate functional and non-functional requirements..."), though `<when_to_use_skill>` was updated to add "synthesizing". This is a milder version of the requirements-use split, and synthesis shares the authoring rules, so it is more cohesive, but description/when-to-use are now inconsistent about the mode set. Reason: Frontmatter drives skill selection; if it omits a mode that when-to-use and the body define, the mode may not be routed. Solution: Add synthesis to the frontmatter description so trigger metadata matches the `<when_to_use_skill>` and the `<synthesis>` body. |
| 🔵 Medium | Workflow Completeness | Problem: Compression collapsed the explicit base `<authoring_flow>` (15 ordered bullets including 'Check against current best practices' and 'Once drafting is done proactively seek user approval') into a 3-step flow (lines 85-91). The proactive 'check against current best practices' step is no longer stated in SKILL.md or the catalogs. Reason: Dropping an explicit ordered step from a multi-step authoring flow risks the agent skipping the best-practices validation that the base flow enforced. Solution: Confirm the best-practices-check step is intentionally dropped or fold it back into step 2 of `<authoring_flow>` (e.g. 'run quality-gate + best-practices checks'). |
| 🔵 Medium | Precision & Explicitness | Problem: The base NFR rule 'Update existing requirements with new schema' (base `<nonfunctional_requirements>`) was dropped in the compression and is not recoverable in NEW SKILL.md (line 78 NFR clause) nor in references/authoring-catalogs.md (NFR schema section, lines 114-125). The instruction to migrate already-authored requirements onto the current schema is lost. Reason: Without the migrate-to-current-schema directive, the agent may leave legacy units on an outdated schema during updates, producing an inconsistent requirements set. Solution: Re-add a short clause (in `<requirement_statements>` NFR bullet or the catalogs schema-fields section) stating that existing requirement units must be updated to the current schema when re-authored. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Lines 51 and 49-55 restate the requirement schema fields and ID conventions, and the unit template (lines 9-28) duplicates the verbatim `<req>` template that also lives in the asset ra-requirement-unit.xml. The brief says SKILL.md owns rules/methods and this file holds reference catalogs, but the `<req>` template is now duplicated across this reference AND the asset, violating DRY/SSoT within the family. Reason: Duplicated canonical templates drift apart (already visible vs the XML asset), the exact failure mode the single-source convention prevents. Solution: Keep the verbatim `<req>` template in exactly one location (the asset) and have this catalog reference it, rather than copying the full template block. |
| 🔵 Medium | Precision & Explicitness | Problem: The <req> unit template here (lines 9-28) uses the OLD two-field shape: `NotStarted |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Conflict Resolution | Problem: After this change the asset (`[Implemented |
| 🟡 High | Precision & Explicitness | Problem: The new token set `[Implemented |
| 🟡 High | Output Contract | Problem: The change collapsed two structured fields into one freeform field: BASE had `NotStarted |
| 🔵 Medium | Rosetta | Problem: This asset's `<implementation>` line (line 38) now diverges from the same template restated in the sibling authoring-catalogs.md (lines 25-26 of that file, still the old two-field shape). Rosetta DRY/SSoT within a family requires one canonical definition; the PR changed this copy without updating the sibling, creating an intra-family contradiction rather than a single source of truth. Reason: Same-family files teaching different field shapes break the single-source-of-truth discipline Rosetta enforces. Solution: Make exactly one file canonical for the `<req>` implementation field and have the other reference it; update both together so the family stays consistent. |
| 🔵 Medium | Example Grounding | Problem: The new inline guidance [Additional Notes: files affected for implemented, notes without duplication for what changed for todo and modify] is denser and less concrete than the BASE [CONCISE: Implemented: aggregated files affected, NotStarted/Planned/ToBeRemoved: nothing, ToBeModified: what was originally documented but now dropped], which spelled out the expected note content per state. The per-state mapping of what to write is now partially lost. Reason: Losing the per-state note guidance reduces fill-in reliability for the implementation field. Solution: Restore an explicit per-state note expectation (what to write for each token) so the template self-documents the notes field. |
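
The per-state note expectation that the Example Grounding finding wants restored can be written down as a small lookup. The sketch below encodes only the BASE guidance quoted in the finding (the new token set is not reproduced here), and treats "nothing" as "notes must be empty". The names `BASE_NOTE_RULES` and `check_notes` are hypothetical and not part of the asset.

```python
from typing import Optional

# Per-state note expectation, taken from the BASE guidance quoted in the finding.
# An empty string means the notes field should stay empty for that state.
BASE_NOTE_RULES = {
    "Implemented": "aggregated files affected",
    "NotStarted": "",
    "Planned": "",
    "ToBeRemoved": "",
    "ToBeModified": "what was originally documented but now dropped",
}


def check_notes(state: str, notes: str) -> Optional[str]:
    """Return a problem description, or None when the notes fit the state."""
    if state not in BASE_NOTE_RULES:
        return f"unknown state: {state}"
    expected = BASE_NOTE_RULES[state]
    has_notes = bool(notes.strip())
    if expected and not has_notes:
        return f"{state}: notes should describe {expected}"
    if not expected and has_notes:
        return f"{state}: notes should be empty"
    return None


print(check_notes("Implemented", ""))
# Implemented: notes should describe aggregated files affected
print(check_notes("Planned", ""))
# None
```
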
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 3 | ⬇️ Slightly worse |
| Conflict Resolution | 2 | ⬇️ Slightly worse |
| Precision & Explicitness | 3 | ⬇️ Slightly worse |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Bloat Control | 4 | ⬆️ Slightly better |
| Rosetta | 3 | ⬇️ Slightly worse |
📄 instructions/r2/core/skills/scenarios-generation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The new skill carries three distinct authoring modes — gwt_spec (Given-When-Then API specs), generation (TMS-format cases), and a vendor_binding resolver — plus a shared validation checklist. That is broad for one skill; the gwt_spec mode and the TMS generation mode are quite different artifact shapes. Reason: Two related but materially different output artifacts (ATC GWT specs vs TMS Steps/Expected cases) increase the cognitive search space within a single resident prompt. Solution: Acceptable as one skill since all three are 'design test scenarios from requirements'; keep but ensure the mode boundary in <gwt_spec> vs stays crisp so an agent never blends an ATC spec with a TMS case template. No split required. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/scenarios-generation/references/testrail-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Rosetta prompts are coding-agent-agnostic and avoid hardcoded tool names; this binding embeds concrete mcp_testrail_* tool signatures (steps 1, 7, 8) as the operational contract. While a vendor binding legitimately names the vendor, the per-call MCP symbol form is a specific tool-name assumption rather than a 'tell how to think' resolution from project config. Reason: Hardcoded tool symbols reduce agent-agnostic portability, which the Rosetta gate guards against. Solution: Frame the MCP signatures as the shape to invoke against the project-resolved TestRail MCP tool names, consistent with the SKILL's stance that the skill never reads config and the phase resolves the binding. |
| 🔵 Medium | Dependency Management | Problem: MCP tool names are hardcoded throughout the process steps (mcp_testrail_get_project, mcp_testrail_get_cases, mcp_testrail_add_case). This is a vendor-specific export binding, so TestRail names are expected here, but the tool-call invocations assume one MCP server naming scheme; a TestRail MCP exposed under a different tool prefix in the target project would not match. The file does parameterize the vendor concept abstractly (the swap table at the end), but the live process steps bake in the exact mcp_testrail_* symbols. Reason: Hardcoded MCP symbol names can silently mismatch a differently-named TestRail MCP, breaking the export at call time. Solution: State once near the top that the TestRail MCP tool symbols are placeholders for whatever the project's TestRail MCP actually exposes (resolved from config), so an agent maps them rather than expecting those literal names. |
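
The placeholder framing suggested in both findings can be pictured as a small lookup: the binding names logical operations, and the phase supplies whatever tool names the project's TestRail MCP actually exposes. The sketch below is illustrative only; the logical keys, the `resolve_tools` helper, and the override shape are assumptions, and only the three `mcp_testrail_*` defaults come from the finding.

```python
# Default symbols quoted in the finding; treated here as placeholders only.
DEFAULT_TOOLS = {
    "get_project": "mcp_testrail_get_project",
    "get_cases": "mcp_testrail_get_cases",
    "add_case": "mcp_testrail_add_case",
}


def resolve_tools(project_overrides: dict[str, str]) -> dict[str, str]:
    """Map logical operations to the tool names the project actually exposes.

    `project_overrides` is a hypothetical mapping resolved by the phase from
    project config; unknown logical names are rejected rather than ignored.
    """
    unknown = set(project_overrides) - set(DEFAULT_TOOLS)
    if unknown:
        raise KeyError(f"unknown logical tool names: {sorted(unknown)}")
    return {**DEFAULT_TOOLS, **project_overrides}


# Hypothetical project whose TestRail MCP uses a different prefix for one call.
tools = resolve_tools({"add_case": "tr_add_case"})
print(tools["add_case"], tools["get_cases"])
```

With this shape the process steps would refer to the logical operation and never to a literal prefix, so a differently named MCP server changes only the override mapping.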
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/scenarios-generation/references/testrail-format.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testing/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Single Responsibility | Problem: The added <implementation_modes> block (lines 61-84) bolts three substantial new responsibilities onto the testing skill: UI impl mode, API impl mode, and a two-part Selector mode (identify + page-object authoring). The BASE skill was a focused 'write thorough isolated tests' skill; the PR now also makes it the owner of page-object/selector identification and API-spec-to-test implementation. That pushes the skill from 1-2 responsibilities toward four distinct workflows. Reason: A skill carrying four loosely-related modes is harder to load lazily and dilutes the single-responsibility contract the schema favors. Solution: Confirm these impl modes belong in testing rather than a dedicated implementation/selector skill; if kept, scope the frontmatter/role so the added responsibilities are declared, or split selector identification into its own skill. |
| 🔵 Medium | Conflict Resolution | Problem: The new modes assert the PHASE is SSoT for paths/taxonomy/contract/read-write boundary/iteration cap, while the resident Quality bar still hardcodes absolutes (>=80% coverage, 1s timeout, mock-external-only). When a phase supplies a different coverage/assertion taxonomy, it is not stated which wins — the canonical quality bar or the phase binding. Reason: Two SSoT claims (phase bindings vs canonical quality bar) touch overlapping territory (coverage, assertion taxonomy) without an explicit tiebreak, risking inconsistent agent behavior. Solution: Add one line stating precedence: phase bindings govern paths/taxonomy/output contract; the canonical Quality bar and Mocking policy remain non-negotiable unless the phase explicitly overrides a named item. Resolve the implicit overlap between 'PHASE is SSoT' and 'rules below are canonical'. |
| 🔵 Medium | Rosetta | Problem: The added General method line embeds an imperative skill-to-skill call inside the skill body: match the repository's existing patterns (USE SKILL coding standards-first mode ...). Per pa-hardening the skills-can't-call-skills boundary discourages a skill imperatively invoking a sibling skill from within its procedure; references to sibling skills are normally surfaced as recommendations, not inline USE SKILL directives in the method steps. Reason: An imperative USE SKILL inside the skill body couples this skill to a sibling skill's internal mode name ('standards-first mode'), which is sibling-internals awareness the boundary rule warns against. Solution: Demote the inline USE SKILL coding to a reference/recommendation (the section already lists skill coding - standards-first mode), or phrase it as 'apply repo conventions per the coding standards-first guidance' without an imperative invoke inside the mode procedure, preserving the no-sibling-call boundary. |
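
The precedence line proposed for the Conflict Resolution finding can be sketched as a merge rule: phase bindings add their own keys freely, but a canonical Quality bar item changes only when the phase explicitly names it as overridden. This is a minimal illustration under assumptions; the key names and the `effective_settings` helper are hypothetical, and only the 80% coverage, 1s timeout, and mock-external-only values come from the finding.

```python
# Canonical quality bar values quoted in the finding; key names are hypothetical.
QUALITY_BAR = {
    "min_coverage_pct": 80,
    "test_timeout_s": 1,
    "mock_external_only": True,
}


def effective_settings(phase_bindings: dict, explicit_overrides: set[str]) -> dict:
    """Merge phase bindings over the quality bar with an explicit tiebreak.

    Keys outside the quality bar (paths, taxonomy, output contract) always come
    from the phase. Quality bar keys change only when listed in
    `explicit_overrides`.
    """
    settings = dict(QUALITY_BAR)
    for key, value in phase_bindings.items():
        if key not in QUALITY_BAR or key in explicit_overrides:
            settings[key] = value
    return settings


bindings = {"min_coverage_pct": 60, "output_dir": "tests/api"}
print(effective_settings(bindings, set()))                 # coverage stays at 80
print(effective_settings(bindings, {"min_coverage_pct"}))  # coverage becomes 60
```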
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testing/references/implementation-examples.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: The new file bakes in specific framework/tool names (pytest, Jest, JUnit 5 + RestAssured, MUI class names) as full code blocks rather than parameterized shapes. Reason: pa-rosetta requires coding-agent-agnostic prompts, but a skill asset of worked examples legitimately shows concrete language samples since the calling phase owns the real binding and the file labels them non-authoritative. Solution: This is acceptable as-is because line 3 and line 113 explicitly frame them as 'shape references only' that the agent must 'adapt to the project's existing patterns'. No change required for behavior; if tightening, keep the per-language examples but reinforce the agnostic disclaimer once near the top. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The frontmatter description (line 3) still reads 'Data Collection from TestRail and Confluence' — naming the two vendors as fixed — while the rewritten body (lines 19-22) makes vendors config-resolved and explicitly NOT hardcoded. Reason: pa-rosetta requires coding-agent/vendor-agnostic prompts and frontmatter as a dense call-to-action; the description contradicts the body's own non-hardcoding rule, a minor consistency defect. Solution: Align the description with the new config-resolved model, e.g. 'Phase 1 of AQA workflow - collect test-case + feature context via configured TMS/documentation vendors', to remove the hardcoded-vendor mismatch. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The new frontmatter description is oversized and packs implementation detail: 'Phase 2 of AQA workflow - Requirements Clarification (gap-filling questioning) and Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) - USER INTERACTION REQUIRED'. This is ~55-60 tokens, well over the pa-hardening <30-token call-to-action target, and leaks internal mechanics (skill name, mode name, step references) into the discovery surface. Reason: pa-hardening mandates frontmatter description be a small dense call-to-action (<30 tokens); the discovery shell should not carry per-step implementation internals. Solution: Shorten the description to a dense call-to-action under 30 tokens, e.g. 'Phase 2 of AQA - clarify requirements and define explicit typed assertions (USER INTERACTION REQUIRED)'. Move the gap_analysis/requirements-use mechanics into the body (already present in <workflow_context>). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The NEW file ends with a stray, unbalanced closing tag </output> at EOF (after </aqa_flow_selector_implementation>), with no matching <output> opener anywhere in the file. This is a schema-impurity / well-formedness defect introduced by the PR (verified: 0 <output> openers, 1 </output> closer). Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced tag breaks XML-tag integrity and can confuse downstream parsing of the phase body. Solution: Delete the trailing </output> line at the end of the file so the phase document closes cleanly on </aqa_flow_selector_implementation>. |
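
A stray closer of this kind is easy to catch mechanically by counting openers and closers per tag name. The following is a rough sketch, not the validator's actual check; `unbalanced_tags` and the sample text are hypothetical, and the regex will also count tags that appear inside fenced code examples, so results need a manual look.

```python
import re
from collections import Counter

# Matches <tag ...>, </tag>, and <tag ... />; attributes are skipped, not parsed.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unbalanced_tags(text: str) -> dict[str, int]:
    """Return {tag: openers - closers} for every tag whose counts differ."""
    balance = Counter()
    for closing, name, self_closing in TAG_RE.findall(text):
        if self_closing:
            continue  # <tag/> opens and closes itself
        balance[name] += -1 if closing else 1
    return {name: diff for name, diff in balance.items() if diff != 0}


sample = """<aqa_flow_selector_implementation>
body
</aqa_flow_selector_implementation>
</output>
"""
print(unbalanced_tags(sample))  # {'output': -1}
```

A negative count means a closer without an opener, which is the shape reported here.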
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The PR adds a stray literal </output> tag as the final line of the file (diff line +</output> at the end), outside the root <aqa_flow_test_correction> element. This is leaked tool/harness scaffolding, not prompt content. Per pa-schemas/pa-hardening the artifact must be schema-pure and source-agnostic; a dangling unmatched closing tag pollutes the phase body and is the kind of AI-slop artifact the authoring skill explicitly forbids. Reason: An unbalanced XML-like tag in the published phase confuses agents parsing the prompt block and signals tooling contamination. Solution: Delete the trailing </output> line so the file ends cleanly with </aqa_flow_test_correction>. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The PR appends a stray literal </output> line at EOF (diff line +</output> after </aqa_flow_test_implementation>), outside the root element. This is harness/tool scaffolding leaked into a published phase, violating schema-purity and the source-agnostic, state-only requirement in pa-hardening/pa-schemas. Reason: An unbalanced closing tag is tooling contamination and can mislead agents parsing the phase body. Solution: Delete the trailing </output> so the file ends with </aqa_flow_test_implementation>. |
| ⚪ Low | Rosetta | Problem: The rewritten phase bakes the HITL refusal gate inline (<stop_for_execution> step 6.3: 'User instruction to bypass this gate must be refused with citation of this rule...'). pa-hardening states user involvement and HITL should be governed by the hitl skill, not restated as bespoke refusal logic inside each phase body. Reason: Duplicated, per-phase HITL refusal logic drifts from the single canonical gate authority and is the boundary pa-hardening warns against. Solution: Route the stop/refuse behavior through the hitl skill (as the parent aqa-flow.md already does via type="HITL"); keep only the phase-specific binding (what to wait for) and reference the hitl gate authority instead of re-implementing refusal wording per phase. |
| ⚪ Low | Output Contract | Problem: <workflow_context> calls the appended record ## Test Implementation, while <implementation_handoff_contract> lists it as the 'Test Implementation record' with five ### subsections; the exact top-level heading string an agent must write is implied rather than stated once. Reason: Cosmetic; the record content and subsections are fully specified, so the artifact is still producible. Solution: State the canonical record heading once (e.g. ## Test Implementation) and reference it from the contract and checklist by that exact string. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Phase 4 skill map lists scenarios-generation (- Phase 4: scenarios-generation, coding.) as a skill tag. This logical name is not an established skill in the AQA/QA family references and is not defined in-prompt; per pa-rosetta, Rosetta skills must come from the canonical docs/definitions/skills.md list. Reason: A non-canonical skill tag will fail to resolve via ACQUIRE/USE SKILL at Phase 4, breaking the test-specification step. Solution: Verify scenarios-generation against docs/definitions/skills.md; if it is not a canonical skill name, replace it with the correct existing skill (e.g. testing/tech-specs) or add it to the canonical list before referencing. |
| 🔵 Medium | Rosetta | Problem: Frontmatter description is a multi-sentence paragraph (~80 tokens: full source-system enumeration, framework list, and end-to-end pipeline recap). pa-hardening requires the frontmatter description be a call-to-action that is extremely small and dense (<30 tokens). Reason: The description field is the routing/selection signal loaded into every agent context; an oversized paragraph wastes the cached-token budget and dilutes the matching cue. Solution: Compress description to a single dense call-to-action trigger (e.g. 'MUST apply for backend API test automation: spec analysis → implementation → execution → corrections.'); move the tool/source enumeration into the body where it is already restated. |
| 🔵 Medium | Rosetta | Problem: <references> lists a Phase 4 skill scenarios-generation and the parent maps phases to files like qa-flow-test-case-specification.md, qa-flow-gap-and-requirements-clarification.md, and qa-flow-execution-and-report-analysis.md. Per pa-rosetta, Rosetta prompts must reference only canonical names from docs/definitions/*.md; whether scenarios-generation and these phase files are canonical cannot be confirmed from the workflow alone. Reason: Non-canonical logical names cause zero-document ACQUIREs at runtime; this is a name-hygiene risk, not a structural break, so medium severity. Solution: Verify each referenced skill and phase-file name against docs/definitions/skills.md and docs/definitions/workflows.md; align names or add the missing canonical entries. |
| 🔵 Medium | Bloat Control | Problem: The frontmatter description is a long multi-sentence enumeration (TestRail/Jira, pytest/Jest/JUnit/RestAssured/SuperTest, plus a full second paragraph restating sources/tools). pa-hardening requires the frontmatter description to be a dense call-to-action under ~30 tokens; this one is roughly 90+ tokens and duplicates the <description_and_purpose> body (which itself defers to the frontmatter). Reason: Over-long frontmatter inflates every routing decision's token cost and violates the <30-token description rule; behavior is unaffected so this is medium severity. Solution: Trim the description to a short call-to-action (e.g. 'MUST apply for backend API test-automation tasks: write/extend/debug API tests from test cases and specs.'); keep the tool/source enumeration in the body, not the frontmatter. |
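
The <30-token description rule cited in the two frontmatter findings lends itself to a cheap pre-commit check. The sketch below uses a crude word-and-punctuation count as a stand-in for a real tokenizer, which is an assumption: actual token counts depend on the model's tokenizer and will differ. The helper names are hypothetical.

```python
import re


def approx_tokens(text: str) -> int:
    """Rough token estimate: one token per word run or punctuation mark.

    This is an approximation for illustration, not a real tokenizer.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def description_within_cap(description: str, cap: int = 30) -> bool:
    """True when the estimated token count is strictly under the cap."""
    return approx_tokens(description) < cap


# Suggested replacement description quoted in the Bloat Control finding.
suggested = (
    "MUST apply for backend API test-automation tasks: "
    "write/extend/debug API tests from test cases and specs."
)
print(description_within_cap(suggested))
```

Because the estimate is approximate, a description that lands close to the cap is worth rechecking with the tokenizer the target agent actually uses.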
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: At ~14.4K chars this single phase file carries a full endpoint-contract template, a complete worked example, a redaction catalog with a grep re-scan list, an Analysis Summary block, and a validation checklist all inline. It is the largest of the seven files and exceeds the pa-hardening size guidance (300-500 lines ideal; split when larger), concentrating multiple heavy sub-contracts into one phase load. Reason: The whole template + example + redaction catalog is resent in context every turn the phase runs; progressive disclosure of the large static catalog reduces the per-turn cognitive/token load without losing the contract. Solution: Move the verbatim <endpoint_contract_template> worked example and/or the <redaction_contract> catalog to a referenced asset the skill ACQUIREs on demand, keeping the phase file to the section list + binding + checklist (progressive disclosure). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: This phase (baseSchema: docs/schemas/phase.md) ACQUIREs and executes a sibling phase: step 1.2b.2 ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB and step 1.2b.4 'execute all numbered steps inside <execute_documentation_mcp>'. The target file is also a phase (baseSchema: docs/schemas/phase.md), and the parent qa-flow.md does not reference it. pa-hardening boundary: phases cannot call phases; only the parent workflow composes phases. Reason: Phase-calls-phase violates the Rosetta composition boundary; a sibling phase invoking another phase breaks progressive-disclosure ownership and creates a hidden control-flow dependency the orchestrator/workflow does not see. Solution: Either (a) promote the documentation-MCP collection into a step owned by qa-flow.md (the workflow), or (b) make the MCP-collection content a skill asset/reference ACQUIRE'd by the discovery skill rather than a sibling phase file, so data-collection no longer invokes another phase. |
| 🔵 Medium | Reference Integrity | Problem: <raw_data_contract> Backend Source Code Analysis references RefSrc/ docs, but the canonical Rosetta term is refsrc/ (lowercase, per pa-rosetta target-folder list). The sibling api-spec file uses refsrc/{project-name}/docs/ correctly (step 2.1), so the capitalization is inconsistent within the family. Reason: Inconsistent casing of a path/term reference can cause an agent to look up a non-existent directory; one term per concept is required. Solution: Change RefSrc/ to the canonical refsrc/ term to match the pa-rosetta folder reference and the rest of the qa-flow family. |
| ⚪ Low | Reference Integrity | Problem: The phase references vendor binding files loaded inside discovery (references/<vendor>-binding.md, references/confluence-binding.md) and the subflow tag qa-flow-documentation-mcp-subflow. These resolve only if those binding files and the subflow file exist in the KB; they are valid in-family references but unverifiable from this file alone. Reason: In-family references are valid by design; this is a low-severity reminder to confirm the targets ship together. Solution: Ensure references/testrail-binding.md, references/jira-binding.md, references/confluence-binding.md exist under the discovery skill and the subflow file is published. |
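
The RefSrc/ vs refsrc/ casing finding above can be checked mechanically. The following is a minimal sketch, not part of the validation tooling: the helper name `casing_variants` and the sample text are hypothetical, and it only counts how many distinct spellings of one term appear in a piece of text.

```python
import re
from collections import Counter


def casing_variants(text: str, term: str) -> Counter:
    """Count each distinct spelling of `term` found in `text`, matching case-insensitively."""
    return Counter(re.findall(re.escape(term), text, flags=re.IGNORECASE))


# Hypothetical excerpt that mixes both spellings of the same folder name.
doc = "Read RefSrc/ docs first.\nThen scan refsrc/{project-name}/docs/ for specs.\n"

variants = casing_variants(doc, "refsrc/")
if len(variants) > 1:
    print("inconsistent casing:", dict(variants))
```

More than one key in the result means the same concept is spelled more than one way, which is the condition the finding describes.
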
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: This file is authored as a phase (baseSchema: docs/schemas/phase.md) but functions as a sub-phase that the sibling phase qa-flow-data-collection.md ACQUIREs and runs (its <execute_documentation_mcp> steps are executed by the parent phase). It carries step="1.2b", i.e. it is a numbered step of another phase, not a standalone phase, and the workflow qa-flow.md does not list it. This is the phase-cannot-call-phase boundary from the other side. Reason: Being a phase-schema file invoked by a sibling phase violates the Rosetta composition boundary and gives it sibling/reverse awareness (it names its parent phase qa-flow-data-collection), which phases must not have. Solution: Re-home this fragment as a discovery skill asset/reference (so the collecting phase calls the skill, not a sibling phase), or fold its steps directly into qa-flow.md/the data-collection step under the parent workflow. If kept separate, it must not use the phase schema while being invoked by another phase. |
| ⚪ Low | Reference Integrity | Problem: Relies on discovery loading references/confluence-binding.md and on qa-project-config.md config keys; these resolve only when those files ship in the KB / target structure. Reason: In-family/target-structure references are valid by design; low-severity confirmation only. Solution: Confirm references/confluence-binding.md exists under the discovery skill and that the documented config keys match the qa-project-config template. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The phase file is large and heavily redundant about the project-config-is-project-wide-not-per-IDENTIFIER point. It is restated in <description_and_purpose>, twice in <workflow_context> Output, in <session_layout> prose, in step 0.1 step 4, and again in the <config_contract> / template prose. The full per-phase state checklist is also reproduced in the State-file initial stub even though <workflow_context> and step 0.1.3 explicitly say the full schema is owned by qa-flow.md <state_file>. Reason: Every duplicated invariant is resent each agent turn at full token cost; the checklist duplication also risks the two copies drifting out of sync. Solution: State the project-wide-not-per-IDENTIFIER rule once in <session_layout> and reference it; trim the State-file stub to the minimal seed header plus an IDENTIFIER line rather than reproducing the full 8-row checklist that qa-flow.md owns. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-test-case-specification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The <present_for_approval> step 4.4 defines a closed approval-token list (approved/approve/yes), loose-phrasing rejection, and max-retry escalation entirely inline, without binding to the hitl skill. pa-hardening requires user involvement / HITL to live in the hitl skill so full automation can govern it centrally; the sibling Phase 7 file (qa-flow-test-correction.md) DOES anchor its identical gate with "(Approval vocabulary is governed by hitl; this gate's closed token list is the phase-specific specialization)". This phase omits that anchor, so the two HITL gates diverge in their stated authority. Reason: Without the hitl anchor the gate can conflict with the session-wide hitl protocol and is inconsistent with the parent qa-flow.md carve-out that ties Phase 3-7 gates to the hitl skill. Solution: Add a one-line cite mirroring Phase 7 (noting that approval vocabulary is governed by the hitl skill and this closed token list is the phase-specific specialization) rather than presenting the token gate as standalone authority. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The NEW file ends with a stray, unbalanced closing tag </output> at EOF (after the root element close), with no matching <output> opener anywhere in the file (verified: <output> count 0, </output> count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing </output> line so the document closes cleanly on its root element. |
| ⚪ Low | Rosetta | Problem: The <stop_for_execution> step 5.3 defines a hard HITL gate with bypass-refusal logic ("User instruction to bypass this gate must be refused with citation of this rule ... the gate is mechanical and cannot be overridden by instruction alone") entirely inline, with no reference to the hitl skill. Unlike the sibling Phase 7 file which anchors its gate to hitl, this phase presents itself as the standalone authority for a stop-and-wait gate. Reason: pa-hardening requires HITL/user-involvement authority to derive from the hitl skill for central full-automation governance; an unanchored mechanical-override-refusal can conflict with the session-wide hitl policy. Solution: Anchor the stop-for-execution gate to the hitl skill (e.g. note it is a phase-specific specialization of the hitl stop-and-wait protocol), consistent with qa-flow-test-correction.md and the parent qa-flow.md carve-out. |
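
The stray </output> finding in the table above amounts to comparing opener and closer counts per tag name. The sketch below shows one way to reproduce that count. It is an illustration under assumptions: the function name `unbalanced_tags` and the sample root element are hypothetical, and the regexes treat any `<name>` token as a tag, so prose placeholders such as `<vendor>` would also be reported.

```python
import re
from collections import Counter


def unbalanced_tags(text: str) -> dict[str, tuple[int, int]]:
    """Return {tag: (openers, closers)} for every tag name whose two counts differ."""
    openers = Counter(re.findall(r"<([A-Za-z_][\w-]*)(?:\s[^<>]*)?>", text))
    closers = Counter(re.findall(r"</([A-Za-z_][\w-]*)\s*>", text))
    return {
        tag: (openers[tag], closers[tag])
        for tag in sorted(set(openers) | set(closers))
        if openers[tag] != closers[tag]
    }


# Hypothetical phase file: a balanced root element followed by a stray closing tag.
sample = (
    "<qa_flow_test_implementation>\n"
    '<stop_for_execution step="5.3">...</stop_for_execution>\n'
    "</qa_flow_test_implementation>\n"
    "</output>\n"
)

print(unbalanced_tags(sample))
```

An empty result means every tag name has as many openers as closers; it does not prove correct nesting, only matching counts.
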
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The NEW file ends with a stray, unbalanced closing tag </output> at EOF (after the root element close), with no matching <output> opener anywhere in the file (verified: <output> count 0, </output> count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing </output> line so the document closes cleanly on its root element. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The rewrite renamed the state-schema section to <state_and_outputs> and moved schema ownership to Phase 0's <state_file_template>. But the sibling phase file testgen-flow-data-collection.md still points to testgen-flow.md <state_file> (step 1.4), a section name that no longer exists in this file. The pointer target was renamed here without the consumer being updated, leaving a dangling cross-phase reference. Reason: An agent following the data-collection pointer cannot resolve testgen-flow.md <state_file> and may improvise a state schema, breaking the cross-phase contract. Solution: Keep a stable anchor: either re-add a <state_file> tag name (or alias) in this workflow that owns/forwards the state schema, or coordinate so data-collection points to Phase 0 <state_file_template> (the true SSoT named in <state_and_outputs>). Make the consumer and the SSoT name agree. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ✅ Much better |
| Decision Branching | 4 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ✅ Much better |
| Rosetta | 4 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Step 1.4 (<update_state>) instructs to update state 'per the parent flow's canonical state-file schema (declared once in testgen-flow.md <state_file>)'. The rewritten parent testgen-flow.md has no <state_file> section; its state section is <state_and_outputs>, which delegates ownership to Phase 0's <state_file_template> in testgen-flow-project-config-loading.md. The named anchor does not resolve. Reason: An agent grepping for testgen-flow.md <state_file> finds nothing and may invent a state schema, diverging from the Phase 0 template every other phase relies on. Solution: Point step 1.4 at the real SSoT: Phase 0 <state_file_template> in testgen-flow-project-config-loading.md (or the parent's <state_and_outputs> section name), matching what the parent actually declares. |
| ⚪ Low | Example Grounding | Problem: Frontmatter description was shortened to 'Phase 1 of Test Generation - Data collection ' (trailing space, and dropped the 'from Jira and Confluence' specificity present in BASE). The body still relies on config-resolved vendors, so the description no longer signals the concrete sources the phase handles. Reason: Description is the call-to-action surface; losing the source hint slightly weakens phase identification, though body content remains complete. Solution: Restore a concise source hint in the description (e.g. 'Data collection from issue tracker + documentation sources') and trim the trailing space. |
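
The dangling `testgen-flow.md <state_file>` pointer described above is the kind of break a simple cross-file check can surface. The following is a rough sketch, assuming pointers are written as a `.md` filename followed by a tag name in angle brackets. The function name `missing_anchors` and the in-memory `targets` mapping are hypothetical stand-ins for reading the real files.

```python
import re


def missing_anchors(consumer_text: str, targets: dict[str, str]) -> list[tuple[str, str]]:
    """List (file, tag) pointers in consumer_text whose tag has no opener in the target file."""
    pointers = re.findall(r"([\w./-]+\.md)\s+<(\w+)>", consumer_text)
    missing = []
    for filename, tag in pointers:
        body = targets.get(filename, "")  # unknown files count as missing
        if not re.search(rf"<{re.escape(tag)}[\s>]", body):
            missing.append((filename, tag))
    return missing


# Hypothetical excerpts: the consumer still names the old section, the parent was renamed.
consumer = "Update state per the schema declared once in testgen-flow.md <state_file>."
targets = {
    "testgen-flow.md": "<state_and_outputs>\nSchema owned by Phase 0.\n</state_and_outputs>\n"
}

print(missing_anchors(consumer, targets))
```

Once the consumer and the parent agree on one section name, the returned list is empty.
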
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 2 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Cross-family deep link into skill internals at line 41 (requirements-use/references/gap-analysis-catalogs.md) violates the pa-rosetta/pa-hardening rule 'no cross-skill deep linking to private content' and 'references must be wrapped in commands or ACQUIRE'd', not named as bare filesystem paths. Reason: Naming a skill's private reference path from a consuming phase is the boundary violation the Rosetta isolation rules exist to prevent. Solution: Invoke the skill by logical name and remove the bare internal path; if the catalog must be cited, wrap it as an ACQUIRE owned by the skill, not the phase. |
| 🔵 Medium | Reference Integrity | Problem: Step 3 of <run_analysis> (line 41) names another skill's private file path requirements-use/references/gap-analysis-catalogs.md directly. The phase reaches into the skill's internal implementation instead of just invoking USE SKILL requirements-use and letting the skill own which catalog it loads. Reason: Coupling a phase to a skill's private reference filename breaks skill folder isolation and will silently break if the skill reorganizes its references. Solution: Drop the explicit references/gap-analysis-catalogs.md path; reference the gap_analysis mode by name only and let the skill resolve its own internal references. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-question-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The primary HITL gate (step 3.2 line 108 PAUSE — WAIT FOR USER INPUT, plus <workflow_context> line 20) is expressed as inline gate prose and never routes through USE SKILL hitl. The sibling Phase 0 (project-config-loading step 0.6) routes its gate through USE SKILL hitl. pa-rosetta/pa-hardening require HITL/approval to live in the canonical hitl home, so the most important HITL gate of the whole flow is inconsistent with its own sibling. Reason: An inline-only approval gate diverges from the canonical HITL skill and from the sibling phase, so approval-handling behavior is inconsistent at the single most safety-relevant gate in the flow. Solution: Gate the Phase 3 answer-wait/approval via USE SKILL hitl the same way Phase 0 step 0.6 does, keeping the inline prose only as the on-load-failure fallback. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: The <failure_handling> block adds a large non-operational meta-justification 'Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn)' (lines 143-149) explaining WHY there is no fallback, plus rationale about the sibling test-case-generation phase. This is provenance/rationale prose, not an instruction the agent executes — exactly the non-operational meta-note pa-hardening says to remove. Reason: Non-operational rationale re-sent every turn inflates cost and cognitive load without changing agent behavior; pa-hardening flags exactly this class of meta-note. Solution: Keep only the operational rule ('skill is a hard dependency; on failure re-invoke once then block — no inline fallback'); delete the multi-bullet rationale and the sibling-comparison paragraph. |
| 🔵 Medium | Rosetta | Problem: Cross-family deep linking into skill internals at lines 47 and 146 (requirements-authoring/references/authoring-catalogs.md, skill SKILL.md deploy path), plus the long non-operational tradeoff note (lines 143-149), both violate pa-rosetta/pa-hardening (no cross-skill deep linking; remove non-operational meta-notes / change-rationale). Reason: These are the two specific Rosetta authoring violations (skill-internal deep link + non-operational provenance) that the hardening reference calls out by name. Solution: Reference the skill by logical name only and strip the rationale paragraph, leaving the one-line operational block-on-failure rule. |
| 🔵 Medium | Cognitive Budget | Problem: Step 4.3 restates the full phase-owned section contract, testgen-specific Executive Summary block, Traceability block, SMART exemplar, and coverage prompt (lines 45-118) while also delegating to the synthesis mode that owns the same schemas. The duplicated contract+rationale enlarges the per-turn surface area for a single document-build step. Reason: Carrying both the full contract and the rationale for the contract in one step pushes the phase toward the upper size band and competes for the agent's attention budget. Solution: Compress step 4.3 to the section table plus the two testgen-only deltas (Executive Summary, Traceability column); move worked SMART/coverage examples to a single short pointer to the skill catalog rather than inlining them. |
| 🔵 Medium | Reference Integrity | Problem: Step 4.3 (line 47) and the failure_handling tradeoff (line 146) name another skill's private internals (requirements-authoring/references/authoring-catalogs.md and the skill's deploy path instructions/<release>/core/skills/requirements-authoring/SKILL.md) as bare paths. The phase reaches into skill-private files instead of invoking the skill by name and letting it own its references. Reason: Hard-coding a skill's internal reference filename and deploy path into a phase breaks skill isolation and will drift if the skill is reorganized. Solution: Reference requirements-authoring synthesis mode by logical name only; remove the references/authoring-catalogs.md and explicit deploy-path citations. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 3 | ⬇️ Slightly worse |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Precision & Explicitness | Problem: The PR's stated direction is de-hardcoding the vendor to a config-resolved TMS binding (step 5.3 line 84), yet residual hardcoded 'TestRail' remains in operative text: phase_steps line 25 'Generate test cases in TestRail format' and the user-facing message line 281 'Ready to proceed to Phase 6 (TestRail Export)?'. The same concept (the resolved TMS vendor) is named two ways, violating one-term-per-concept. Reason: A step that says 'TestRail format' and a user message that says 'TestRail Export' contradict the config-resolved-vendor model the same file establishes, and can mislead the agent on non-TestRail projects. Solution: Change line 25 to 'Generate test cases in the resolved TMS FORMAT' and line 281 to 'Phase 6 (Test Case Export)'; keep testrail only where it is an explicit example of a resolvable binding. |
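
Residual hardcoded vendor names like the two 'TestRail' mentions above are easy to list with a line scan. This is a minimal sketch only: the helper `vendor_mentions` and the three-line sample are hypothetical, and the scan cannot tell an operative instruction from a legitimate example of a resolvable binding, so each hit still needs a human read.

```python
def vendor_mentions(text: str, vendor: str = "TestRail") -> list[tuple[int, str]]:
    """Return (line_number, line) for each line that names the vendor, ignoring case."""
    needle = vendor.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Hypothetical excerpt; line numbers here are relative to this sample, not the real file.
phase = """Generate test cases in TestRail format
Resolve the TMS binding from project config
Ready to proceed to Phase 6 (TestRail Export)?"""

for number, line in vendor_mentions(phase):
    print(number, line)
```
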
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Safety Boundaries | Problem: This is the destructive phase (writes to an external TMS). Line 36 ASSERTS the phase owns 'idempotency (the destructive-write confirmation gate + dedup pre-scan)', but no such gate or pre-scan exists anywhere in steps 6.1-6.6. Step 6.3 only asks for a target location; step 6.5 exports every case directly with no pre-write confirmation and no duplicate check. The duplicate risk is only passively acknowledged as a pitfall (line 143 'Re-running export may create duplicates in TMS — document this behavior'). On a rerun the phase will silently re-create every case. Reason: The phase claims a destructive-write safety control it never implements; an unguarded rerun duplicates the entire suite in the external system with no confirmation. Solution: Add an explicit step before step 6.5: dedup pre-scan of the target location for already-exported TC IDs/titles, then a destructive-write confirmation gate (route via USE SKILL hitl) that shows the user the create/skip plan and requires explicit confirmation before any TMS write. Make the line-36 claim point at that real step. |
| 🟡 High | Precision & Explicitness | Problem: Line 36 uses the precise terms 'destructive-write confirmation gate' and 'dedup pre-scan' as if they are defined controls, but neither term is defined or operationalized later in the file, so the modal claim is non-actionable. The agent is told the phase OWNS these controls but is never told how to perform them. Reason: Naming a control the agent cannot locate or execute is an explicitness gap that makes the safety claim unenforceable. Solution: Either add the concrete steps these terms name, or remove the terms; do not assert ownership of a control that has no procedure. |
| 🟡 High | Workflow Completeness | Problem: The parent workflow marks Phase 6 type="HITL" requiring the user to 'confirm export', and <workflow_context> line 19 lists HITL as 'user must provide target location', but the step sequence has no operational confirm-export gate. The 'confirm export' obligation from the parent and the destructive-write gate named at line 36 are both missing from the numbered steps 6.1-6.9. Reason: A multi-step destructive workflow that omits the parent-mandated confirmation step has an implicit (missing) step at exactly the irreversible action. Solution: Insert a numbered confirm-export step between get_target_location (6.3) and export (6.5) that pauses for explicit user approval of the export scope and target, mirroring the parent's HITL contract. |
| 🟡 High | Rosetta | Problem: The parent workflow designates Phase 6 a HITL gate, yet this phase routes user interaction (target location ask, partial-export decision) through plain inline Ask user prompts and never through USE SKILL hitl. pa-rosetta/pa-hardening require HITL approval to live in the hitl skill; sibling Phase 0 already does this, so the family is inconsistent for its two declared HITL phases. Reason: HITL handled outside the hitl skill bypasses the session-wide approval protocol and is inconsistent across the phase family, weakening the guarantee that the user gates the destructive export. Solution: Route the export confirmation and the partial-export user decision through USE SKILL hitl (canonical approval/escalation home), matching Phase 0 step 0.6; keep the inline prompts only as the skill-load-failure fallback. |
| 🔵 Medium | Bloat Control | Problem: Line 36 is a non-operational ownership/meta declaration ('This phase OWNS the export contract … idempotency …; the skill EMITS … it never decides the contract') that describes responsibility boundaries rather than instructing an action, while the action it claims (gate + dedup) is absent. It is meta-prose, not a step. Reason: An ownership claim that substitutes for the missing procedure adds words and a false sense of coverage without changing behavior. Solution: Replace the ownership paragraph with the actual operational steps (dedup scan + confirm gate); keep at most a one-line note of which artifact is the export source. |
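
The Safety Boundaries and Workflow Completeness findings above both ask for a dedup pre-scan followed by a confirmation gate before any TMS write. A minimal sketch of that ordering, assuming cases are keyed by a TC ID string; `ExportPlan`, `build_export_plan`, `confirm_and_export`, and the `confirm` / `write_case` callbacks are hypothetical names used for illustration only and do not come from the workflow file:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExportPlan:
    """Create/skip plan shown to the user before any write (hypothetical shape)."""
    to_create: list[str]
    to_skip: list[str]


def build_export_plan(local_ids: list[str], existing_ids: set[str]) -> ExportPlan:
    """Dedup pre-scan: split local cases into create vs. skip, preserving order."""
    to_create = [cid for cid in local_ids if cid not in existing_ids]
    to_skip = [cid for cid in local_ids if cid in existing_ids]
    return ExportPlan(to_create=to_create, to_skip=to_skip)


def confirm_and_export(
    plan: ExportPlan,
    confirm: Callable[[ExportPlan], bool],
    write_case: Callable[[str], None],
) -> list[str]:
    """Destructive-write gate: nothing is written unless confirm() returns True."""
    if not plan.to_create:
        return []  # nothing new to export, so no confirmation is needed
    if not confirm(plan):
        return []  # user declined, so no TMS write happens
    written: list[str] = []
    for cid in plan.to_create:
        write_case(cid)  # assumed to wrap the real TMS call
        written.append(cid)
    return written
```

In the workflow itself the `confirm` callback would correspond to the hitl skill prompt and `existing_ids` to the result of scanning the target location; on a rerun where every case already exists, the plan has an empty `to_create` list and no write occurs.
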
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 2 | ⬇️ Slightly worse |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/coding/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The added <when_to_use_skill> line points the reader to the triage mode 'in <process>', but the new section is named <test_execution_triage> and the skill has no <process> block. The mental hook lands on a section that does not exist. Reason: A pointer to a non-existent tag makes the agent search for content it cannot find, weakening reliable routing into the new mode. Solution: Change 'use the test-execution triage mode in <process>' to point to <test_execution_triage> (the actual section tag). |
| 🔵 Medium | Single Responsibility | Problem: The added <test_execution_triage> block introduces a second, distinct responsibility (read-only triage of an automated-test execution report with its own taxonomy, capture analysis, and cross-failure pattern detection) onto a skill whose core job is root-cause debugging of a single issue. The <when_to_use_skill> and frontmatter now advertise two jobs. Reason: Two responsibilities in one skill raises the chance an agent applies the wrong mode or loads triage machinery when only simple debugging is needed. Solution: Acceptable as a bounded mode if kept thin; if the triage mode grows further it should move to its own skill. For now ensure the triage mode stays a pointer-style specialization (it correctly references <core_concepts> for evidence labels and redaction) and does not accumulate independent process depth. |
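
The Reference Integrity finding above (a pointer to `<process>` when no such block exists) is the kind of drift a simple scan can surface. A rough sketch, assuming section tags are lowercase snake_case and every real section has a closing tag; `missing_section_refs` is a hypothetical helper, not part of the validation tooling, and it would also flag intentional placeholders such as `<command>`:

```python
import re

# Lowercase snake_case tag names only; attributes are not handled in this sketch.
_OPEN_TAG = re.compile(r"<([a-z_][a-z0-9_]*)>")
_CLOSE_TAG = re.compile(r"</([a-z_][a-z0-9_]*)>")


def missing_section_refs(text: str) -> set[str]:
    """Return tag names that appear as <name> but never as </name>."""
    referenced = set(_OPEN_TAG.findall(text))
    closed = set(_CLOSE_TAG.findall(text))
    return referenced - closed


skill_text = (
    "<when_to_use_skill>use the test-execution triage mode in <process>"
    "</when_to_use_skill>\n"
    "<test_execution_triage>...</test_execution_triage>"
)
dangling = missing_section_refs(skill_text)
```

Here `dangling` contains only `process`, which matches the finding: the pointer names a section the skill never defines.
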
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/orchestrator-contract/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/reverse-engineering/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The added <analysis_modes> block introduces two concrete, domain-specific modes (test-automation architecture analysis, API-contract extraction) into a skill whose general purpose is code→spec reverse engineering. These modes carry their own GATEs, source-priority lists, and per-endpoint templates, broadening the skill beyond its single distillation responsibility. Reason: Each added concrete mode widens the skill's job count and the surface an agent must scan to apply the right one. Solution: Acceptable while the modes stay thin specializations that EMIT into the phase-owned artifact (they currently do, and defer artifact shape/path to the phase). Watch for further mode accretion; if more modes are added, extract them to a dedicated test/API-analysis skill. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/operation-manager/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Precision & Explicitness | Problem: core_concepts states the CLI as `npx rosettify@latest <command> <subcommand> <plan_file>` with the `<command>` slot left as a placeholder, but every concrete invocation in <process> and <validation_checklist> uses the literal command `plan` (e.g. `plan next`, `plan update_status`, `plan query`). The generic `<command>` placeholder is never bound to `plan`, so the one term for the command concept is presented two ways. Reason: A placeholder command that is never bound forces the agent to infer the command name, risking malformed invocations. Solution: State the command literally once (`npx rosettify@latest plan <subcommand> <plan_file>`) as plan-manager does, or explicitly define that `<command>` is always `plan` for this skill. |
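
To illustrate the suggested fix, the sketch below binds the command slot to the literal `plan` in one place so callers only supply the subcommand and plan file. It uses only the invocation shape quoted in the finding; `plan_command` and the `plan.json` file name are hypothetical, and the function only builds the argument list without running the CLI:

```python
def plan_command(subcommand: str, plan_file: str) -> list[str]:
    """Build argv for `npx rosettify@latest plan <subcommand> <plan_file>`."""
    if not subcommand or not plan_file:
        raise ValueError("subcommand and plan_file must be non-empty")
    # The command slot is fixed to "plan"; it is never left as a placeholder.
    return ["npx", "rosettify@latest", "plan", subcommand, plan_file]


argv = plan_command("next", "plan.json")  # "plan.json" is an illustrative name
```
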
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/discovery/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The frontmatter description (line 3) is two full sentences (~55 tokens): 'Rosetta skill to gather source artifacts ... Use to collect issues/tickets, test cases, and documentation pages for downstream requirements, test design, or debugging phases.' pa-hardening requires the frontmatter description to be a call-to-action and extremely dense (<30 tokens). Reason: Frontmatter is loaded into every agent's context for skill selection; an over-long description wastes the always-resident budget and violates the Rosetta frontmatter density rule. Solution: Compress the description to a single dense call-to-action under 30 tokens, e.g. 'Collect + normalize + redact source-of-record artifacts (Jira/Confluence/TestRail via MCP) into the phase-defined raw-context artifact.' |
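
A quick pre-commit check against the 30-token cap might look like the sketch below. It counts whitespace-separated words as a crude proxy, which is an assumption: the validator's actual tokenizer is not stated in this report, and a model tokenizer will usually count more tokens than words, so a pass here is necessary but not sufficient. `approx_token_count` and `within_cap` are hypothetical names:

```python
def approx_token_count(text: str) -> int:
    """Crude proxy: whitespace-separated words, not real tokenizer tokens."""
    return len(text.split())


def within_cap(description: str, cap: int = 30) -> bool:
    """True when the approximate count is strictly under the cap."""
    return approx_token_count(description) < cap


suggested = (
    "Collect + normalize + redact source-of-record artifacts "
    "(Jira/Confluence/TestRail via MCP) into the phase-defined raw-context artifact."
)
ok = within_cap(suggested)
```
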
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/skills/discovery/references/confluence-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/discovery/references/jira-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/discovery/references/testrail-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-use/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Goal Specification | Problem: The PR adds a whole second responsibility, the <gap_analysis> analysis-only mode (multi-source contradiction/gap/ambiguity classification over Jira/Confluence/TestRail/API-spec/test-plan data), but neither the frontmatter description ("Consume approved requirements to drive planning, implementation, and validation...") nor <when_to_use_skill> ("implementing from approved requirements, planning work from requirement IDs, or auditing requirement-to-delivery traceability") mentions gap analysis. Skill routing is driven by description/when-to-use, so the new mode is undiscoverable by the dispatching agent. Reason: An entry mode that the trigger metadata never names will not be loaded for the cases it exists to serve. Solution: Add the gap_analysis mode to <when_to_use_skill> and to the frontmatter description (e.g. add a clause about analyzing collected multi-source data for gaps/contradictions/ambiguities) so the mode is selectable. |
| 🟡 High | Single Responsibility | Problem: The PR bolts a whole new <gap_analysis> analysis-only mode (lines 93-105) onto a skill whose stated job (frontmatter + <role> line 24: 'using requirements as execution contract') is consuming approved requirements to drive planning/implementation/validation. Multi-source contradiction/gap/ambiguity detection across Jira/Confluence/TestRail/API-spec/test-plans is a distinct responsibility from requirement-to-delivery traceability, pushing the skill from 1-2 jobs toward 3. Reason: A skill description that advertises only requirement usage but silently contains a second analysis mode harms skill selection and violates the SRP/single-responsibility expectation; agents may load it for the wrong job or miss the mode entirely. Solution: Either keep the two responsibilities deliberately fused and state the dual-mode scope explicitly in the frontmatter/<role>/<when_to_use_skill> so callers understand both modes, or extract <gap_analysis> into its own analysis skill that the phase invokes separately. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 3 | ⬇️ Slightly worse |
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Output Contract | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The PR adds a <synthesis> mode (synthesize multi-source Jira/Confluence/TestRail/answers/gap-analysis data into one structured requirements document) on top of the existing authoring/updating/reviewing responsibility. The frontmatter description was NOT updated to name synthesis (it still reads only "Author, update, and validate functional and non-functional requirements..."), though <when_to_use_skill> was updated to add "synthesizing". This is a milder version of the requirements-use split, and synthesis shares the authoring rules, so it is more cohesive, but description/when-to-use are now inconsistent about the mode set. Reason: Frontmatter drives skill selection; if it omits a mode that when-to-use and the body define, the mode may not be routed. Solution: Add synthesis to the frontmatter description so trigger metadata matches the <when_to_use_skill> and the <synthesis> body. |
| 🔵 Medium | Workflow Completeness | Problem: Compression collapsed the explicit base <authoring_flow> (15 ordered bullets including 'Check against current best practices' and 'Once drafting is done proactively seek user approval') into a 3-step flow (lines 85-91). The proactive 'check against current best practices' step is no longer stated in SKILL.md or the catalogs. Reason: Dropping an explicit ordered step from a multi-step authoring flow risks the agent skipping the best-practices validation that the base flow enforced. Solution: Confirm the best-practices-check step is intentionally dropped or fold it back into step 2 of <authoring_flow> (e.g. 'run quality-gate + best-practices checks'). |
| 🔵 Medium | Precision & Explicitness | Problem: The base NFR rule 'Update existing requirements with new schema' (base <nonfunctional_requirements>) was dropped in the compression and is not recoverable in NEW SKILL.md (line 78 NFR clause) nor in references/authoring-catalogs.md (NFR schema section, lines 114-125). The instruction to migrate already-authored requirements onto the current schema is lost. Reason: Without the migrate-to-current-schema directive, the agent may leave legacy units on an outdated schema during updates, producing an inconsistent requirements set. Solution: Re-add a short clause (in <requirement_statements> NFR bullet or the catalogs schema-fields section) stating that existing requirement units must be updated to the current schema when re-authored. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Lines 51 and 49-55 restate the requirement schema fields and ID conventions, and the unit template (lines 9-28) duplicates the verbatim <req> template that also lives in the asset ra-requirement-unit.xml. The brief says SKILL.md owns rules/methods and this file holds reference catalogs, but the <req> template is now duplicated across this reference AND the asset, violating DRY/SSoT within the family. Reason: Duplicated canonical templates drift apart (already visible vs the XML asset), the exact failure mode the single-source convention prevents. Solution: Keep the verbatim <req> template in exactly one location (the asset) and have this catalog reference it, rather than copying the full template block. |
| 🔵 Medium | Precision & Explicitness | Problem: The <req> unit template here (lines 9-28) uses the OLD two-field shape: `NotStarted |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/scenarios-generation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The new skill carries three distinct authoring modes — gwt_spec (Given-When-Then API specs), generation (TMS-format cases), and a vendor_binding resolver — plus a shared validation checklist. That is broad for one skill; the gwt_spec mode and the TMS generation mode are quite different artifact shapes. Reason: Two related but materially different output artifacts (ATC GWT specs vs TMS Steps/Expected cases) increase the cognitive search space within a single resident prompt. Solution: Acceptable as one skill since all three are 'design test scenarios from requirements'; keep but ensure the mode boundary in <gwt_spec> vs stays crisp so an agent never blends an ATC spec with a TMS case template. No split required. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/scenarios-generation/references/testrail-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Rosetta prompts are coding-agent-agnostic and avoid hardcoded tool names; this binding embeds concrete mcp_testrail_* tool signatures (steps 1, 7, 8) as the operational contract. While a vendor binding legitimately names the vendor, the per-call MCP symbol form is a specific tool-name assumption rather than a 'tell how to think' resolution from project config. Reason: Hardcoded tool symbols reduce agent-agnostic portability, which the Rosetta gate guards against. Solution: Frame the MCP signatures as the shape to invoke against the project-resolved TestRail MCP tool names, consistent with the SKILL's stance that the skill never reads config and the phase resolves the binding. |
| 🔵 Medium | Dependency Management | Problem: MCP tool names are hardcoded throughout the process steps (mcp_testrail_get_project, mcp_testrail_get_cases, mcp_testrail_add_case). This is a vendor-specific export binding, so TestRail names are expected here, but the tool-call invocations assume one MCP server naming scheme; a TestRail MCP exposed under a different tool prefix in the target project would not match. The file does parameterize the vendor concept abstractly (the swap table at the end), but the live process steps bake the exact mcp_testrail_* symbols. Reason: Hardcoded MCP symbol names can silently mismatch a differently-named TestRail MCP, breaking the export at call time. Solution: State once near the top that the TestRail MCP tool symbols are placeholders for whatever the project's TestRail MCP actually exposes (resolved from config), so an agent maps them rather than expecting those literal names. |
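
One way to read the "placeholders, resolved from config" suggestion in the two findings above is as an explicit mapping step that fails before any call is attempted. A minimal sketch, assuming the phase hands over a dict from placeholder symbol to the tool name the project's TestRail MCP actually exposes; `resolve_tools` and the `testrail__*` names in the example are hypothetical, and only the three `mcp_testrail_*` symbols come from the report:

```python
# Placeholder symbols named in the binding's process steps.
PLACEHOLDER_TOOLS = (
    "mcp_testrail_get_project",
    "mcp_testrail_get_cases",
    "mcp_testrail_add_case",
)


def resolve_tools(binding: dict[str, str]) -> dict[str, str]:
    """Map each placeholder to a project-resolved tool name, or fail early."""
    missing = [name for name in PLACEHOLDER_TOOLS if name not in binding]
    if missing:
        # Surface the mismatch before the export starts, not at call time.
        raise KeyError(f"unresolved TestRail MCP tools: {', '.join(missing)}")
    return {name: binding[name] for name in PLACEHOLDER_TOOLS}


# Illustrative project binding with a different tool prefix (hypothetical names).
project_binding = {
    "mcp_testrail_get_project": "testrail__get_project",
    "mcp_testrail_get_cases": "testrail__get_cases",
    "mcp_testrail_add_case": "testrail__add_case",
}
resolved = resolve_tools(project_binding)
```
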
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/scenarios-generation/references/testrail-format.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testing/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Single Responsibility | Problem: The added <implementation_modes> block (lines 61-84) bolts three substantial new responsibilities onto the testing skill: UI impl mode, API impl mode, and a two-part Selector mode (identify + page-object authoring). The BASE skill was a focused 'write thorough isolated tests' skill; the PR now also makes it the owner of page-object/selector identification and API-spec-to-test implementation. That pushes the skill from 1-2 responsibilities toward four distinct workflows. Reason: A skill carrying four loosely-related modes is harder to load lazily and dilutes the single-responsibility contract the schema favors. Solution: Confirm these impl modes belong in testing rather than a dedicated implementation/selector skill; if kept, scope the frontmatter/role so the added responsibilities are declared, or split selector identification into its own skill. |
| 🔵 Medium | Conflict Resolution | Problem: The new modes assert the PHASE is SSoT for paths/taxonomy/contract/read-write boundary/iteration cap, while the resident Quality bar still hardcodes absolutes (>=80% coverage, 1s timeout, mock-external-only). When a phase supplies a different coverage/assertion taxonomy, it is not stated which wins — the canonical quality bar or the phase binding. Reason: Two SSoT claims (phase bindings vs canonical quality bar) touch overlapping territory (coverage, assertion taxonomy) without an explicit tiebreak, risking inconsistent agent behavior. Solution: Add one line stating precedence: phase bindings govern paths/taxonomy/output contract; the canonical Quality bar and Mocking policy remain non-negotiable unless the phase explicitly overrides a named item. Resolve the implicit overlap between 'PHASE is SSoT' and 'rules below are canonical'. |
| 🔵 Medium | Rosetta | Problem: The added General method line embeds an imperative skill-to-skill call inside the skill body: match the repository's existing patterns (USE SKILL coding standards-first mode ...). Per pa-hardening the skills-can't-call-skills boundary discourages a skill imperatively invoking a sibling skill from within its procedure; references to sibling skills are normally surfaced as recommendations, not inline USE SKILL directives in the method steps. Reason: An imperative USE SKILL inside the skill body couples this skill to a sibling skill's internal mode name ('standards-first mode'), which is sibling-internals awareness the boundary rule warns against. Solution: Demote the inline USE SKILL coding to a reference/recommendation (the section already lists skill coding — standards-first mode), or phrase as 'apply repo conventions per the coding standards-first guidance' without an imperative invoke inside the mode procedure, preserving the no-sibling-call boundary. |
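
The precedence rule proposed in the Conflict Resolution finding above (phase bindings govern their own keys, the canonical Quality bar holds unless a named item is explicitly overridden) can be sketched as a merge function. The key names and the `explicit_overrides` parameter are assumptions made for illustration; only the three quality-bar values come from the finding.

```python
# Canonical values quoted in the finding: >=80% coverage, 1s timeout,
# mock-external-only. The key names are hypothetical.
CANONICAL_QUALITY_BAR = {
    "min_coverage": 0.80,
    "test_timeout_s": 1.0,
    "mock_scope": "external-only",
}


def effective_rules(phase_bindings: dict, explicit_overrides: set[str]) -> dict:
    """Merge phase bindings over the canonical bar using one tiebreak.

    Keys that the canonical bar does not define (paths, taxonomy, output
    contract) always come from the phase. Keys the bar does define change
    only when the phase names them in `explicit_overrides`.
    """
    rules = dict(CANONICAL_QUALITY_BAR)
    for key, value in phase_bindings.items():
        if key not in CANONICAL_QUALITY_BAR or key in explicit_overrides:
            rules[key] = value
    return rules
```

A phase that passes `{"min_coverage": 0.6}` without naming it as an override keeps the 0.80 floor, which is the behavior the suggested one-line precedence statement would make explicit.
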
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testing/references/implementation-examples.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: The new file bakes in specific framework/tool names (pytest, Jest, JUnit 5 + RestAssured, MUI class names) as full code blocks rather than parameterized shapes. Reason: pa-rosetta requires coding-agent-agnostic prompts, but a skill asset of worked examples legitimately shows concrete language samples since the calling phase owns the real binding and the file labels them non-authoritative. Solution: This is acceptable as-is because line 3 and line 113 explicitly frame them as 'shape references only' that the agent must 'adapt to the project's existing patterns'. No change required for behavior; if tightening, keep the per-language examples but reinforce the agnostic disclaimer once near the top. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The frontmatter description (line 3) still reads 'Data Collection from TestRail and Confluence' — naming the two vendors as fixed — while the rewritten body (lines 19-22) makes vendors config-resolved and explicitly NOT hardcoded. Reason: pa-rosetta requires coding-agent/vendor-agnostic prompts and frontmatter as a dense call-to-action; the description contradicts the body's own non-hardcoding rule, a minor consistency defect. Solution: Align the description with the new config-resolved model, e.g. 'Phase 1 of AQA workflow - collect test-case + feature context via configured TMS/documentation vendors', to remove the hardcoded-vendor mismatch. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The new frontmatter description is oversized and packs implementation detail: 'Phase 2 of AQA workflow - Requirements Clarification (gap-filling questioning) and Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) - USER INTERACTION REQUIRED'. This is ~55-60 tokens, well over the pa-hardening <30-token call-to-action target, and leaks internal mechanics (skill name, mode name, step references) into the discovery surface. Reason: pa-hardening mandates frontmatter description be a small dense call-to-action (<30 tokens); the discovery shell should not carry per-step implementation internals. Solution: Shorten the description to a dense call-to-action under 30 tokens, e.g. 'Phase 2 of AQA - clarify requirements and define explicit typed assertions (USER INTERACTION REQUIRED)'. Move the gap_analysis/requirements-use mechanics into the body (already present in <workflow_context>). |
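
The <30-token description target in the finding above lends itself to a quick pre-commit check. The sketch below approximates tokens by whitespace-separated words, which is an assumption: a real tokenizer will usually report somewhat higher counts, so treat a pass here as necessary rather than sufficient. The function names are hypothetical.

```python
def rough_token_count(text: str) -> int:
    """Approximate token count as whitespace-separated words (an assumption)."""
    return len(text.split())


def description_within_budget(description: str, max_tokens: int = 30) -> bool:
    """Return True when the description is under the <30-token target."""
    return rough_token_count(description) < max_tokens


if __name__ == "__main__":
    suggested = (
        "Phase 2 of AQA - clarify requirements and define explicit typed "
        "assertions (USER INTERACTION REQUIRED)"
    )
    print(rough_token_count(suggested), description_within_budget(suggested))
```
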
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The NEW file ends with a stray, unbalanced closing tag </output> at EOF (after </aqa_flow_selector_implementation>), with no matching <output> opener anywhere in the file. This is a schema-impurity / well-formedness defect introduced by the PR (verified: <output> count 0, </output> count 1). Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced tag breaks XML-tag integrity and can confuse downstream parsing of the phase body. Solution: Delete the trailing </output> line at the end of the file so the phase document closes cleanly on </aqa_flow_selector_implementation>. |
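
The opener/closer count cited in the finding above is easy to reproduce mechanically. Below is a minimal sketch that counts opening and closing tags per name and reports the mismatches; it is not the validator the report used. It is regex-based rather than a real parser, so angle-bracket placeholders in prose (for example `<vendor>`) and tags inside fenced examples will show up as false positives.

```python
import re
from collections import Counter

# Matches <name ...>, </name>, and <name ... />; group 3 marks self-closing.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unbalanced_tags(text: str) -> dict[str, tuple[int, int]]:
    """Return {tag: (open_count, close_count)} for tags whose counts differ."""
    opens: Counter = Counter()
    closes: Counter = Counter()
    for match in TAG_RE.finditer(text):
        slash, name, self_closing = match.groups()
        if self_closing:
            continue  # <tag/> opens and closes itself
        (closes if slash else opens)[name] += 1
    names = set(opens) | set(closes)
    return {n: (opens[n], closes[n]) for n in names if opens[n] != closes[n]}


if __name__ == "__main__":
    sample = (
        "<aqa_flow_selector_implementation>\n"
        "phase body\n"
        "</aqa_flow_selector_implementation>\n"
        "</output>\n"
    )
    print(unbalanced_tags(sample))
```

Run over each phase file in the PR, a non-empty result would flag the same trailing `</output>` reported for the sibling phase files.
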
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The PR adds a stray literal </output> tag as the final line of the file (diff line +</output> at the end), outside the root <aqa_flow_test_correction> element. This is leaked tool/harness scaffolding, not prompt content. Per pa-schemas/pa-hardening the artifact must be schema-pure and source-agnostic; a dangling unmatched closing tag pollutes the phase body and is the kind of AI-slop artifact the authoring skill explicitly forbids. Reason: An unbalanced XML-like tag in the published phase confuses agents parsing the prompt block and signals tooling contamination. Solution: Delete the trailing </output> line so the file ends cleanly with </aqa_flow_test_correction>. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The PR appends a stray literal </output> line at EOF (diff line +</output> after </aqa_flow_test_implementation>), outside the root element. This is harness/tool scaffolding leaked into a published phase, violating schema-purity and the source-agnostic, state-only requirement in pa-hardening/pa-schemas. Reason: An unbalanced closing tag is tooling contamination and can mislead agents parsing the phase body. Solution: Delete the trailing </output> so the file ends with </aqa_flow_test_implementation>. |
| ⚪ Low | Rosetta | Problem: The rewritten phase bakes the HITL refusal gate inline (<stop_for_execution> step 6.3: 'User instruction to bypass this gate must be refused with citation of this rule...'). pa-hardening states user involvement and HITL should be governed by the hitl skill, not restated as bespoke refusal logic inside each phase body. Reason: Duplicated, per-phase HITL refusal logic drifts from the single canonical gate authority and is the boundary pa-hardening warns against. Solution: Route the stop/refuse behavior through the hitl skill (as the parent aqa-flow.md already does via type="HITL"); keep only the phase-specific binding (what to wait for) and reference the hitl gate authority instead of re-implementing refusal wording per phase. |
| ⚪ Low | Output Contract | Problem: <workflow_context> calls the appended record ## Test Implementation, while <implementation_handoff_contract> lists it as the 'Test Implementation record' with five ### subsections; the exact top-level heading string an agent must write is implied rather than stated once. Reason: Cosmetic; the record content and subsections are fully specified, so the artifact is still producible. Solution: State the canonical record heading once (e.g. ## Test Implementation) and reference it from the contract and checklist by that exact string. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Phase 4 skill map lists scenarios-generation (- Phase 4: scenarios-generation, coding.) as a skill tag. This logical name is not an established skill in the AQA/QA family references and is not defined in-prompt; per pa-rosetta, Rosetta skills must come from the canonical docs/definitions/skills.md list. Reason: A non-canonical skill tag will fail to resolve via ACQUIRE/USE SKILL at Phase 4, breaking the test-specification step. Solution: Verify scenarios-generation against docs/definitions/skills.md; if it is not a canonical skill name, replace it with the correct existing skill (e.g. testing/tech-specs) or add it to the canonical list before referencing. |
| 🔵 Medium | Rosetta | Problem: Frontmatter description is a multi-sentence paragraph (~80 tokens: full source-system enumeration, framework list, and end-to-end pipeline recap). pa-hardening requires the frontmatter description be a call-to-action that is extremely small and dense (<30 tokens). Reason: The description field is the routing/selection signal loaded into every agent context; an oversized paragraph wastes the cached-token budget and dilutes the matching cue. Solution: Compress description to a single dense call-to-action trigger (e.g. 'MUST apply for backend API test automation: spec analysis → implementation → execution → corrections.'); move the tool/source enumeration into the body where it is already restated. |
| 🔵 Medium | Rosetta | Problem: <references> lists a Phase 4 skill scenarios-generation and the parent maps phases to files like qa-flow-test-case-specification.md, qa-flow-gap-and-requirements-clarification.md, and qa-flow-execution-and-report-analysis.md. Per pa-rosetta, Rosetta prompts must reference only canonical names from docs/definitions/*.md; whether scenarios-generation and these phase files are canonical cannot be confirmed from the workflow alone. Reason: Non-canonical logical names cause zero-document ACQUIREs at runtime; this is a name-hygiene risk, not a structural break, so medium severity. Solution: Verify each referenced skill and phase-file name against docs/definitions/skills.md and docs/definitions/workflows.md; align names or add the missing canonical entries. |
| 🔵 Medium | Bloat Control | Problem: The frontmatter description is a long multi-sentence enumeration (TestRail/Jira, pytest/Jest/JUnit/RestAssured/SuperTest, plus a full second paragraph restating sources/tools). pa-hardening requires the frontmatter description to be a dense call-to-action under ~30 tokens; this one is roughly 90+ tokens and duplicates the <description_and_purpose> body (which itself defers to the frontmatter). Reason: Over-long frontmatter inflates every routing decision's token cost and violates the <30-token description rule; behavior is unaffected so this is medium severity. Solution: Trim the description to a short call-to-action (e.g. 'MUST apply for backend API test-automation tasks: write/extend/debug API tests from test cases and specs.'); keep the tool/source enumeration in the body, not the frontmatter. |
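
The "verify against the canonical list" step in the Reference Integrity findings above could be scripted along these lines. The sketch parses skill-map lines of the form quoted in the finding (`- Phase 4: scenarios-generation, coding.`) and reports names missing from a canonical set. How that set is extracted from docs/definitions/skills.md is not shown because that file's format is not visible in this report, so the set is passed in as a parameter, and the example set below is hypothetical.

```python
import re

# Matches lines such as "- Phase 4: scenarios-generation, coding."
PHASE_LINE = re.compile(r"^- Phase (\d+):\s*(.+?)\.?\s*$")


def skills_by_phase(skill_map: str) -> dict[int, list[str]]:
    """Parse '- Phase N: skill-a, skill-b.' lines into {N: [skills]}."""
    result: dict[int, list[str]] = {}
    for line in skill_map.splitlines():
        match = PHASE_LINE.match(line.strip())
        if match:
            names = [s.strip() for s in match.group(2).split(",") if s.strip()]
            result[int(match.group(1))] = names
    return result


def unresolved_skills(skill_map: str, canonical: set[str]) -> dict[int, list[str]]:
    """Return, per phase, the skill names absent from the canonical set."""
    missing: dict[int, list[str]] = {}
    for phase, names in skills_by_phase(skill_map).items():
        unknown = [n for n in names if n not in canonical]
        if unknown:
            missing[phase] = unknown
    return missing


if __name__ == "__main__":
    # Hypothetical canonical set, for illustration only.
    print(unresolved_skills("- Phase 4: scenarios-generation, coding.", {"coding", "testing"}))
```
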
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: At ~14.4K chars this single phase file carries a full endpoint-contract template, a complete worked example, a redaction catalog with a grep re-scan list, an Analysis Summary block, and a validation checklist all inline. It is the largest of the seven files and exceeds the pa-hardening size guidance (300-500 lines ideal; split when larger), concentrating multiple heavy sub-contracts into one phase load. Reason: The whole template + example + redaction catalog is resent in context every turn the phase runs; progressive disclosure of the large static catalog reduces the per-turn cognitive/token load without losing the contract. Solution: Move the verbatim <endpoint_contract_template> worked example and/or the <redaction_contract> catalog to a referenced asset the skill ACQUIREs on demand, keeping the phase file to the section list + binding + checklist (progressive disclosure). |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: This phase (baseSchema: docs/schemas/phase.md) ACQUIREs and executes a sibling phase: step 1.2b.2 ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB and 1.2b.4 'execute all numbered steps inside <execute_documentation_mcp>'. The target file is also a phase (baseSchema: docs/schemas/phase.md), and the parent qa-flow.md does not reference it. pa-hardening boundary: phases cannot call phases; only the parent workflow composes phases. Reason: Phase-calls-phase violates the Rosetta composition boundary; a sibling phase invoking another phase breaks progressive-disclosure ownership and creates a hidden control-flow dependency the orchestrator/workflow does not see. Solution: Either (a) promote the documentation-MCP collection into a step owned by qa-flow.md (the workflow), or (b) make the MCP-collection content a skill asset/reference ACQUIRE'd by the discovery skill rather than a sibling phase file, so data-collection no longer invokes another phase. |
| 🔵 Medium | Reference Integrity | Problem: <raw_data_contract> Backend Source Code Analysis references RefSrc/ docs, but the canonical Rosetta term is refsrc/ (lowercase, per pa-rosetta target-folder list). The same file uses refsrc/{project-name}/docs/ correctly elsewhere (step 2.1 in the sibling api-spec file), so the capitalization is inconsistent within the family. Reason: Inconsistent casing of a path/term reference can cause an agent to look up a non-existent directory; one term per concept is required. Solution: Change RefSrc/ to the canonical refsrc term to match the pa-rosetta folder reference and the rest of the qa-flow family. |
| ⚪ Low | Reference Integrity | Problem: The phase references vendor binding files loaded inside discovery (references/<vendor>-binding.md, references/confluence-binding.md) and the subflow tag qa-flow-documentation-mcp-subflow. These resolve only if those binding files and the subflow file exist in the KB; they are valid in-family references but unverifiable from this file alone. Reason: In-family references are valid by design; this is a low-severity reminder to confirm the targets ship together. Solution: Ensure references/testrail-binding.md, references/jira-binding.md, references/confluence-binding.md exist under the discovery skill and the subflow file is published. |
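
The phase-calls-phase finding above can also be checked mechanically across a workflow family. The sketch below scans phase bodies for the `ACQUIRE <file>.md FROM KB` form quoted in the finding and flags any target that is itself a phase file. The input shape (a dict of file name to body text) is an assumption for illustration, and other ACQUIRE spellings that may exist in the repository are not covered.

```python
import re

# Matches the directive form quoted in the finding: ACQUIRE <file>.md FROM KB
ACQUIRE_RE = re.compile(r"ACQUIRE\s+([\w./-]+\.md)\s+FROM\s+KB")


def phase_to_phase_calls(phase_bodies: dict[str, str]) -> list[tuple[str, str]]:
    """Return (caller, target) pairs where one phase file ACQUIREs another.

    `phase_bodies` maps phase file names to their text; only targets that
    are themselves keys of the mapping count as phase-to-phase calls.
    """
    phase_names = set(phase_bodies)
    calls = []
    for name, body in phase_bodies.items():
        for target in ACQUIRE_RE.findall(body):
            if target in phase_names and target != name:
                calls.append((name, target))
    return sorted(calls)


if __name__ == "__main__":
    bodies = {
        "qa-flow-data-collection.md": (
            "1.2b.2 ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB"
        ),
        "qa-flow-documentation-mcp-subflow.md": "numbered steps",
    }
    print(phase_to_phase_calls(bodies))
```

An empty result after re-homing the subflow as a skill asset would confirm that only the parent workflow composes phases.
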
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: This file is authored as a phase (baseSchema: docs/schemas/phase.md) but functions as a sub-phase that the sibling phase qa-flow-data-collection.md ACQUIREs and runs (its <execute_documentation_mcp> steps are executed by the parent phase). It carries step="1.2b", i.e. it is a numbered step of another phase, not a standalone phase, and the workflow qa-flow.md does not list it. This is the phase-cannot-call-phase boundary from the other side. Reason: Being a phase-schema file invoked by a sibling phase violates the Rosetta composition boundary and gives it sibling/reverse awareness (it names its parent phase qa-flow-data-collection), which phases must not have. Solution: Re-home this fragment as a discovery skill asset/reference (so the collecting phase calls the skill, not a sibling phase), or fold its steps directly into qa-flow.md/the data-collection step under the parent workflow. If kept separate, it must not use the phase schema while being invoked by another phase. |
| ⚪ Low | Reference Integrity | Problem: Relies on discovery loading references/confluence-binding.md and on qa-project-config.md config keys; these resolve only when those files ship in the KB / target structure. Reason: In-family/target-structure references are valid by design; low-severity confirmation only. Solution: Confirm references/confluence-binding.md exists under the discovery skill and that the documented config keys match the qa-project-config template. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The phase file is large and heavily redundant about the project-config-is-project-wide-not-per-IDENTIFIER point. It is restated in <description_and_purpose>, twice in <workflow_context> Output, in <session_layout> prose, in step 0.1 step 4, and again in the <config_contract> / template prose. The full per-phase state checklist is also reproduced in the State-file initial stub even though <workflow_context> and step 0.1.3 explicitly say the full schema is owned by qa-flow.md <state_file>. Reason: Every duplicated invariant is resent each agent turn at full token cost; the checklist duplication also risks the two copies drifting out of sync. Solution: State the project-wide-not-per-IDENTIFIER rule once in <session_layout> and reference it; trim the State-file stub to the minimal seed header plus an IDENTIFIER line rather than reproducing the full 8-row checklist that qa-flow.md owns. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: The <present_for_approval> step 4.4 defines a closed approval-token list (approved/approve/yes), loose-phrasing rejection, and max-retry escalation entirely inline, without binding to the hitl skill. pa-hardening requires user involvement / HITL to live in the hitl skill so full automation can govern it centrally; the sibling Phase 7 file (qa-flow-test-correction.md) DOES anchor its identical gate with "(Approval vocabulary is governed by hitl; this gate's closed token list is the phase-specific specialization)". This phase omits that anchor, so the two HITL gates diverge in their stated authority. Reason: Without the hitl anchor the gate can conflict with the session-wide hitl protocol and is inconsistent with the parent qa-flow.md carve-out that ties Phase 3-7 gates to the hitl skill. Solution: Add a one-line cite mirroring Phase 7 (note that approval vocabulary is governed by the hitl skill and this closed token list is the phase-specific specialization) rather than presenting the token gate as standalone authority. |
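For readers unfamiliar with the gate shape this finding describes, here is a minimal sketch of a closed approval-token check with loose-phrasing rejection and max-retry escalation. The token list comes from the finding; the function names, the case-insensitive comparison, and the retry limit of 3 are illustrative assumptions, not taken from the phase file or the hitl skill.

```python
# Hypothetical sketch of a closed-token approval gate.
# Only the three tokens are taken from the review finding; everything else is assumed.
APPROVAL_TOKENS = frozenset({"approved", "approve", "yes"})


def is_approved(reply: str) -> bool:
    """Accept only a reply that is exactly one closed token (assumed case-insensitive)."""
    return reply.strip().lower() in APPROVAL_TOKENS


def run_gate(replies, max_retries=3):
    """Walk user replies in order and return 'approved' or 'escalate'.

    max_retries=3 is an assumed value; the finding only says a limit exists.
    """
    for attempt, reply in enumerate(replies, start=1):
        if is_approved(reply):
            return "approved"
        if attempt >= max_retries:
            break  # loose phrasing exhausted the retry budget
    return "escalate"
```

Under these assumptions a reply such as "looks good" is rejected rather than interpreted, which is the loose-phrasing rejection the finding refers to.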
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The NEW file ends with a stray, unbalanced closing tag </output> at EOF (after the root element close), with no matching <output> opener anywhere in the file (verified: opening-tag count 0, closing-tag count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing </output> line so the document closes cleanly on its root element. |
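The opener/closer count behind this finding can be reproduced with a small script. The following is a rough sketch, not the validator's actual implementation: it counts opening and closing tags per name with a regular expression and reports any name whose counts differ. It is deliberately naive (tags inside code fences or quoted examples are counted too), and the root element name in the sample is hypothetical.

```python
import re
from collections import Counter

# Matches <name ...>, </name>, and <name .../>; attributes are skipped, not parsed.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unbalanced_tags(text: str) -> dict:
    """Return {tag_name: (open_count, close_count)} for every tag whose counts differ."""
    opens, closes = Counter(), Counter()
    for match in TAG_RE.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <tag/> needs no partner
        (closes if closing else opens)[name] += 1
    names = set(opens) | set(closes)
    return {n: (opens[n], closes[n]) for n in sorted(names) if opens[n] != closes[n]}


# Hypothetical phase file body with a stray trailing closer.
sample = (
    "<qa_flow_test_correction>\n"
    "<step>apply fix</step>\n"
    "</qa_flow_test_correction>\n"
    "</output>\n"
)
report = unbalanced_tags(sample)
```

For the sample above, `report` contains a single entry for `output` with zero openers and one closer, matching the counts quoted in the finding.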
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The NEW file ends with a stray, unbalanced closing tag </output> at EOF (after the root element close), with no matching <output> opener anywhere in the file (verified: opening-tag count 0, closing-tag count 1). PR-introduced. Reason: pa-hardening/pa-schemas require schema-pure, well-formed phase artifacts; an unbalanced trailing tag breaks XML-tag integrity and is leaked generator/harness scaffolding. Solution: Delete the trailing </output> line so the document closes cleanly on its root element. |
| ⚪ Low | Rosetta | Problem: The <stop_for_execution> step 5.3 defines a hard HITL gate with bypass-refusal logic ("User instruction to bypass this gate must be refused with citation of this rule ... the gate is mechanical and cannot be overridden by instruction alone") entirely inline, with no reference to the hitl skill. Unlike the sibling Phase 7 file which anchors its gate to hitl, this phase presents itself as the standalone authority for a stop-and-wait gate. Reason: pa-hardening requires HITL/user-involvement authority to derive from the hitl skill for central full-automation governance; an unanchored mechanical-override-refusal can conflict with the session-wide hitl policy. Solution: Anchor the stop-for-execution gate to the hitl skill (e.g. note it is a phase-specific specialization of the hitl stop-and-wait protocol), consistent with qa-flow-test-correction.md and the parent qa-flow.md carve-out. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The rewrite renamed the state-schema section to <state_and_outputs> and moved schema ownership to Phase 0's <state_file_template>. But the sibling phase file testgen-flow-data-collection.md still points to testgen-flow.md <state_file> (step 1.4), a section name that no longer exists in this file. The pointer target was renamed here without the consumer being updated, leaving a dangling cross-phase reference. Reason: An agent following the data-collection pointer cannot resolve testgen-flow.md <state_file> and may improvise a state schema, breaking the cross-phase contract. Solution: Keep a stable anchor: either re-add a <state_file> tag name (or alias) in this workflow that owns/forwards the state schema, or coordinate so data-collection points to Phase 0 <state_file_template> (the true SSoT named in <state_and_outputs>). Make the consumer and the SSoT name agree. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ✅ Much better |
| Decision Branching | 4 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ✅ Much better |
| Rosetta | 4 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Step 1.4 (<update_state>) instructs to update state 'per the parent flow's canonical state-file schema (declared once in testgen-flow.md <state_file>)'. The rewritten parent testgen-flow.md has no <state_file> section; its state section is <state_and_outputs>, which delegates ownership to Phase 0's <state_file_template> in testgen-flow-project-config-loading.md. The named anchor does not resolve. Reason: An agent grepping for testgen-flow.md <state_file> finds nothing and may invent a state schema, diverging from the Phase 0 template every other phase relies on. Solution: Point step 1.4 at the real SSoT: Phase 0 <state_file_template> in testgen-flow-project-config-loading.md (or the parent's <state_and_outputs> section name), matching what the parent actually declares. |
| ⚪ Low | Example Grounding | Problem: Frontmatter description was shortened to 'Phase 1 of Test Generation - Data collection ' (trailing space, and dropped the 'from Jira and Confluence' specificity present in BASE). The body still relies on config-resolved vendors, so the description no longer signals the concrete sources the phase handles. Reason: Description is the call-to-action surface; losing the source hint slightly weakens phase identification, though body content remains complete. Solution: Restore a concise source hint in the description (e.g. 'Data collection from issue tracker + documentation sources') and trim the trailing space. |
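A dangling anchor of this kind is cheap to detect mechanically. As a sketch only, assuming references are expressed as (file name, tag name) pairs and that an anchor "resolves" when the opening tag appears literally in the target file, a check could look like this. The file bodies below are placeholders, not the real workflow contents.

```python
def unresolved_anchors(files: dict, references: list) -> list:
    """Return the (filename, tag) pairs whose opening tag is absent from the target file.

    files:      mapping of file name to file text (hypothetical in-memory stand-in)
    references: list of (filename, tag) pairs found in consuming phases
    """
    missing = []
    for filename, tag in references:
        body = files.get(filename, "")
        # Accept both <tag> and <tag attr="..."> as a valid opener.
        if f"<{tag}>" not in body and f"<{tag} " not in body:
            missing.append((filename, tag))
    return missing


# Placeholder contents that mirror the situation described in the finding.
files = {
    "testgen-flow.md": "<state_and_outputs>...</state_and_outputs>",
    "testgen-flow-project-config-loading.md": "<state_file_template>...</state_file_template>",
}
references = [
    ("testgen-flow.md", "state_file"),
    ("testgen-flow-project-config-loading.md", "state_file_template"),
]
dangling = unresolved_anchors(files, references)
```

With these placeholder inputs only the `testgen-flow.md <state_file>` pointer is reported, while the Phase 0 template anchor resolves.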
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 2 | ⬇️ Slightly worse |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Cross-family deep link into skill internals at line 41 (requirements-use/references/gap-analysis-catalogs.md) violates the pa-rosetta/pa-hardening rule 'no cross-skill deep linking to private content' and 'references must be wrapped in commands or ACQUIRE'd', not named as bare filesystem paths. Reason: Naming a skill's private reference path from a consuming phase is the boundary violation the Rosetta isolation rules exist to prevent. Solution: Invoke the skill by logical name and remove the bare internal path; if the catalog must be cited, wrap it as an ACQUIRE owned by the skill, not the phase. |
| 🔵 Medium | Reference Integrity | Problem: Step 3 of <run_analysis> (line 41) names another skill's private file path requirements-use/references/gap-analysis-catalogs.md directly. The phase reaches into the skill's internal implementation instead of just invoking USE SKILL requirements-use and letting the skill own which catalog it loads. Reason: Coupling a phase to a skill's private reference filename breaks skill folder isolation and will silently break if the skill reorganizes its references. Solution: Drop the explicit references/gap-analysis-catalogs.md path; reference the gap_analysis mode by name only and let the skill resolve its own internal references. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The primary HITL gate (step 3.2 line 108 PAUSE — WAIT FOR USER INPUT, plus <workflow_context> line 20) is expressed as inline gate prose and never routes through USE SKILL hitl. The sibling Phase 0 (project-config-loading step 0.6) routes its gate through USE SKILL hitl. pa-rosetta/pa-hardening require HITL/approval to live in the canonical hitl home, so the most important HITL gate of the whole flow is inconsistent with its own sibling. Reason: An inline-only approval gate diverges from the canonical HITL skill and from the sibling phase, so approval-handling behavior is inconsistent at the single most safety-relevant gate in the flow. Solution: Gate the Phase 3 answer-wait/approval via USE SKILL hitl the same way Phase 0 step 0.6 does, keeping the inline prose only as the on-load-failure fallback. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: The <failure_handling> block adds a large non-operational meta-justification 'Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn)' (lines 143-149) explaining WHY there is no fallback, plus rationale about the sibling test-case-generation phase. This is provenance/rationale prose, not an instruction the agent executes — exactly the non-operational meta-note pa-hardening says to remove. Reason: Non-operational rationale re-sent every turn inflates cost and cognitive load without changing agent behavior; pa-hardening flags exactly this class of meta-note. Solution: Keep only the operational rule ('skill is a hard dependency; on failure re-invoke once then block — no inline fallback'); delete the multi-bullet rationale and the sibling-comparison paragraph. |
| 🔵 Medium | Rosetta | Problem: Cross-family deep linking into skill internals at lines 47 and 146 (requirements-authoring/references/authoring-catalogs.md, skill SKILL.md deploy path), plus the long non-operational tradeoff note (lines 143-149), both violate pa-rosetta/pa-hardening (no cross-skill deep linking; remove non-operational meta-notes / change-rationale). Reason: These are the two specific Rosetta authoring violations (skill-internal deep link + non-operational provenance) that the hardening reference calls out by name. Solution: Reference the skill by logical name only and strip the rationale paragraph, leaving the one-line operational block-on-failure rule. |
| 🔵 Medium | Cognitive Budget | Problem: Step 4.3 restates the full phase-owned section contract, testgen-specific Executive Summary block, Traceability block, SMART exemplar, and coverage prompt (lines 45-118) while also delegating to the synthesis mode that owns the same schemas. The duplicated contract+rationale enlarges the per-turn surface area for a single document-build step. Reason: Carrying both the full contract and the rationale for the contract in one step pushes the phase toward the upper size band and competes for the agent's attention budget. Solution: Compress step 4.3 to the section table plus the two testgen-only deltas (Executive Summary, Traceability column); move worked SMART/coverage examples to a single short pointer to the skill catalog rather than inlining them. |
| 🔵 Medium | Reference Integrity | Problem: Step 4.3 (line 47) and the failure_handling tradeoff (line 146) name another skill's private internals (requirements-authoring/references/authoring-catalogs.md and the skill's deploy path instructions/<release>/core/skills/requirements-authoring/SKILL.md) as bare paths. The phase reaches into skill-private files instead of invoking the skill by name and letting it own its references. Reason: Hard-coding a skill's internal reference filename and deploy path into a phase breaks skill isolation and will drift if the skill is reorganized. Solution: Reference requirements-authoring synthesis mode by logical name only; remove the references/authoring-catalogs.md and explicit deploy-path citations. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 3 | ⬇️ Slightly worse |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Safety Boundaries | Problem: This is the destructive phase (writes to an external TMS). Line 36 ASSERTS the phase owns 'idempotency (the destructive-write confirmation gate + dedup pre-scan)', but no such gate or pre-scan exists anywhere in steps 6.1-6.6. Step 6.3 only asks for a target location; step 6.5 exports every case directly with no pre-write confirmation and no duplicate check. The duplicate risk is only passively acknowledged as a pitfall (line 143 'Re-running export may create duplicates in TMS — document this behavior'). On a rerun the phase will silently re-create every case. Reason: The phase claims a destructive-write safety control it never implements; an unguarded rerun duplicates the entire suite in the external system with no confirmation. Solution: Add an explicit step before step 6.5: dedup pre-scan of the target location for already-exported TC IDs/titles, then a destructive-write confirmation gate (route via USE SKILL hitl) that shows the user the create/skip plan and requires explicit confirmation before any TMS write. Make the line-36 claim point at that real step. |
| 🟡 High | Precision & Explicitness | Problem: Line 36 uses the precise terms 'destructive-write confirmation gate' and 'dedup pre-scan' as if they are defined controls, but neither term is defined or operationalized later in the file, so the modal claim is non-actionable. The agent is told the phase OWNS these controls but is never told how to perform them. Reason: Naming a control the agent cannot locate or execute is an explicitness gap that makes the safety claim unenforceable. Solution: Either add the concrete steps these terms name, or remove the terms; do not assert ownership of a control that has no procedure. |
| 🟡 High | Workflow Completeness | Problem: The parent workflow marks Phase 6 type="HITL" requiring the user to 'confirm export', and <workflow_context> line 19 lists HITL as 'user must provide target location', but the step sequence has no operational confirm-export gate. The 'confirm export' obligation from the parent and the destructive-write gate named at line 36 are both missing from the numbered steps 6.1-6.9. Reason: A multi-step destructive workflow that omits the parent-mandated confirmation step has an implicit (missing) step at exactly the irreversible action. Solution: Insert a numbered confirm-export step between get_target_location (6.3) and export (6.5) that pauses for explicit user approval of the export scope and target, mirroring the parent's HITL contract. |
| 🟡 High | Rosetta | Problem: The parent workflow designates Phase 6 a HITL gate, yet this phase routes user interaction (target location ask, partial-export decision) through plain inline Ask user prompts and never USE SKILL hitl. pa-rosetta/pa-hardening require HITL approval to live in the hitl skill; sibling Phase 0 already does this, so the family is inconsistent for its two declared HITL phases. Reason: HITL handled outside the hitl skill bypasses the session-wide approval protocol and is inconsistent across the phase family, weakening the guarantee that the user gates the destructive export. Solution: Route the export confirmation and the partial-export user decision through USE SKILL hitl (canonical approval/escalation home), matching Phase 0 step 0.6; keep the inline prompts only as the skill-load-failure fallback. |
| 🔵 Medium | Bloat Control | Problem: Line 36 is a non-operational ownership/meta declaration ('This phase OWNS the export contract … idempotency …; the skill EMITS … it never decides the contract') that describes responsibility boundaries rather than instructing an action, while the action it claims (gate + dedup) is absent. It is meta-prose, not a step. Reason: An ownership claim that substitutes for the missing procedure adds words and a false sense of coverage without changing behavior. Solution: Replace the ownership paragraph with the actual operational steps (dedup scan + confirm gate); keep at most a one-line note of which artifact is the export source. |
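The control these findings ask for (dedup pre-scan, then a confirmation gate, then the write) can be sketched as follows. This is an illustration under stated assumptions, not the phase's procedure: test case IDs are assumed to be the dedup key, `write_case` stands in for whatever TMS call the resolved binding provides, and the `confirmed` flag stands in for the outcome of the hitl-routed confirmation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPlan:
    create: tuple  # case IDs not yet present in the target location
    skip: tuple    # case IDs already present, left untouched


def build_plan(local_case_ids, existing_ids_in_target) -> ExportPlan:
    """Dedup pre-scan: split local cases into create vs. skip (ID match is an assumption)."""
    existing = set(existing_ids_in_target)
    create = tuple(c for c in local_case_ids if c not in existing)
    skip = tuple(c for c in local_case_ids if c in existing)
    return ExportPlan(create, skip)


def export(plan: ExportPlan, confirmed: bool, write_case) -> list:
    """Write only the 'create' set, and only after explicit confirmation.

    write_case is a hypothetical callable wrapping the TMS write.
    """
    if not confirmed:
        return []  # no TMS write happens without the gate passing
    written = []
    for case_id in plan.create:
        write_case(case_id)
        written.append(case_id)
    return written


plan = build_plan(["TC-1", "TC-2", "TC-3"], ["TC-2"])
```

The plan object is what would be shown to the user at the gate, so a rerun presents "create 0, skip N" instead of silently duplicating the suite.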
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 2 | ⬇️ Slightly worse |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Precision & Explicitness | Problem: The PR's stated direction is de-hardcoding the vendor to a config-resolved TMS binding (step 5.3 line 84), yet residual hardcoded 'TestRail' remains in operative text: phase_steps line 25 'Generate test cases in TestRail format' and the user-facing message line 281 'Ready to proceed to Phase 6 (TestRail Export)?'. The same concept (the resolved TMS vendor) is named two ways, violating one-term-per-concept. Reason: A step that says 'TestRail format' and a user message that says 'TestRail Export' contradict the config-resolved-vendor model the same file establishes, and can mislead the agent on non-TestRail projects. Solution: Change line 25 to 'Generate test cases in the resolved TMS FORMAT' and line 281 to 'Phase 6 (Test Case Export)'; keep testrail only where it is an explicit example of a resolvable binding. |
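Residual vendor literals like the two cited here are easy to list before deciding which ones are legitimate examples. A minimal sketch, assuming a plain case-insensitive text search is enough and that a human still judges each hit:

```python
import re


def hardcoded_vendor_lines(text: str, vendor: str = "TestRail") -> list:
    """Return (line_number, line) pairs that mention the vendor literal, case-insensitively."""
    pattern = re.compile(re.escape(vendor), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Placeholder excerpt built from the two phrases quoted in the finding.
excerpt = (
    "Generate test cases in TestRail format\n"
    "Use the resolved TMS binding\n"
    "Ready to proceed to Phase 6 (TestRail Export)?"
)
hits = hardcoded_vendor_lines(excerpt)
```

The scan cannot tell an operative instruction from an explicit example of a resolvable binding, so its output is a review list rather than a pass/fail verdict.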
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📋 Prompt Quality Validation Report ❌ Validation Failed Summary by File
📄
|
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The added <implementation_modes> block layers two distinct sub-behaviors (standards-first reading discipline and an approved-apply HITL fix-application loop) onto the general coding skill. approved-apply is described as 'a domain-specific specialization of hitl' and embeds an approval-gate state machine (steps 1-6 with GATEs) that is closer to a workflow-phase responsibility than to the coding skill's single implementation responsibility. Reason: Adding a HITL fix-application loop widens the skill from 'implement code' toward 'coordinate an approval workflow', diluting single responsibility and increasing cognitive surface. Solution: Keep standards-first as a coding concern but consider relocating the approved-apply approval/gate orchestration to the owning workflow phase, leaving the skill to EMIT the proposed-change content only. |
| 🔵 Medium | Rosetta | Problem: New <implementation_modes> approved-apply step says `USE SKILL debugging`; debugging's new triage block reciprocally says `USE SKILL coding`. This is a peer-domain skill pointer (mild reciprocal coupling), not a cross-cutting MUST-skill like sensitive-data/hitl. It does not break execution (any skill is loadable) but slightly bends the skills-avoid-peer-skill convention. Reason: Peer-skill pointers add mild coupling but do not stop the agent; low impact. Solution: Keep as a one-way pointer or move the cross-reference to frontmatter/keywords; avoid the reciprocal coding<->debugging pairing so neither skill body steers into the other. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The added read-only <test_execution_triage> mode (parse report → categorize → page-source/HTTP analysis → cross-failure patterns → emit artifact) is a UI/API automated-test report-triage responsibility distinct from the skill's core 'find root cause before fixing' job, broadening the skill into AQA report analysis. Reason: Layering a report-triage mode onto debugging adds a second responsibility and audience (AQA execution reports), raising cognitive surface beyond the single debugging responsibility. Solution: Acceptable if intentional, but consider whether triage belongs in a dedicated AQA analysis skill/phase; at minimum keep the mode strictly scoped so the core debugging method stays primary. |
| 🔵 Medium | Rosetta | Problem: New <test_execution_triage> block ends with 'GATE: read-only. Proposing or applying fixes is a separate correction phase — USE SKILL coding.' This makes the debugging skill actively invoke the coding skill. Combined with coding/SKILL.md's new `USE SKILL debugging`, the two peer domain skills now call each other, which is a forbidden skill-to-skill (and circular) dependency. Cross-cutting `USE SKILL sensitive-data` (line 26) is the accepted convention and is fine; the coding call is not. Reason: Peer-skill pointer adds mild coupling; does not stop execution (any skill is loadable). Low impact. Solution: Change the GATE to a non-imperative boundary statement (e.g. 'fixes are a separate correction phase owned by the calling workflow') rather than `USE SKILL coding`, leaving skill chaining to the workflow. |
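
The reciprocal coding<->debugging pointer described above is the kind of coupling a small lint could surface. The following is a minimal sketch, assuming skill bodies are available as plain strings and that a pointer is written literally as `USE SKILL <name>`; `skill_pointers`, `find_reciprocal_pairs`, and the sample bodies are hypothetical and not part of Rosetta.

```python
import re

# Assumption: a peer-skill pointer appears literally as "USE SKILL <name>".
USE_SKILL = re.compile(r"USE SKILL\s+`?([a-z][a-z0-9-]*)`?")


def skill_pointers(bodies):
    """Map each skill name to the set of other skills its body points at."""
    return {
        name: set(USE_SKILL.findall(text)) - {name}
        for name, text in bodies.items()
    }


def find_reciprocal_pairs(bodies):
    """Return sorted (a, b) pairs where a points at b and b points back at a."""
    graph = skill_pointers(bodies)
    pairs = set()
    for source, targets in graph.items():
        for target in targets:
            if source in graph.get(target, set()):
                pairs.add(tuple(sorted((source, target))))
    return sorted(pairs)


# Hypothetical skill bodies paraphrasing the pointers named in the finding.
bodies = {
    "coding": "approved-apply: on failure USE SKILL debugging",
    "debugging": "GATE: read-only. USE SKILL coding. USE SKILL sensitive-data",
    "sensitive-data": "Redact secrets before emitting artifacts.",
}

print(find_reciprocal_pairs(bodies))
```

In this sketch the one-way pointer to `sensitive-data` is not reported; only pairs that point at each other are.
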
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/reverse-engineering/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: New <analysis_modes> adds two concrete modes (test-automation architecture analysis and API-contract extraction) onto the general code→spec reverse-engineering skill. The API-contract-extraction mode (locate Swagger/OpenAPI/route defs, emit per-endpoint parameters/schemas/auth/citations) is a fairly distinct AQA/API-discovery responsibility layered onto a skill whose core is 'recover intent / WHAT and WHY from code'. Reason: Two added concrete modes widen the skill's responsibility and audience beyond spec recovery, increasing cognitive surface even though each mode references the general method. Solution: Acceptable if these modes are intended specializations, but keep them clearly subordinate to the general method; if they grow, factor API-contract extraction into its own analysis skill/phase. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-use/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Step 6.1 resolves the vendor binding from agents/testgen/{TICKET-KEY}/testgen-project-config.md (per-ticket path), but Phase 0 (testgen-flow-project-config-loading.md step 0.3) saves the config to agents/testgen/testgen-project-config.md (project-wide, explicitly 'not per-ticket'). The path the export phase reads from will not exist. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Change the config path in step 6.1 to the project-wide agents/testgen/testgen-project-config.md to match the Phase 0 canonical location. |
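
The mismatch above comes down to two different paths being built for the same file. The following is a minimal sketch, assuming only the two locations quoted in the finding; the helper names and the ticket key `PROJ-123` are hypothetical.

```python
from pathlib import PurePosixPath

TESTGEN_ROOT = PurePosixPath("agents/testgen")
CONFIG_NAME = "testgen-project-config.md"


def project_config_path():
    """Project-wide location that Phase 0 (step 0.3) is reported to write to."""
    return TESTGEN_ROOT / CONFIG_NAME


def per_ticket_config_path(ticket_key):
    """Per-ticket location that step 6.1 is reported to read from."""
    return TESTGEN_ROOT / ticket_key / CONFIG_NAME


written = project_config_path()
read = per_ticket_config_path("PROJ-123")  # hypothetical ticket key

print(written)
print(read)
print(read == written)
```

Routing every phase through a single helper such as `project_config_path` is one way to keep the read and write locations from drifting apart.
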
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 3 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Step 5.3 resolves the FORMAT vendor binding from agents/testgen/{TICKET-KEY}/testgen-project-config.md (per-ticket), but Phase 0 writes the config to the project-wide agents/testgen/testgen-project-config.md. Same path mismatch as the export phase. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Point step 5.3's config read at agents/testgen/testgen-project-config.md to align with the Phase 0 canonical path. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 3 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/coding/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: <implementation_modes> approved-apply step uses `USE SKILL debugging` while debugging reciprocally points back with `USE SKILL coding`, forming a peer-domain skill pairing (not a cross-cutting MUST-skill). Reason: Mild coupling between peer skills; does not break execution but bends the convention. Solution: Make the reference one-way or relocate it to frontmatter/keywords; avoid the reciprocal coding<->debugging coupling. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/operation-manager/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: The expanded frontmatter line `model: claude-sonnet-4-6, gpt-5.5, gemini-3.1-pro` bakes specific vendor model identifiers directly into the skill. Rosetta is coding-agent-agnostic; hardcoding model names per-vendor is the kind of literal that config-key precedence is meant to avoid. Reason: Hardcoded vendor model names age quickly and conflict with the agent-agnostic principle, though impact is low since it is only an advisory frontmatter hint. Solution: Keep the model hint minimal or move model selection to config-driven guidance rather than enumerating three literal vendor model ids in frontmatter. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/orchestrator-contract/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/discovery/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/discovery/references/confluence-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/discovery/references/jira-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: Unlike the confluence-binding, this binding has no explicit 'Output sections' block enumerating the ordered blocks (per-field entries, Gaps, Redaction) the binding emits into the phase artifact. The field map plus per-field branch imply the shape, but the deterministic ordered output contract is left to the base SKILL. Reason: A point-of-use binding that omits its own ordered output contract forces the agent to infer block order, slightly reducing determinism across vendors. Solution: Add a short 'Output sections' block (matching the confluence-binding's pattern) listing the ordered emitted blocks and stating that every section is present, with `None.` for empties. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/discovery/references/testrail-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: Like the jira-binding, this file has no explicit 'Output sections' block enumerating the ordered emitted blocks; the confluence-binding has one but jira and testrail do not, so the three sibling bindings are inconsistent in declaring their output ordering. Reason: Inconsistent output-contract declaration across the three bindings makes the per-vendor emitted shape rely on inference for two of three vendors. Solution: Add a short 'Output sections' block listing the ordered blocks (case entry fields, Gaps, Redaction) and the present-with-`None.` rule, matching the confluence-binding. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/operation-manager/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Reference Integrity | Problem: Both <core_concepts> and <resources> instruct ACQUIRE todo-tasks-fallback.md FROM KB as the universal baseline fallback, but the file todo-tasks-fallback.md exists only under instructions/r3/core/rules/, not anywhere in instructions/r2/. In an r2-scoped agent the universal fallback path cannot be resolved. Reason: The fallback is presented as the agent-agnostic universal baseline; if the ACQUIRE cannot resolve in r2, agents without rosettify/MCP/Node have no working mechanism, breaking the skill's primary promise. Solution: Add todo-tasks-fallback.md under r2 rules (or point the ACQUIRE at the actual r2 rule path that provides the fallback). Do not reference an r3-only file from an r2 skill. |
| 🟡 High | Rosetta | Problem: The skill name operation-manager is not in the canonical docs/definitions/skills.md, which lists plan-manager instead. pa-rosetta.md definitions policy requires using names from docs/definitions/*.md and not auto-adding out-of-list items. Additionally r2 now ships both operation-manager (new) and the legacy plan-manager; the new dispatch-template binds operation-manager while docs/definitions/skills.md still lists only plan-manager, leaving a duplicate-skill/definitions mismatch. Reason: An out-of-canon skill name means workflows/agents referencing the definition list by logical name cannot reliably resolve this skill, violating the Rosetta definitions policy. Solution: Either add operation-manager to docs/definitions/skills.md (and remove/retire plan-manager if this replaces it), or rename the skill to the canonical plan-manager. |
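
The unresolved `ACQUIRE` in the first finding above could be caught by a release-scoped reference check. The following is a minimal sketch, assuming references are written literally as `ACQUIRE <file> FROM KB` and that a reference resolves when a same-named file exists anywhere under the release directory; the real KB resolution rules are not shown in this report, and `unresolved_acquires` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path

# Assumption: references are written literally as "ACQUIRE <file> FROM KB".
ACQUIRE = re.compile(r"ACQUIRE\s+(\S+)\s+FROM KB")


def unresolved_acquires(skill_text, release_root):
    """Return ACQUIRE targets with no same-named file under release_root."""
    available = {p.name for p in Path(release_root).rglob("*") if p.is_file()}
    return [name for name in ACQUIRE.findall(skill_text) if name not in available]


# Throwaway directory tree mimicking the layout described in the finding.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "instructions"
    (root / "r3" / "core" / "rules").mkdir(parents=True)
    (root / "r3" / "core" / "rules" / "todo-tasks-fallback.md").write_text("fallback")
    (root / "r2" / "core" / "rules").mkdir(parents=True)

    skill = "ACQUIRE todo-tasks-fallback.md FROM KB"
    missing_in_r2 = unresolved_acquires(skill, root / "r2")
    missing_in_r3 = unresolved_acquires(skill, root / "r3")

print(missing_in_r2)
print(missing_in_r3)
```

Scoping the lookup to one release root is the point of the sketch: the same reference is reported for r2 and passes for r3.
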
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 2 | ⬇️ Slightly worse |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/operation-manager/assets/om-schema.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/orchestrator-contract/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Success Criteria | Problem: The new template is a fill-in-the-blanks form with no 'done-when' check telling the orchestrator the template is correctly filled (e.g., no rule that every placeholder must be resolved before dispatch). Reason: Without a completeness check an orchestrator can dispatch a half-filled template, defeating the quality-gate the parent SKILL relies on. Solution: Add one line at the top stating the dispatch is valid only when no bracketed placeholder remains and Tasks/Scope/Output are non-empty. |
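
The proposed 'done-when' rule above (no bracketed placeholder remains, and Tasks/Scope/Output are non-empty) is easy to express as a check. The following is a minimal sketch, assuming placeholders look like `[...]` and that sections are single `Name: value` lines; the actual template layout is not shown in this report, and `dispatch_problems` is a hypothetical helper.

```python
import re

# Assumption: placeholders look like "[...]" and sections are "Name: value" lines.
PLACEHOLDER = re.compile(r"\[[^\]\n]*\]")
REQUIRED = ("Tasks", "Scope", "Output")


def dispatch_problems(template_text):
    """Return a list of reasons the filled template is not ready to dispatch."""
    problems = []
    if PLACEHOLDER.search(template_text):
        problems.append("unresolved placeholder")
    fields = {}
    for line in template_text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    for name in REQUIRED:
        if not fields.get(name):
            problems.append(f"empty {name}")
    return problems


half_filled = "Tasks: [task list]\nScope:\nOutput: report.md"
filled = "Tasks: triage failures\nScope: tests/api\nOutput: report.md"

print(dispatch_problems(half_filled))
print(dispatch_problems(filled))
```

An empty result means the dispatch passes the completeness rule; any entry names what still needs filling in.
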
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/requirements-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The section (line 138) still lists asset ra-requirement-unit.md, but the <req> unit template now lives in assets/ra-requirement-unit.xml (the file modified in this same PR) and in references/authoring-catalogs.md. The .md reference has the wrong extension and points at a non-existent file. Reason: An ACQUIRE on a wrong-extension asset path returns nothing, so the agent cannot load the canonical unit template when drafting. Solution: Change the asset reference from ra-requirement-unit.md to ra-requirement-unit.xml (and likewise verify the ra-intent-capture / ra-validation-rubric / ra-change-log extensions). |
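
A wrong-extension reference like this one can be caught before review with a simple existence check. The sketch below is illustrative only: it assumes a skill keeps its assets in an `assets/` subdirectory (as the path `assets/ra-requirement-unit.xml` above suggests) and that the list of referenced file names has already been extracted from SKILL.md. The helper `missing_assets` is hypothetical.

```python
from pathlib import Path


def missing_assets(skill_dir: Path, asset_names: list[str]) -> list[str]:
    """Return the referenced asset names that do not exist under <skill_dir>/assets."""
    assets_dir = skill_dir / "assets"
    # is_file() is False for missing paths and for directories, so both count as missing.
    return [name for name in asset_names if not (assets_dir / name).is_file()]
```

Run against the skill directory, a stale `ra-requirement-unit.md` entry would be reported while the `.xml` file passes, which is the same check the Solution asks to repeat for the other `ra-*` assets.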
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-authoring/assets/ra-requirement-unit.xml
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Conflict Resolution | Problem: The diff collapsed the two-field implementation schema. BASE had `NotStarted |
| 🟡 High | Rosetta | Problem: The asset's canonical <req> template now diverges from the same <req> template kept in requirements-authoring/SKILL.md and references/authoring-catalogs.md: the asset collapses the implementation status enum + <implementationNotes> into one bracketed field with new vocab (Todo/Modify), while the catalog keeps the five-value enum + notes. Two contradictory canonical templates exist inside one skill family. Reason: DRY/SSoT violation: an agent filling a requirement unit gets conflicting schemas and emits inconsistent units. Solution: Pick one canonical home for the <req> schema (the asset) and have SKILL.md/authoring-catalogs.md point to it, or revert the asset to the enum+notes shape so all three agree. |
| 🟡 High | Precision & Explicitness | Problem: The new collapsed <implementation> value mixes a status enum and free-text notes in one element with inline brackets, and uses different status words (Todo, Modify) than the rest of the skill (Planned, ToBeModified, ToBeRemoved). One concept now has two term sets. Reason: Mixed vocabulary and combined fields make machine parsing and human authoring ambiguous, lowering requirement-unit reliability. Solution: Use one status vocabulary across the skill and keep status separate from notes (do not pack enum + prose into one element). |
| 🔵 Medium | Output Contract | Problem: By dropping <implementationNotes> the asset loses the explicit per-status guidance (Implemented: files affected; ToBeModified: what was dropped) that BASE carried, replacing it with a terser inline hint that no longer enumerates the per-status expectation. Reason: Authors using the asset alone now get weaker guidance on what to record per implementation status. Solution: Reinstate the per-status notes guidance (kept verbatim in the catalog) so the asset and catalog convey the same field contract. |
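
The parsing ambiguity described in these findings can be made concrete with a small check. This is a sketch under stated assumptions, not the skill's schema: it assumes `<implementation>` and `<implementationNotes>` are direct children of `<req>`, and the allowed status set is supplied by the caller because the full enum is not reproduced in this report. The function `implementation_problems` is hypothetical.

```python
import xml.etree.ElementTree as ET


def implementation_problems(req_xml: str, allowed: set[str]) -> list[str]:
    """Check that status is a bare enum value and that notes live in their own element."""
    req = ET.fromstring(req_xml)
    problems = []
    # A collapsed value such as "Todo [free text]" fails here: it is not a bare enum member.
    status = (req.findtext("implementation") or "").strip()
    if status not in allowed:
        problems.append(f"unknown implementation status: {status!r}")
    # Notes are expected in a separate element rather than packed into the status.
    if req.find("implementationNotes") is None:
        problems.append("missing <implementationNotes>")
    return problems
```

With a single agreed vocabulary passed as `allowed`, a unit in the enum+notes shape passes, while a unit in the collapsed shape is reported on both counts.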
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 3 | ⬇️ Slightly worse |
| Conflict Resolution | 2 | ⬇️ Slightly worse |
| Precision & Explicitness | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Conflict Resolution | Problem: The catalog's <req> template (lines 25-26) keeps the five-value <implementation> enum + <implementationNotes>, while the asset ra-requirement-unit.xml modified in this same PR collapsed those into a single `[Implemented |
| 🔵 Medium | Reference Integrity | Problem: Line 3 asserts 'SMART / MUST-SHOULD-MAY / priority conventions are owned by SKILL.md — not restated here', but SKILL.md never mentions SMART (grep returns nothing). The pointer to an owner section that does not exist is a dangling cross-reference introduced by this new file. Reason: A reader who follows the pointer to find SMART guidance in SKILL.md finds nothing, undermining trust in the 'owned by SKILL.md' deferral pattern. Solution: Either drop the SMART claim from line 3 or add the SMART convention to SKILL.md so the ownership pointer resolves. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/scenarios-generation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Single Responsibility | Problem: <when_to_use_skill> (line 17) names the sibling skill: 'Use to DESIGN scenarios/specs; testing IMPLEMENTS them.' This is lateral sibling awareness of another skill by name in the body. Reason: pa-hardening forbids cross-skill awareness except in frontmatter/keywords; naming a sibling couples the two skills. Solution: Drop the explicit testing name from the body; the design-vs-implement boundary is already conveyed by 'runnable test code is testing, not this skill', which could be softened to 'implementation is a separate concern'. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/scenarios-generation/references/testrail-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/scenarios-generation/references/testrail-format.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/testing/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The modified SKILL more than doubled (3709 → 7972 chars). <implementation_modes> restates phase-SSoT framing ('The calling workflow PHASE is the SSoT ...') that is then repeated in implementation-examples.md ('The calling workflow PHASE owns the artifact paths ...'), a partial duplication across the resident skill and its reference. Reason: Resident-prompt growth and cross-file restatement add cognitive load against the progressive-disclosure goal the diff itself claims. Solution: Compress the repeated phase-SSoT sentence to a single canonical statement in <implementation_modes> and let the reference cite it rather than restating ownership of paths/taxonomy/contract. |
| 🔵 Medium | Rosetta | Problem: The added <implementation_modes> general-method line 65 directs USE SKILL `coding` standards-first mode from inside a SKILL body. That is a skill invoking another skill, which crosses the 'Skills can't call skills' boundary. Reason: Skill-to-skill invocation is a boundary violation; reliable skill loading is the caller's (phase/subagent) job, not a sibling skill's. Solution: Phrase as a non-invoking reference to repo conventions (the coding skill already appears as a passive <resources> entry) instead of an imperative USE SKILL inside the procedure. |
| 🔵 Medium | Single Responsibility | Problem: The diff adds a large <implementation_modes> block (lines 61-84) with three modes (UI / API / Selector), while the frontmatter description still only advertises 'thorough, isolated, idempotent tests with 80% coverage'. The skill now also owns page-object selector identification and TMS-id-bearing API spec implementation, widening it beyond the original unit/scenario testing job. Reason: The added modes expand scope but the call-to-action description was not updated, so selection by description may under-trigger the new capability. Solution: Extend the frontmatter description to signal the impl/selector modes (still <30 tokens), so discovery matches the broadened responsibility added by the diff. |
| ⚪ Low | Instruction Ordering | Problem: The added <implementation_modes> sits between <core_concepts> and <validation_checklist>; its hard GATE/stop rules (API mode step 1, selector read-only) are embedded mid-procedure rather than surfaced as top-level hard constraints, slightly weakening the constraints-first ordering the base file had. Reason: Hard constraints buried inside step lists are more likely to be deprioritized by the agent. Solution: Leave structure but ensure the stop/GATE conditions are visually marked (already partly done with 'GATE:'); optionally hoist a one-line 'hard gates' pointer into <core_concepts>. |
| ⚪ Low | Conflict Resolution | Problem: Priority order appears in two places with the same defaults but different phrasing: implementation-examples API rules say 'A spec's priority field overrides this default', while SKILL <implementation_modes> defers everything to the PHASE as SSoT. No explicit statement of which wins (phase taxonomy vs spec priority field) when they differ. Reason: Two priority sources without a stated tie-breaker can yield inconsistent ordering decisions across runs. Solution: Add one clause stating precedence (phase-supplied taxonomy/cap overrides the reference's default priority order) so the two are not read as competing. |
| ⚪ Low | Cognitive Budget | Problem: API impl mode step 1 (line 74) bundles a 4-part GATE (approved-specs + recorded approval + API-contract artifact + discoverable patterns) plus the stop-rule into one dense line; combined with three modes the resident section pushes the ~5-step working-memory cap. Reason: Dense multi-clause GATE lines are easier for the agent to partially skip; minor reliability risk. Solution: No content change required, but if compressed (see Bloat issue) the GATE conditions read more clearly as an enumerated list. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testing/references/implementation-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-code-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Input contract makes project_description.md (repo root) the primary framework/standards source, but the parent aqa-flow.md Phase 3 row passes only CONTEXT.md+ARCHITECTURE.md+IMPLEMENTATION.md and never mentions project_description.md. The Input GATE accepts either, so it resolves, but the primary input named in the phase is not the input the workflow advertises it will supply. Reason: Slight mismatch between phase input naming and parent dispatch could make a phase-only reader expect a file the orchestrator did not pass. Solution: Add one line noting project_description.md is an AQA-target convention (also used by qa-flow) and that the parent workflow's repo-doc trio satisfies the GATE alternative; align wording so the named primary matches what Phase 3 receives. |
| ⚪ Low | Bloat Control | Problem: Two parenthetical SSoT meta-notes in <workflow_context> (single SSoT — referenced by other sections and single SSoT — referenced by other sections as "the read-only scope") restate the same DRY-anchor idea twice within four lines.Reason: Minor redundancy; does not affect behavior but adds reading cost on a dense context block. Solution: Keep one SSoT annotation and drop the second restatement; the anchor names already make the reference obvious. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: <workflow_context> packs vendor-resolution key-precedence lists for two vendor families, in-scope signal rules, fallback rules, guardrails-rule semantics, zero-doc pointer, and ACQUIRE-success definition into one dense block before the numbered steps. This is a large search space for the first thing the phase agent reads. Reason: Front-loading all vendor-resolution detail raises cognitive load and risks the agent skimming the precedence rules it must apply later. Solution: Move the config-key precedence tables into the per-vendor steps (1.2 / 1.3) where they are used, leaving only the scope summary in <workflow_context>. |
| 🔵 Medium | Rosetta | Problem: Frontmatter description still reads Data Collection from TestRail and Confluence (hardcoded vendors), while the rewritten body deliberately config-resolves the TMS/Documentation vendors and warns vendors are NOT hardcoded. The description contradicts the body's vendor-agnostic design. Reason: A coding agent selecting the phase by description sees hardcoded vendors that the body explicitly forbids, creating a portability/SSoT inconsistency. Solution: Change the description to vendor-neutral wording (e.g. Data Collection from configured TMS and documentation sources), keeping defaults out of the call-to-action. |
| ⚪ Low | Conflict Resolution | Problem: <zero_doc_protocol> is physically nested inside <gather_confluence step="1.3"> but is referenced by <gather_testrail step="1.2">, <workflow_context>, and <confirm_inputs>. Its scope reads as Confluence-local even though it is a phase-wide rule. Reason: A reader scanning step 1.2 may not realize the zero-doc rule it must apply is defined two steps later inside a sibling block. Solution: Hoist <zero_doc_protocol> to phase level (a sibling of the step blocks) so its cross-step authority is structurally clear. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Frontmatter description is far over the <30-token density target: it runs to a full multi-clause sentence ( ...Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) — USER INTERACTION REQUIRED), embedding mechanism detail that belongs in the body, not the call-to-action.Reason: Frontmatter must be small and dense for selection; the embedded mechanism inflates token cost without aiding phase selection. Solution: Compress the description to a dense call-to-action (e.g. Phase 2 of AQA — clarify gaps with the user and transcribe typed assertions; USER INTERACTION REQUIRED); drop the parenthetical mechanism. |
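
The <30-token target cited in this finding lends itself to an automated pre-check. The sketch below is an approximation only: it assumes whitespace-separated words stand in for tokens, which is not necessarily how the validator counts them, and both function names are hypothetical.

```python
# Cap quoted in the report; descriptions must stay strictly below it.
MAX_DESCRIPTION_TOKENS = 30


def approx_tokens(description: str) -> int:
    """Approximate the token count as the number of whitespace-separated words (assumption)."""
    return len(description.split())


def over_budget(description: str, limit: int = MAX_DESCRIPTION_TOKENS) -> bool:
    """True when the description reaches or exceeds the cap, i.e. is not under `limit` tokens."""
    return approx_tokens(description) >= limit
```

For instance, the vendor-neutral wording suggested for the data-collection phase, `Data Collection from configured TMS and documentation sources`, counts as 8 words under this approximation and is comfortably within budget.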
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-selector-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: At 14.4K chars the phase carries a full <endpoint_contract_template> block (lines 99-134) AND a complete <redaction_contract> catalog (lines 180-191) AND a full <validation_checklist> (lines 211-222) inline. The redaction catalog (5 numbered redaction-target classes plus a grep list) is point-of-use reference material that pa-hardening <audit_survival_checks> says belongs in references/, not inline in a phase. Reason: Inline catalogs inflate the per-phase cognitive search space and duplicate redaction logic that the sensitive-data skill already owns. Solution: Consider extracting the <redaction_contract> catalog and the worked endpoint example (lines 136-177) to a references/ file ACQUIRE'd at point of use, keeping only the GATE/process lines inline. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: Step 1.2b.2 instructs ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB and 1.2b.4 says execute all numbered steps inside <execute_documentation_mcp>. Both this file and the acquired file carry baseSchema: docs/schemas/phase.md. A phase directly acquiring and driving the numbered steps of another phase-schema file is in tension with the boundary rule 'Phases can't call phases' (briefing line 23, pa-hardening line 15). It is framed as a reusable 'subflow' fragment rather than a USE FLOW call, but the phase-schema on the child plus parent-driven step execution makes the boundary ambiguous. Reason: Phase-to-phase step execution risks the lateral-awareness boundary; a clearer schema or routing keeps the contract clean. Solution: Consider giving the subflow file a distinct non-phase schema (e.g. a reference/fragment schema) or routing the branch through the parent workflow so a phase is not executing another phase's numbered steps. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: There is no explicit prep-completion + load-context dependency bullet in this fragment, unlike the sibling phase files which state prerequisites. pa-hardening <audit_survival_checks> requires the prep/load-context dependency in workflows and in any consumer of prep output; the fragment consumes qa-project-config.md and Phase 0 output (lines 16-17) yet only states the config dependency, not the prep/load-context completion gate. Reason: Without the stated prep dependency a directly-ACQUIRE'd fragment could run against unloaded context. Solution: Add a one-line prerequisite noting Rosetta prep + load-context completion (or an explicit pointer that the parent phase already guarantees it). |
| 🔵 Medium | Rosetta | Problem: The file declares baseSchema: docs/schemas/phase.md (line 6) but is not a standalone phase: it is an ACQUIRE'd fragment driven step-by-step by qa-flow-data-collection step 1.2b, it is not listed as a phase in the parent qa-flow.md <workflow_phases> (which enumerates only phases 0-7), and its <description_and_purpose> says 'Parent phase: qa-flow-data-collection ACQUIREs this fragment'. A phase-schema file that is really a sub-fragment of another phase blurs the phase boundary and the schema-purity expectation. Reason: Tension with the phases-can't-call-phases boundary, but the fragment is ACQUIRE-driven with a full skip-path and deterministic 4-branch output contract, so the agent still reaches a defined terminal state. Lower behavioral impact than a true phase call. Solution: Consider using a fragment/reference schema rather than phase.md, or registering it as a real distinct phase, so its schema matches its actual role. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: File is 20,427 chars / ~277 lines, which exceeds the 20K-char soft budget and approaches the ~300-line one; it is the largest of the 5 phase files. The <config_contract> 12-row key table, the full Step-input user-prompt template, and the Project config template together carry heavy detail that the engineer must hold while also tracking the redaction rules in <safety_boundaries>. Reason: A single phase running near the size ceiling raises compaction risk and the chance the agent drops a config key or a redaction step under load. Solution: Move the verbatim ## Step-input user-prompt template and ## Project config template blocks into a referenced point-of-use file (e.g. a references/ asset ACQUIRE'd at step 0.1) so the phase inline keeps only the contract table and decision lines. |
| 🔵 Medium | Bloat Control | Problem: The required-key information is stated three times: once in the <config_contract> table, once in the <validation_checklist> ('Every required key from <config_contract> is present'), and again field-by-field inside the ## Project config template markdown. The N/A-reason convention is also restated in the table cells, the Empty-field rule, and the template placeholders. Reason: Triplicated key schema is harder to keep in sync; if one copy is edited later the others silently drift. Solution: Keep the <config_contract> table as the single key authority and replace the per-field placeholder repetition in the Project config template with a pointer ('fields + N/A rules per <config_contract>'). |
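
The size finding above could be checked mechanically. The following is a minimal sketch, assuming the soft budget is the ~300 lines / 20K chars quoted in the finding; the names `check_budget` and `BudgetReport` are hypothetical and not part of the validator.

```python
from dataclasses import dataclass

# Thresholds quoted in the finding (~300 lines / 20K chars); treated here as assumptions.
SOFT_MAX_CHARS = 20_000
SOFT_MAX_LINES = 300


@dataclass
class BudgetReport:
    chars: int
    lines: int
    over_chars: bool
    over_lines: bool

    @property
    def over_budget(self) -> bool:
        # A file is over budget if it breaks either limit.
        return self.over_chars or self.over_lines


def check_budget(text: str,
                 max_chars: int = SOFT_MAX_CHARS,
                 max_lines: int = SOFT_MAX_LINES) -> BudgetReport:
    """Measure a prompt file's text against the character and line limits."""
    chars = len(text)
    lines = len(text.splitlines())
    return BudgetReport(chars, lines, chars > max_chars, lines > max_lines)
```

Reporting the two limits separately makes it visible when a file breaks only one of them, as a file of roughly 277 lines and just over 20K chars would.
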
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: <skip_rules> declares the always-in-force carve-out as 'Per-phase HITL gates (Phases 3-7 marked type="HITL")', but Phase 0 is declared type="HITL-CONDITIONAL" and carries a real HITL gate ('ASK USER FOR PROJECT INFO if config does not already exist'). The carve-out enumeration 3-7 omits the Phase 0 conditional gate. Reason: The verification-failure unilateral-start override lets the agent skip Phases 0-2; the carve-out list that protects HITL gates should unambiguously include Phase 0's conditional ask so config collection is never silently bypassed. Solution: Adjust the carve-out wording to cover the Phase 0 HITL-CONDITIONAL gate (e.g. 'Phases 0,3-7 carrying type=HITL / HITL-CONDITIONAL gates'), or state that the conditional gate is equally non-suppressible. |
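
The gap described above can be illustrated with a small consistency check between the declared phase types and the carve-out list. This is a sketch only: the phase types for Phases 0 and 3-7 follow the finding, while the `AUTO` label for Phases 1-2 and all function names are assumptions made for illustration.

```python
# Phase types per the finding: Phase 0 is HITL-CONDITIONAL, Phases 3-7 are HITL.
# "AUTO" for Phases 1-2 is a placeholder assumption, not taken from qa-flow.md.
PHASE_TYPES = {
    0: "HITL-CONDITIONAL",
    1: "AUTO",
    2: "AUTO",
    3: "HITL",
    4: "HITL",
    5: "HITL",
    6: "HITL",
    7: "HITL",
}

# Both gate types require a human answer, so both should be protected from skipping.
GATED_TYPES = {"HITL", "HITL-CONDITIONAL"}


def gated_phases(phase_types):
    """Return the phases that carry any kind of HITL gate, in order."""
    return sorted(p for p, t in phase_types.items() if t in GATED_TYPES)


def missing_from_carve_out(phase_types, carve_out):
    """Return gated phases that the carve-out list fails to mention."""
    return [p for p in gated_phases(phase_types) if p not in carve_out]
```

With the carve-out written as Phases 3-7, the check reports Phase 0 as unprotected; adding 0 to the list clears it.
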
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The new <pitfalls> and <common_issues> blocks overlap heavily: 'Confluence search may miss child pages — always perform child-page traversal' (pitfalls) and 'Confluence search finds parent but misses child pages → Always perform the child-page traversal' (common_issues) restate the same guidance, and both partly duplicate the confluence binding the phase delegates to. Reason: Redundant lines add cognitive load and risk drift between the phase and the binding that now owns the behavior. Solution: Remove the duplicated child-page / truncation / URL-format lines from one of the two blocks since the discovery confluence-binding already owns that discipline. |
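
Overlap of this kind can be surfaced with a rough word-overlap comparison between the two blocks. The sketch below uses Jaccard similarity over lowercase word sets; the 0.5 threshold and the function names are assumptions, and the validator that produced this report may work quite differently.

```python
import re
from itertools import product


def tokens(line):
    """Lowercase word set of a line; punctuation and arrows are ignored."""
    return set(re.findall(r"[a-z]+", line.lower()))


def jaccard(a, b):
    """Jaccard similarity of two lines' word sets (0.0 when either is empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_pairs(block_a, block_b, threshold=0.5):
    """Return every cross-block pair of lines whose similarity meets the threshold."""
    return [(a, b) for a, b in product(block_a, block_b)
            if jaccard(a, b) >= threshold]
```

Applied to the two child-page lines quoted in the finding, the pair scores 8/15 (about 0.53) and is flagged, while unrelated lines score near zero.
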
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The new <state_file_template> ## Phase Details example only shows a ### Phase 1 block (with [Add sections for each completed phase]), but the file is created in Phase 0 where Phase 0's own details row would be expected first; the template never shows a ### Phase 0 entry even though step 0.6 marks Phase 0 complete. Reason: A reader following the template may omit the Phase 0 detail block, leaving the state file's first completed phase undocumented. Solution: Add a brief ### Phase 0 example row to the ## Phase Details block in <state_file_template>, or note that Phase 0 details are recorded via the completion-status checkbox only. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-question-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Step 3.4 <create_answers_document> HITL gate ends at step 3.5's Ask: "Ready to proceed to Phase 4..." but, unlike the sibling Phase 0/Phase 1/Phase 2 files in this same PR, the advance to Phase 4 is not wrapped in an explicit STOP-and-wait / hitl skill marker at step 3.5; the mandatory wait is only stated in <workflow_context> HITL GATE for the answer step, not for the Phase 4 advance ask. Reason: Consistency with the other phase gates in this PR; without it the final ask could be treated as informational and Phase 4 auto-started on silence. Solution: Add an explicit stop/wait clause (or USE SKILL hitl) to sub-step 4 of step 3.5, mirroring the Phase 0 step 0.6 and Phase 1 step 1.4 gates, so the proceed-to-Phase-4 ask is mechanically enforced. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/debugging/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: The added read-only <test_execution_triage> mode (parse report → categorize → page-source/HTTP analysis → cross-failure patterns → emit artifact) is a UI/API automated-test report-triage responsibility distinct from the skill's core 'find root cause before fixing' job, broadening the skill into AQA report analysis. Reason: Layering a report-triage mode onto debugging adds a second responsibility and audience (AQA execution reports), raising cognitive surface beyond the single debugging responsibility. Solution: Acceptable if intentional, but consider whether triage belongs in a dedicated AQA analysis skill/phase; at minimum keep the mode strictly scoped so the core debugging method stays primary. |
| 🔵 Medium | Rosetta | Problem: New <test_execution_triage> block ends with 'GATE: read-only. Proposing or applying fixes is a separate correction phase — USE SKILL coding.' This makes the debugging skill actively invoke the coding skill. Combined with coding/SKILL.md's new USE SKILL debugging, the two peer domain skills now call each other — a forbidden skill-to-skill (and circular) dependency. Cross-cutting USE SKILL sensitive-data (line 26) is the accepted convention and is fine; the coding call is not. Reason: Peer-skill pointer adds mild coupling; does not stop execution (any skill is loadable). Low impact. Solution: Change the GATE to a non-imperative boundary statement (e.g. 'fixes are a separate correction phase owned by the calling workflow') rather than USE SKILL coding, leaving skill chaining to the workflow. |
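
The mutual `USE SKILL` reference described above is a cycle in the skill dependency graph, which a depth-first search can detect. The sketch below assumes directives appear literally as `USE SKILL <name>` (as quoted in the finding) and treats `sensitive-data` as an exempt cross-cutting skill; the regex, the exemption set, and the function names are illustrative assumptions.

```python
import re

# Directive syntax as quoted in the finding; the exact grammar is an assumption.
USE_SKILL = re.compile(r"USE SKILL\s+([a-z0-9-]+)")
# Cross-cutting skills that the finding describes as an accepted convention.
CROSS_CUTTING = {"sensitive-data"}


def skill_edges(skill_texts):
    """Map each skill name to the peer skills its text invokes."""
    return {
        name: set(USE_SKILL.findall(text)) - CROSS_CUTTING - {name}
        for name, text in skill_texts.items()
    }


def find_cycle(edges):
    """Return one dependency cycle as a list of skill names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY  # node is on the current DFS path
        stack.append(node)
        for nxt in sorted(edges.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK  # fully explored, no cycle through this node
        return None

    for name in sorted(edges):
        if color.get(name, WHITE) == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

Rewording the debugging GATE as a non-imperative boundary statement, as the Solution suggests, removes the `debugging -> coding` edge and the cycle with it.
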
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/discovery/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/discovery/references/confluence-binding.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/discovery/references/jira-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: Unlike the confluence-binding, this binding has no explicit Output sections section enumerating the ordered blocks (per-field entries, Gaps, Redaction) the binding emits into the phase artifact. The field map plus per-field branch imply the shape, but the deterministic ordered output contract is left to the base SKILL. Reason: A point-of-use binding that omits its own ordered output contract forces the agent to infer block order, slightly reducing determinism across vendors. Solution: Add a short Output sections block (matching the confluence-binding's pattern) listing the ordered emitted blocks and that every section is present with None. for empties. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/discovery/references/testrail-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: Like the jira-binding, this file has no explicit Output sections block enumerating the ordered emitted blocks; the confluence-binding has one but jira and testrail do not, so the three sibling bindings are inconsistent in declaring their output ordering. Reason: Inconsistent output-contract declaration across the three bindings makes the per-vendor emitted shape rely on inference for two of three vendors. Solution: Add a short Output sections block listing the ordered blocks (case entry fields, Gaps, Redaction) and the present-with-None. rule, matching the confluence-binding. |
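
The inconsistency across the three sibling bindings could be caught by checking each binding for an output-contract heading. This sketch assumes the contract is introduced by a markdown heading reading "Output sections"; the actual heading format in the binding files is not shown in this report, so the pattern and the function name are hypothetical.

```python
import re

# Assumed heading form: a markdown heading whose text is "Output sections".
OUTPUT_HEADING = re.compile(r"^#+\s*Output sections\s*$",
                            re.IGNORECASE | re.MULTILINE)


def bindings_missing_output_sections(bindings):
    """Return the names of bindings whose text lacks an Output sections heading."""
    return sorted(name for name, text in bindings.items()
                  if not OUTPUT_HEADING.search(text))
```

Given the three binding files as a name-to-text mapping, the result lists the bindings that still need the block added.
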
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/requirements-authoring/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: The section (line 138) still lists asset ra-requirement-unit.md, but the <req> unit template now lives in assets/ra-requirement-unit.xml (the file modified in this same PR) and in references/authoring-catalogs.md. The .md asset is the wrong extension and points at a non-existent file. Reason: An ACQUIRE on a wrong-extension asset path returns nothing, so the agent cannot load the canonical unit template when drafting. Solution: Change the asset reference in from ra-requirement-unit.md to ra-requirement-unit.xml (and likewise verify ra-intent-capture / ra-validation-rubric / ra-change-log extensions). |
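
A wrong-extension reference like this one can be found by comparing the asset names a skill lists against the files that actually exist, and suggesting same-stem candidates. The following is a minimal sketch; the function name is hypothetical, and the caller is assumed to supply both lists (for example from parsing SKILL.md and listing the assets directory).

```python
from pathlib import PurePosixPath


def unresolved_assets(referenced, existing):
    """Map each referenced asset that does not exist to same-stem files that do."""
    existing_set = set(existing)
    by_stem = {}
    for name in existing:
        # Group existing files by name without extension, e.g. "ra-requirement-unit".
        by_stem.setdefault(PurePosixPath(name).stem, []).append(name)
    problems = {}
    for name in referenced:
        if name not in existing_set:
            # An empty candidate list means nothing with that stem exists at all.
            problems[name] = sorted(by_stem.get(PurePosixPath(name).stem, []))
    return problems
```

For the case in the finding, a reference to `ra-requirement-unit.md` with only `ra-requirement-unit.xml` on disk would be reported together with the `.xml` file as the likely intended target.
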
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Conflict Resolution | Problem: The catalog's <req> template (lines 25-26) keeps the five-value <implementation> enum + <implementationNotes>, while the asset ra-requirement-unit.xml modified in this same PR collapsed those into a single `[Implemented |
| 🔵 Medium | Reference Integrity | Problem: Line 3 asserts 'SMART / MUST-SHOULD-MAY / priority conventions are owned by SKILL.md — not restated here', but SKILL.md never mentions SMART (grep returns nothing). The pointer to an owner section that does not exist is a dangling cross-reference introduced by this new file. Reason: A reader who follows the pointer to find SMART guidance in SKILL.md finds nothing, undermining trust in the 'owned by SKILL.md' deferral pattern. Solution: Either drop the SMART claim from line 3 or add the SMART convention to SKILL.md so the ownership pointer resolves. |
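
A dangling "owned by SKILL.md" pointer of this kind can be detected with a whole-word search of the owner file. This is a minimal sketch under stated assumptions: `unresolved_ownership_claims` is a hypothetical helper, and `owner_text` is an invented excerpt, not the real SKILL.md content.

```python
import re


def unresolved_ownership_claims(claimed_terms, owner_text):
    """Return the claimed terms that never appear in the owner text."""
    missing = []
    for term in claimed_terms:
        # Whole-word, case-sensitive match, similar in spirit to `grep -w`.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if not re.search(pattern, owner_text):
            missing.append(term)
    return missing


# Hypothetical excerpt standing in for the owner file.
owner_text = "Use MUST-SHOULD-MAY keywords. Assign a priority to each requirement."
claimed = ["SMART", "MUST-SHOULD-MAY", "priority"]
missing = unresolved_ownership_claims(claimed, owner_text)
print(missing)
```
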
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-use/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/reverse-engineering/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Single Responsibility | Problem: New <analysis_modes> adds two concrete modes (test-automation architecture analysis and API-contract extraction) onto the general code→spec reverse-engineering skill. The API-contract-extraction mode (locate Swagger/OpenAPI/route defs, emit per-endpoint parameters/schemas/auth/citations) is a fairly distinct AQA/API-discovery responsibility layered onto a skill whose core is 'recover intent / WHAT and WHY from code'. Reason: Two added concrete modes widen the skill's responsibility and audience beyond spec recovery, increasing cognitive surface even though each mode references the general method. Solution: Acceptable if these modes are intended specializations, but keep them clearly subordinate to the general method; if they grow, factor API-contract extraction into its own analysis skill/phase. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 3 | ⬇️ Slightly worse |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/scenarios-generation/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| ⚪ Low | Single Responsibility | Problem: <when_to_use_skill> (line 17) names the sibling skill: 'Use to DESIGN scenarios/specs; testing IMPLEMENTS them.' This is lateral sibling awareness of another skill by name in the body. Reason: pa-hardening forbids cross-skill awareness except in frontmatter/keywords; naming a sibling couples the two skills. Solution: Drop the explicit testing name from the body; the design-vs-implement boundary is already conveyed, and 'runnable test code is testing, not this skill' could be softened to 'implementation is a separate concern'. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/scenarios-generation/references/testrail-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/scenarios-generation/references/testrail-format.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/skills/testing/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The modified SKILL more than doubled (3709 → 7972 chars). <implementation_modes> restates phase-SSoT framing ('The calling workflow PHASE is the SSoT ...') that is then repeated in implementation-examples.md ('The calling workflow PHASE owns the artifact paths ...'), which is partial duplication across the resident skill and its reference. Reason: Resident-prompt growth and cross-file restatement add cognitive load against the progressive-disclosure goal the diff itself claims. Solution: Compress the repeated phase-SSoT sentence to a single canonical statement in <implementation_modes> and let the reference cite it rather than restating ownership of paths/taxonomy/contract. |
| 🔵 Medium | Rosetta | Problem: Added <implementation_modes> general-method line 65 directs USE SKILL `coding` standards-first mode from inside a SKILL body, i.e. a skill invoking another skill, which crosses the 'Skills can't call skills' boundary. Reason: Skill-to-skill invocation is a boundary violation; reliable skill loading is the caller's (phase/subagent) job, not a sibling skill's. Solution: Phrase as a non-invoking reference to repo conventions (the coding skill already appears as a passive <resources> entry) instead of an imperative USE SKILL inside the procedure. |
| 🔵 Medium | Single Responsibility | Problem: The diff adds a large <implementation_modes> block (lines 61-84) with three modes (UI / API / Selector) plus a frontmatter description that still only advertises 'thorough, isolated, idempotent tests with 80% coverage'. The skill now also owns page-object selector identification and TMS-id-bearing API spec implementation, widening it beyond the original unit/scenario testing job. Reason: The added modes expand scope but the call-to-action description was not updated, so selection by description may under-trigger the new capability. Solution: Extend the frontmatter description to signal the impl/selector modes (still <30 tokens), so discovery matches the broadened responsibility added by the diff. |
| ⚪ Low | Instruction Ordering | Problem: The added <implementation_modes> sits between <core_concepts> and <validation_checklist>; its hard GATE/stop rules (API mode step 1, selector read-only) are embedded mid-procedure rather than surfaced as top-level hard constraints, slightly weakening the constraints-first ordering the base file had. Reason: Hard constraints buried inside step lists are more likely to be deprioritized by the agent. Solution: Leave structure but ensure the stop/GATE conditions are visually marked (already partly done with 'GATE:'); optionally hoist a one-line 'hard gates' pointer into <core_concepts>. |
| ⚪ Low | Conflict Resolution | Problem: Priority order appears in two places with the same defaults but different phrasing: implementation-examples API rules say 'A spec's priority field overrides this default', while SKILL <implementation_modes> defers everything to the PHASE as SSoT. No explicit statement of which wins (phase taxonomy vs spec priority field) when they differ.Reason: Two priority sources without a stated tie-breaker can yield inconsistent ordering decisions across runs. Solution: Add one clause stating precedence (phase-supplied taxonomy/cap overrides the reference's default priority order) so the two are not read as competing. |
| ⚪ Low | Cognitive Budget | Problem: API impl mode step 1 (line 74) bundles a 4-part GATE (approved-specs + recorded approval + API-contract artifact + discoverable patterns) plus the stop-rule into one dense line; combined with three modes the resident section pushes the ~5-step working-memory cap. Reason: Dense multi-clause GATE lines are easier for the agent to partially skip; minor reliability risk. Solution: No content change required, but if compressed (see Bloat issue) the GATE conditions read more clearly as an enumerated list. |
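
The Cognitive Budget finding suggests the 4-part GATE reads more clearly as an enumerated list. The sketch below illustrates that shape only; the condition names are hypothetical labels derived from the finding's wording (approved specs, recorded approval, API-contract artifact, discoverable patterns), and the real skill may express them differently.

```python
# Hypothetical labels for the four GATE conditions named in the finding.
GATE_CONDITIONS = (
    "approved_specs",
    "recorded_approval",
    "api_contract_artifact",
    "discoverable_patterns",
)


def evaluate_gate(state):
    """Return the gate conditions that are not satisfied, in declared order."""
    return [name for name in GATE_CONDITIONS if not state.get(name, False)]


# Example state: two conditions met, one explicitly unmet, one never recorded.
state = {
    "approved_specs": True,
    "recorded_approval": True,
    "api_contract_artifact": False,
}
failed = evaluate_gate(state)
should_stop = bool(failed)  # stop-rule: any unmet condition halts the mode
print(failed, should_stop)
```

Listing each condition separately makes a partially skipped GATE visible, because every unmet item is reported by name.
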
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testing/references/implementation-examples.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Input contract makes project_description.md (repo root) the primary framework/standards source, but the parent aqa-flow.md Phase 3 row passes only CONTEXT.md+ARCHITECTURE.md+IMPLEMENTATION.md and never mentions project_description.md. The Input GATE accepts either, so it resolves, but the primary input named in the phase is not the input the workflow advertises it will supply. Reason: Slight mismatch between phase input naming and parent dispatch could make a phase-only reader expect a file the orchestrator did not pass. Solution: Add one line noting project_description.md is an AQA-target convention (also used by qa-flow) and that the parent workflow's repo-doc trio satisfies the GATE alternative; align wording so the named primary matches what Phase 3 receives. |
| ⚪ Low | Bloat Control | Problem: Two parenthetical SSoT meta-notes in <workflow_context> (single SSoT — referenced by other sections and single SSoT — referenced by other sections as "the read-only scope") restate the same DRY-anchor idea twice within four lines. Reason: Minor redundancy; does not affect behavior but adds reading cost on a dense context block. Solution: Keep one SSoT annotation and drop the second restatement; the anchor names already make the reference obvious. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: <workflow_context> packs vendor-resolution key-precedence lists for two vendor families, in-scope signal rules, fallback rules, guardrails-rule semantics, zero-doc pointer, and ACQUIRE-success definition into one dense block before the numbered steps. This is a large search space for the first thing the phase agent reads. Reason: Front-loading all vendor-resolution detail raises cognitive load and risks the agent skimming the precedence rules it must apply later. Solution: Move the config-key precedence tables into the per-vendor steps (1.2 / 1.3) where they are used, leaving only the scope summary in <workflow_context>. |
| 🔵 Medium | Rosetta | Problem: Frontmatter description still reads Data Collection from TestRail and Confluence (hardcoded vendors), while the rewritten body deliberately config-resolves the TMS/Documentation vendors and warns vendors are NOT hardcoded. The description contradicts the body's vendor-agnostic design. Reason: A coding agent selecting the phase by description sees hardcoded vendors that the body explicitly forbids, creating a portability/SSoT inconsistency. Solution: Change the description to vendor-neutral wording (e.g. Data Collection from configured TMS and documentation sources), keeping defaults out of the call-to-action. |
| ⚪ Low | Conflict Resolution | Problem: <zero_doc_protocol> is physically nested inside <gather_confluence step="1.3"> but is referenced by <gather_testrail step="1.2">, <workflow_context>, and <confirm_inputs>. Its scope reads as Confluence-local even though it is a phase-wide rule. Reason: A reader scanning step 1.2 may not realize the zero-doc rule it must apply is defined two steps later inside a sibling block. Solution: Hoist <zero_doc_protocol> to phase level (a sibling of the step blocks) so its cross-step authority is structurally clear. |
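
Config-key precedence, as opposed to a hardcoded vendor name, can be pictured as a first-match lookup with a fallback. This is a sketch under assumptions: the key names (`qa.tms.vendor`, `tms_vendor`) and the `resolve_vendor` helper are hypothetical and are not taken from the phase file; only the idea of resolving the vendor from configuration comes from the findings above.

```python
def resolve_vendor(config, keys_in_precedence, default):
    """Return (vendor, matched_key); fall back to the default when no key is set."""
    for key in keys_in_precedence:
        value = config.get(key)
        if value:  # skip missing keys and empty strings
            return value, key
    return default, None


# Hypothetical key names, highest precedence first.
TMS_KEYS = ("qa.tms.vendor", "tms_vendor")

config = {"tms_vendor": "Zephyr", "qa.tms.vendor": ""}
vendor, matched_key = resolve_vendor(config, TMS_KEYS, default="TestRail")
fallback_vendor, fallback_key = resolve_vendor({}, TMS_KEYS, default="TestRail")
print(vendor, matched_key, fallback_vendor, fallback_key)
```

Keeping the default as an argument rather than a literal in the lookup is what lets the phase description stay vendor-neutral.
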
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Frontmatter description is far over the <30-token density target: it runs to a full multi-clause sentence ( ...Assertion Transcription (derives typed assertions via the requirements-use gap_analysis mode and writes them to the test plan as a mandatory list) — USER INTERACTION REQUIRED), embedding mechanism detail that belongs in the body, not the call-to-action. Reason: Frontmatter must be small and dense for selection; the embedded mechanism inflates token cost without aiding phase selection. Solution: Compress the description to a dense call-to-action (e.g. Phase 2 of AQA — clarify gaps with the user and transcribe typed assertions; USER INTERACTION REQUIRED); drop the parenthetical mechanism. |
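
A rough pre-check for the <30-token target might look like the sketch below. It assumes whitespace-separated words as a stand-in for tokens, which is only an approximation of whatever tokenizer the validator uses; `description_token_estimate` and `within_budget` are hypothetical helpers, and the sample description is a paraphrase of the suggested wording.

```python
MAX_DESCRIPTION_TOKENS = 30  # the target is "<30", so the comparison is strict


def description_token_estimate(description):
    """Approximate the token count by splitting on whitespace (an assumption)."""
    return len(description.split())


def within_budget(description, limit=MAX_DESCRIPTION_TOKENS):
    return description_token_estimate(description) < limit


compact = (
    "Phase 2 of AQA: clarify gaps with the user and transcribe typed "
    "assertions; USER INTERACTION REQUIRED"
)
verbose = " ".join(["word"] * 40)  # synthetic over-long description
print(within_budget(compact), within_budget(verbose))
```
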
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: At 14.4K chars the phase carries a full <endpoint_contract_template> block (lines 99-134) AND a complete <redaction_contract> catalog (lines 180-191) AND a full <validation_checklist> (lines 211-222) inline. The redaction catalog (5 numbered redaction-target classes plus a grep list) is point-of-use reference material that pa-hardening <audit_survival_checks> says belongs in references/, not inline in a phase. Reason: Inline catalogs inflate the per-phase cognitive search space and duplicate redaction logic that the sensitive-data skill already owns. Solution: Consider extracting the <redaction_contract> catalog and the worked endpoint example (lines 136-177) to a references/ file ACQUIRE'd at point of use, keeping only the GATE/process lines inline. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: Step 1.2b.2 instructs ACQUIRE qa-flow-documentation-mcp-subflow.md FROM KB and 1.2b.4 says execute all numbered steps inside <execute_documentation_mcp>. Both this file and the acquired file carry baseSchema: docs/schemas/phase.md. A phase directly acquiring and driving the numbered steps of another phase-schema file is in tension with the boundary rule 'Phases can't call phases' (briefing line 23, pa-hardening line 15). It is framed as a reusable 'subflow' fragment rather than a USE FLOW call, but the phase-schema on the child plus parent-driven step execution makes the boundary ambiguous. Reason: Phase-to-phase step execution risks the lateral-awareness boundary; a clearer schema or routing keeps the contract clean. Solution: Consider giving the subflow file a distinct non-phase schema (e.g. a reference/fragment schema) or routing the branch through the parent workflow so a phase is not executing another phase's numbered steps. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Input Contract | Problem: There is no explicit prep-completion + load-context dependency bullet in this fragment, unlike the sibling phase files which state prerequisites. pa-hardening <audit_survival_checks> requires the prep/load-context dependency in workflows and in any consumer of prep output; the fragment consumes qa-project-config.md and Phase 0 output (lines 16-17) yet only states the config dependency, not the prep/load-context completion gate. Reason: Without the stated prep dependency a directly-ACQUIRE'd fragment could run against unloaded context. Solution: Add a one-line prerequisite noting Rosetta prep + load-context completion (or an explicit pointer that the parent phase already guarantees it). |
| 🔵 Medium | Rosetta | Problem: The file declares baseSchema: docs/schemas/phase.md (line 6) but is not a standalone phase: it is an ACQUIRE'd fragment driven step-by-step by qa-flow-data-collection step 1.2b, it is not listed as a phase in the parent qa-flow.md <workflow_phases> (which enumerates only phases 0-7), and its <description_and_purpose> says 'Parent phase: qa-flow-data-collection ACQUIREs this fragment'. A phase-schema file that is really a sub-fragment of another phase blurs the phase boundary and the schema-purity expectation. Reason: Tension with the phases-can't-call-phases boundary, but the fragment is ACQUIRE-driven with a full skip-path and deterministic 4-branch output contract, so the agent still reaches a defined terminal state. Lower behavioral impact than a true phase call. Solution: Consider using a fragment/reference schema rather than phase.md, or registering it as a real distinct phase, so its schema matches its actual role. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Cognitive Budget | Problem: File is 20,427 chars / ~277 lines, over the ~300-line/20K soft budget and the largest of the 5 phase files. The <config_contract> 12-row key table, the full Step-input user-prompt template, and the Project config template together carry heavy detail that the engineer must hold while also tracking the redaction rules in <safety_boundaries>. Reason: A single phase running near the size ceiling raises compaction risk and the chance the agent drops a config key or a redaction step under load. Solution: Move the verbatim ## Step-input user-prompt template and ## Project config template blocks into a referenced point-of-use file (e.g. a references/ asset ACQUIRE'd at step 0.1) so the phase inline keeps only the contract table and decision lines. |
| 🔵 Medium | Bloat Control | Problem: The required-key information is stated three times: once in the <config_contract> table, once in the <validation_checklist> ('Every required key from <config_contract> is present'), and again field-by-field inside the ## Project config template markdown. The N/A-reason convention is also restated in the table cells, the Empty-field rule, and the template placeholders. Reason: Triplicated key schema is harder to keep in sync; if one copy is edited later the others silently drift. Solution: Keep the <config_contract> table as the single key authority and replace the per-field placeholder repetition in the Project config template with a pointer ('fields + N/A rules per <config_contract>'). |
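The Cognitive Budget finding above could be checked mechanically. The following is a minimal sketch, assuming the soft limits are exactly the "~300-line/20K" figures quoted in the finding; `check_budget` and `BudgetReport` are hypothetical names and are not part of the validator.

```python
from dataclasses import dataclass

# Assumed soft limits, taken from the "~300-line/20K" figure quoted in the finding.
MAX_CHARS = 20_000
MAX_LINES = 300


@dataclass
class BudgetReport:
    chars: int
    lines: int
    over_chars: bool
    over_lines: bool

    @property
    def over_budget(self) -> bool:
        # Exceeding either limit is enough to flag the file.
        return self.over_chars or self.over_lines


def check_budget(text: str, max_chars: int = MAX_CHARS, max_lines: int = MAX_LINES) -> BudgetReport:
    """Measure a phase file against the assumed character and line limits."""
    chars = len(text)
    lines = len(text.splitlines())
    return BudgetReport(chars, lines, chars > max_chars, lines > max_lines)


if __name__ == "__main__":
    # Synthetic stand-in for a phase file: 277 lines of 74 characters each.
    sample = ("x" * 73 + "\n") * 277
    report = check_budget(sample)
    print(report.chars, report.lines, report.over_budget)
```

A file can be under the line limit and still over the character limit, which matches the case reported here (about 277 lines but more than 20K chars).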
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: <skip_rules> declares the always-in-force carve-out as 'Per-phase HITL gates (Phases 3-7 marked type="HITL")', but Phase 0 is declared type="HITL-CONDITIONAL" and carries a real HITL gate ('ASK USER FOR PROJECT INFO if config does not already exist'). The carve-out enumeration 3-7 omits the Phase 0 conditional gate. Reason: The verification-failure unilateral-start override lets the agent skip Phases 0-2; the carve-out list that protects HITL gates should unambiguously include Phase 0's conditional ask so config collection is never silently bypassed. Solution: Adjust the carve-out wording to cover the Phase 0 HITL-CONDITIONAL gate (e.g. 'Phases 0,3-7 carrying type=HITL / HITL-CONDITIONAL gates'), or state that the conditional gate is equally non-suppressible. |
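The gap described in this finding can be illustrated with a small coverage check. This is a sketch only: the types for Phase 0 and Phases 3-7 come from the finding, while the `"AUTO"` label for Phases 1-2 and the function name `uncovered_gates` are assumptions made for illustration.

```python
# Phase types. Phase 0 and Phases 3-7 are taken from the finding above;
# the "AUTO" label for Phases 1-2 is a placeholder assumption.
PHASE_TYPES = {
    0: "HITL-CONDITIONAL",
    1: "AUTO",
    2: "AUTO",
    3: "HITL",
    4: "HITL",
    5: "HITL",
    6: "HITL",
    7: "HITL",
}

# Any phase with one of these types carries a human-in-the-loop gate.
GATED_TYPES = {"HITL", "HITL-CONDITIONAL"}


def uncovered_gates(phase_types: dict, carve_out: set) -> list:
    """Return gated phases that the carve-out list fails to protect."""
    return sorted(
        phase
        for phase, phase_type in phase_types.items()
        if phase_type in GATED_TYPES and phase not in carve_out
    )


if __name__ == "__main__":
    current_carve_out = set(range(3, 8))        # "Phases 3-7"
    proposed_carve_out = {0} | current_carve_out  # "Phases 0,3-7"
    print(uncovered_gates(PHASE_TYPES, current_carve_out))
    print(uncovered_gates(PHASE_TYPES, proposed_carve_out))
```

With the current wording only Phase 0 is left unprotected; the proposed wording leaves no gated phase outside the carve-out.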
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The new <pitfalls> and <common_issues> blocks overlap heavily: 'Confluence search may miss child pages — always perform child-page traversal' (pitfalls) and 'Confluence search finds parent but misses child pages → Always perform the child-page traversal' (common_issues) restate the same guidance, and both partly duplicate the confluence binding the phase delegates to. Reason: Redundant lines add cognitive load and risk drift between the phase and the binding that now owns the behavior. Solution: Remove the duplicated child-page / truncation / URL-format lines from one of the two blocks since the discovery confluence-binding already owns that discipline. |
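Overlap of this kind between two blocks can be surfaced with a simple token-overlap comparison. The sketch below uses Jaccard similarity over lowercase word sets; the 0.5 threshold, the function names, and the lightly re-punctuated sample lines are assumptions for illustration, not the validator's actual method.

```python
import re


def tokens(line: str) -> set:
    """Lowercase alphabetic tokens of a line; hyphens and punctuation split words."""
    return set(re.findall(r"[a-z]+", line.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two lines."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def near_duplicates(block_a: list, block_b: list, threshold: float = 0.5) -> list:
    """Pairs of lines, one from each block, whose similarity reaches the threshold."""
    return [(a, b) for a in block_a for b in block_b if jaccard(a, b) >= threshold]


if __name__ == "__main__":
    # Sample lines paraphrased from the finding; the second entry in
    # common_issues is a made-up unrelated line for contrast.
    pitfalls = [
        "Confluence search may miss child pages; always perform child-page traversal",
    ]
    common_issues = [
        "Confluence search finds parent but misses child pages: always perform the child-page traversal",
        "Truncated API responses need pagination",
    ]
    for a, b in near_duplicates(pitfalls, common_issues):
        print(f"{jaccard(a, b):.2f}  {a!r}  <->  {b!r}")
```

Token overlap is deliberately crude; it flags candidates for a human to merge and does not decide which block should keep the line.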
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Output Contract | Problem: The new <state_file_template> ## Phase Details example only shows a ### Phase 1 block (with [Add sections for each completed phase]), but the file is created in Phase 0 where Phase 0's own details row would be expected first; the template never shows a ### Phase 0 entry even though step 0.6 marks Phase 0 complete. Reason: A reader following the template may omit the Phase 0 detail block, leaving the state file's first completed phase undocumented. Solution: Add a brief ### Phase 0 example row to the ## Phase Details block in <state_file_template>, or note that Phase 0 details are recorded via the completion-status checkbox only. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Rosetta | Problem: Step 3.4 <create_answers_document> HITL gate ends at step 3.5's Ask: "Ready to proceed to Phase 4..." but, unlike the sibling Phase 0/Phase 1/Phase 2 files in this same PR, the advance to Phase 4 is not wrapped in an explicit STOP-and-wait / hitl skill marker at step 3.5; the mandatory wait is only stated in <workflow_context> HITL GATE for the answer step, not for the Phase 4 advance ask. Reason: Consistency with the other phase gates in this PR; without it the final ask could be treated as informational and Phase 4 auto-started on silence. Solution: Add an explicit stop/wait clause (or USE SKILL hitl) to step 3.5 step 4 mirroring the Phase 0 step 0.6 and Phase 1 step 1.4 gates, so the proceed-to-Phase-4 ask is mechanically enforced. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Step 6.1 resolves the vendor binding from agents/testgen/{TICKET-KEY}/testgen-project-config.md (per-ticket path), but Phase 0 (testgen-flow-project-config-loading.md step 0.3) saves the config to agents/testgen/testgen-project-config.md (project-wide, explicitly 'not per-ticket'). The path the export phase reads from will not exist. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Change the config path in step 6.1 to the project-wide agents/testgen/testgen-project-config.md to match the Phase 0 canonical location. |
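A path mismatch like this one is easy to catch with a text scan over the phase files. The sketch below assumes the canonical location is the project-wide path named in the finding; the regular expression and the function name `mismatched_config_refs` are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

import re

# Canonical location that Phase 0 writes to, as stated in the finding.
CANONICAL = "agents/testgen/testgen-project-config.md"

# Matches the config file under agents/testgen/ with any number of
# intermediate directories (for example a {TICKET-KEY} segment).
CONFIG_REF = re.compile(r"agents/testgen/(?:[^\s/]+/)*testgen-project-config\.md")


def mismatched_config_refs(phase_text: str) -> list[str]:
    """Return every config path in the text that differs from the canonical one."""
    return [ref for ref in CONFIG_REF.findall(phase_text) if ref != CANONICAL]


if __name__ == "__main__":
    # Illustrative step text, not the actual wording of the phase files.
    export_step = (
        "6.1 Resolve the vendor binding from "
        "agents/testgen/{TICKET-KEY}/testgen-project-config.md"
    )
    config_step = "0.3 Save the config to agents/testgen/testgen-project-config.md"
    print(mismatched_config_refs(export_step))
    print(mismatched_config_refs(config_step))
```

Run across all testgen phase files, a check like this would flag step 6.1 and would also catch the same per-ticket path wherever else it appears.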
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 3 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Reference Integrity | Problem: Step 5.3 resolves the FORMAT vendor binding from agents/testgen/{TICKET-KEY}/testgen-project-config.md (per-ticket), but Phase 0 writes the config to the project-wide agents/testgen/testgen-project-config.md. Same path mismatch as the export phase. Reason: Per-run mismatch: the phase reads the vendor config from a per-ticket path while Phase 0 saves it project-wide, so the MCP export/format path is silently abandoned and the agent degrades to manual/inline every run. Solution: Point step 5.3's config read at agents/testgen/testgen-project-config.md to align with the Phase 0 canonical path. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 3 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📋 Prompt Quality Validation Report
❌ Validation Failed
Summary by File
📄
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: The header block before the first phase now carries skip-gate logic, phase-load-failure handling, transition precedence, self-check criteria, a 7-item per-phase failure-routing list, and model tiers — well over the ~5-directive comfort window an agent processes reliably, all loaded up front in addition to the seven phase blocks. Reason: Front-loading routing detail that only matters when a specific phase fails enlarges the cognitive search space at the moment the agent is choosing the next phase. Solution: Push the per-phase failure-routing list and model-tier table into or the respective phase blocks so the top-of-file cognitive load is the phase sequence plus orchestration, not also a failure-routing index. |
| 🔵 Medium | Bloat Control | Problem: The new <workflow_phases> preamble packs a dense multi-clause skip-gate rule, a 'Per-phase failure cases — owned by phase files' pointer list, and model-tier definitions into the header before phase 0 even starts. The skip rule single bullet ('Skip gates: only with explicit user instruction, or when testgen-state.md marks ... otherwise resume from the earliest incomplete phase. The explicit user instruction skip NEVER applies to the Phase 3 / Phase 6 HITL gates — those are rule 2 of <orchestration_and_escalation> ...') restates precedence that <orchestration_and_escalation> already owns. Reason: The same HITL-never-overridden rule is stated in the preamble bullet and again in <orchestration_and_escalation> priority (2), adding redundancy the hardening reference flags as compressible without value loss. Solution: Move the full skip-gate conditional into <orchestration_and_escalation> (which already defines the priority hierarchy) and leave a one-line pointer in the preamble, to avoid restating the HITL-override precedence in two places. |
| 🔵 Medium | Example Grounding | Problem: The PR deletes the concrete 'Initial Prompt Formats' examples (Format 1/2/3 with literal Jira+Confluence URL samples) and the Confluence CQL example (type=page AND space=PROJ AND text ~ 'feature'). NEW only keeps a single inline trigger example 'Analyze requirements for PROJ-123' and delegates the rest with 'input formats are enumerated in testgen-flow-project-config-loading.md step 0.1' and 'CQL search example ... the discovery skill'. Reason: BASE grounded the entry points with copy-pasteable examples; NEW relies on pointers, so grounding now depends entirely on the target files containing equivalent examples. Solution: Confirm the deleted CQL example and the three input-format samples actually exist verbatim in the cited Phase 0 step 0.1 / discovery skill; if any are absent there, the workflow lost grounded examples it used to carry. Keep at least the trigger example it retained. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The inline TestRail-compatible worked example and the concrete BEFORE/AFTER merged-case example were deleted in favor of a reference to the scenarios-generation FORMAT binding plus a generic vendor-neutral fallback <tc_schema>. The fallback path (used exactly when the skill/binding is unavailable) no longer demonstrates the parameterized merged-role example. Reason: When scenarios-generation is unavailable or returns an incompatible shape, merge behavior is less grounded than BASE, risking malformed test cases. Solution: Retain one minimal concrete merged-case example inline in the fallback <tc_schema> path so the agent keeps a grounded shape when the format binding cannot be loaded. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Bloat Control | Problem: The new <failure_handling> ends with a 'Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn)' block plus a closing paragraph explaining why the sibling testgen-flow-test-case-generation.md keeps a <tc_schema> fallback 'for a different reason'. This is rationale/justification meta-commentary (a 'Deployment guarantee' bullet citing the on-disk SKILL.md path, a 'Section contract is phase-owned' bullet) rather than operational instruction the agent must execute. Reason: pa-hardening / pa-patterns ai-issues require removing non-operational clarifications (rationale, origin labels, explanatory meta-notes); the tradeoff block restates a decision already enforced by the earlier rule and adds no executable behavior. Solution: Reduce to the single operational rule already stated earlier in the same block ('No inline per-entry fallback shape exists ... the phase blocks when the skill is unavailable; do NOT fabricate'). Drop the 'Conscious tradeoff' justification and the cross-sibling comparison, which are non-operational provenance/rationale notes. |
| 🔵 Medium | Cognitive Budget | Problem: The phase is short (4 steps) but the <create_requirements_document> 'Section contract' table plus the testgen additions plus the SMART exemplar plus the multi-paragraph <failure_handling> tradeoff make the failure/justification prose disproportionate to the actual 4-step procedure. Reason: Surface area grows from explanatory prose, not from procedure; pa-hardening targets compact phases where directives, not rationale, dominate. Solution: Trim the justification prose (see Bloat issue) so the executable procedure dominates the file's cognitive surface rather than the meta-rationale. |
| 🔵 Medium | Rosetta | Problem: The same block carries sibling-awareness meta-commentary: it names and reasons about another phase file ('the sibling testgen-flow-test-case-generation.md retains an inline <tc_schema> fallback for a different reason ...') and explains that sibling's internals. Reason: pa-hardening enforces no lateral/sibling awareness beyond keyword/frontmatter cues; explaining a sibling phase's design rationale exposes another phase's internals, which the boundary rules disallow. Solution: State this phase's own rule (skill is a hard dependency, block on failure) without describing or comparing against the sibling phase's fallback design. |
| ⚪ Low | Example Grounding | Problem: BASE carried concrete worked exemplars for US, FR, and NFR (e.g. 'FR-1: Password Validation ... Minimum 8 characters'); NEW keeps only one NFR SMART exemplar inline and delegates full US/FR/NFR worked examples to requirements-authoring/references/authoring-catalogs.md. Reason: Low severity because the single retained NFR exemplar is high quality and the catalogs reference was verified to exist; flagged only to confirm the deleted US/FR examples are covered downstream. Solution: Confirm authoring-catalogs.md actually contains US and FR worked examples equivalent to the deleted ones; if so this is acceptable delegation, otherwise restore a compact US/FR exemplar. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬇️ Slightly worse |
| Dependency Management | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/testgen-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/testgen-flow-project-config-loading.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Cognitive Budget | Problem: Several added bullets pack multiple decisions into one dense sentence, e.g. the Phase-6 recommended-skills line "coding, testing (test-implementation is done inline by this phase via coding + testing)" and the Blocking-infeasibility bullet that chains four escape options inside a single bullet with · separators. The orchestrator must parse nested clauses to extract the actual branch. Reason: AI reliably handles ~5 atomic steps; multi-clause bullets raise the chance a branch is skipped. Solution: Decompose the longest combined bullets (blocking-infeasibility options, per-phase skill notes) into short sub-bullets so each carries one decision, per the prompt-authoring guidance to decompose directives and keep lines short. |
| 🔵 Medium | Reference Integrity | Problem: The new <orchestration_and_escalation> and <state_file> sections push the entire skip-refusal rule, the state-file template, and the ## Verification-Failure Overrides audit-trail row onto external owners: "its phase-execution loop owns the skip-without-agreement / falsified-skip refusal rule ... This workflow does NOT restate that logic" and "template owned by the data-collection phase, aqa-flow-data-collection.md". The workflow's correct behavior on a skipped phase is now entirely non-resolvable from this file; if orchestrator-contract or aqa-flow-data-collection.md does not define exactly that rule/row, the escalation contract silently breaks. Reason: Cross-file ownership with zero local fallback makes a safety-critical rule (refusing falsified phase skips) depend on a reference resolving correctly at runtime. Solution: Keep the delegation but add a one-line fallback assertion in this file (e.g. the minimal skip-refusal behavior and the required state rows) so the workflow degrades safely if the referenced owners drift, and confirm aqa-flow-data-collection.md actually defines <state_file_template> and ## Verification-Failure Overrides. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The file ends with a stray, unmatched </output> closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching <output> opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing </output> line so the document terminates at its root close tag. |
| 🔵 Medium | Example Grounding | Problem: The rewrite deleted all 10 concrete worked examples that grounded the abstract tasks (BASE Tasks 1-10 showed real test structure, setup, assertions, cleanup, e.g. test('should display correct welcome message after login', async ({ page }) => { ... } and the full Phase-6 test-plan markdown block). NEW keeps only one abstract state-file example (tests/e2e/checkout/refund.spec.ts). The actual authoring instruction "Author the test using page-object methods only (no raw selectors in test code), proper waits, project assertion style" now has no positive example of what compliant output looks like. Reason: The deleted examples satisfied the Example Grounding gate; removing them lowers grounding for the core authoring directive (per spec, deleted gate-satisfying content scores comparison<3). Solution: Re-add one short, framework-neutral positive example of a compliant test skeleton (page-object call + assertion + wait) so the authoring contract has a concrete anchor without re-introducing hardcoded Playwright. Keep it minimal to preserve the bloat win. |
| ⚪ Low | Reference Integrity | Problem: Behavior is delegated to skill modes that exist only as cross-references: "testing — UI impl mode" and "coding (standards-first mode)". The phase OWNS the contract but the actual authoring mechanics live in those skill modes; if testing/coding do not expose those named modes the phase cannot author anything. Reason: Named-mode references must resolve for the phase's USE SKILL steps to function. Solution: Confirm testing and coding SKILL.md define the referenced modes (UI impl mode / standards-first mode); if mode names are aspirational, soften to a capability description rather than a named mode. |
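
The orphaned-tag finding above (open=0, close=1) can be reproduced without grep. The following is a minimal sketch, not the validator's actual check: it counts opening and closing XML-style tags per name and reports any mismatch. The `workflow_root` tag name and the sample text are hypothetical stand-ins for a real workflow file, and tags quoted inside code fences would be counted too.

```python
import re
from collections import Counter

# Matches <name ...>, </name> and <name/>; captures the leading slash,
# the tag name, and a trailing slash for self-closing tags.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unbalanced_tags(text: str) -> dict[str, tuple[int, int]]:
    """Return {tag: (open_count, close_count)} for tags whose counts differ."""
    opens: Counter[str] = Counter()
    closes: Counter[str] = Counter()
    for slash, name, self_closing in TAG_RE.findall(text):
        if self_closing:
            continue  # <tag/> opens and closes itself
        if slash:
            closes[name] += 1
        else:
            opens[name] += 1
    return {
        name: (opens[name], closes[name])
        for name in sorted(set(opens) | set(closes))
        if opens[name] != closes[name]
    }


# Hypothetical file body: a root element followed by a stray </output>.
sample = "<workflow_root>\n...steps...\n</workflow_root>\n</output>\n"
print(unbalanced_tags(sample))  # {'output': (0, 1)}
```

Running the same function over each of the affected workflow files would list every file that still carries the trailing tag.
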
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 3 | ⬇️ Slightly worse |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-code-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The BASE referenced the input path as agents/user-app/project_description.md; the NEW input_contract table relocates it to project_description.md (repo root) with no migration note, while the parent aqa-flow.md and sibling phases reference CONTEXT.md/ARCHITECTURE.md/IMPLEMENTATION.md as the authoritative docs. project_description.md is not a Rosetta predefined target file (per pa-rosetta.md the canonical docs are CONTEXT/ARCHITECTURE/IMPLEMENTATION). The contract leans on a non-canonical filename whose location silently changed. Reason: An input path that is non-canonical and silently relocated risks the GATE check ("project description OR one authoritative repo doc exists") passing/failing inconsistently across phases. Solution: Either justify project_description.md as an AQA-domain artifact and define where it is created, or fold its role into the canonical CONTEXT.md/ARCHITECTURE.md/IMPLEMENTATION.md already listed in the table, so the input contract uses resolvable Rosetta-canonical references. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The file ends with a stray, unmatched </output> closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching <output> opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing </output> line so the document terminates at its root close tag. |
| 🔵 Medium | Reference Integrity | Problem: The NEW input/in-scope contract references agents/plans/aqa-<test-name>-failure-analysis.md as the Phase 7 failure-analysis artifact (both in <workflow_context> and the <correction_contract> binding). The parent aqa-flow.md Phase 7 entry states the Phase 7 output only as "failure analysis with root causes and fix recommendations" without naming that exact path, and the BASE correction file read the analysis "from test plan". If Phase 7 (aqa-flow-test-report-analysis.md) does not write to exactly aqa-<test-name>-failure-analysis.md, step 8.1's coding binding (proposed-change source) points at a non-existent file. Reason: A cross-phase input path that only one side declares can break the Phase 8 apply step when the artifact is absent or named differently. Solution: Verify aqa-flow-test-report-analysis.md writes the failure analysis to the same agents/plans/aqa-<test-name>-failure-analysis.md path, or align both files to one agreed artifact name so the Phase 7 to Phase 8 handoff path is identical on both ends. |
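
One way to verify the Phase 7 to Phase 8 handoff is to extract the artifact paths each phase file mentions and compare them. This is a rough sketch under stated assumptions: the path pattern is taken from the finding above, while the two text snippets are invented for illustration and do not quote the real workflow files.

```python
import re

# Artifact paths of the form agents/plans/aqa-<test-name>-<suffix>.md,
# where "<test-name>" is the literal placeholder used in the workflow files.
PATH_RE = re.compile(r"agents/plans/aqa-<test-name>-[\w-]+\.md")


def unresolved_inputs(producer_text: str, consumer_text: str) -> list[str]:
    """Paths the consumer phase reads that the producer phase never names."""
    produced = set(PATH_RE.findall(producer_text))
    consumed = set(PATH_RE.findall(consumer_text))
    return sorted(consumed - produced)


# Hypothetical excerpts standing in for the Phase 7 and Phase 8 files.
phase7 = "Write the analysis to agents/plans/aqa-<test-name>-analysis.md"
phase8 = "Read agents/plans/aqa-<test-name>-failure-analysis.md before applying fixes"

print(unresolved_inputs(phase7, phase8))
```

An empty list means both ends agree on the artifact name; any entry is a path that Phase 8 expects but Phase 7 does not declare.
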
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 3 | ⬇️ Slightly worse |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The NEW version deletes all concrete per-failure markdown templates and the categorized-failure example block that BASE carried (e.g. the BASE ### Failure: [Test Name] block with Error Type / Page Source File / Selector Used / Actual Element Structure fields, and the Performance Analysis template). NEW replaces them with an abstract field list in <failure_analysis_contract> ("Failure name / Error type / Root cause / Evidence label / Evidence rationale / Recommendation") and no filled example of an analyzed failure entry. Reason: The six-field contract is new and the Evidence label / rationale fields are error-prone; with no canonical example the agent must invent the shape, increasing inconsistency across failures despite the contract being machine-checked by the validation checklist. Solution: Add one short filled-in example failure entry (a selector error with a cited page-source line and an Evidence label) to ground the six-field contract, matching the worked-example style used in the sibling requirements-clarification phase. |
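
To make the six-field contract concrete, the sketch below models one failure entry and renders it in the "### Failure:" heading style that BASE used. Only the six field names come from the finding; the field values, the "observed" evidence label, and the file path are hypothetical, and the real <failure_analysis_contract> may constrain these fields differently.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FailureEntry:
    """One analyzed failure with the six fields named in the contract."""
    failure_name: str
    error_type: str
    root_cause: str
    evidence_label: str
    evidence_rationale: str
    recommendation: str

    def missing_fields(self) -> list[str]:
        """Names of fields that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_markdown(self) -> str:
        """Render as a heading plus one bullet per remaining field."""
        lines = [f"### Failure: {self.failure_name}"]
        for f in fields(self)[1:]:
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"- {label}: {getattr(self, f.name)}")
        return "\n".join(lines)


# Hypothetical selector failure; every value here is illustrative.
entry = FailureEntry(
    failure_name="refund button not found",
    error_type="Selector error",
    root_cause="Test targets #refund-btn but the page renders data-testid='refund'",
    evidence_label="observed",
    evidence_rationale="Attribute read from captured page source (hypothetical refund.html, line 42)",
    recommendation="Update the page object to select by data-testid",
)
print(entry.to_markdown())
```

A validation checklist could call `missing_fields()` on each entry and reject any entry that returns a non-empty list.
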
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: NEW contains a stray dangling </output> tag at line 86, after the closing </aqa_flow_selector_implementation> root tag. The prompt body is wrapped in <aqa_flow_selector_implementation>...</aqa_flow_selector_implementation> but there is no matching opening <output>, so the trailing </output> is an orphaned tag. Reason: An unmatched closing tag breaks the XML-style structural framing the rest of the AQA phases rely on; it can confuse tag-aware parsing/compaction and signals a copy-paste error, undermining the otherwise clean section boundaries. Solution: Delete the stray </output> line at the end of the file so the document terminates cleanly at the </aqa_flow_selector_implementation> close tag. |
| 🔵 Medium | Reference Integrity | Problem: The stray </output> tag at line 86 references a sectioning element (<output>) that is never opened anywhere in the file, so the reference does not resolve. Reason: A closing tag with no opener is a dangling reference within the prompt's own structure; even though it does not point to an external file, it is an unresolved structural token introduced by this diff. Solution: Remove the orphaned tag (same fix as the Structural Coherence issue); confirm no <output> open tag was intended. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 3 | ⬇️ Slightly worse |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The added <workflow_context> vendor-resolution block enumerates three-or-four-deep config-key fallback chains twice in prose (e.g. "first non-empty key (stop at first hit): tms_mcp_collection_skill, tms_collection_skill, test_case_management.mcp_collection_skill" and the parallel documentation-vendor list documentation_mcp_collection_skill, documentation.mcp_collection_skill, mcp_documentation_collection_skill, confluence_mcp_collection_skill), and the same resolution is then restated again inside <gather_testrail> step 1 and <gather_confluence> <acquire_skills> step 1. Reason: The duplicated multi-key fallback prose inflates a Phase 1 collector phase and competes for attention with the actual collection steps, a redundancy the hardening reference flags (DRY / remove duplication). Solution: State each vendor's config-key fallback chain once in <workflow_context> and have the step bodies reference it by name (e.g. 'resolve TMS vendor binding per <workflow_context>') instead of repeating the in-scope signal description. |
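
The "first non-empty key (stop at first hit)" rule quoted above is small enough to state once as a function. The sketch below assumes the project config is a nested mapping and that a dotted key such as test_case_management.mcp_collection_skill addresses a nested entry; both are assumptions, as is the `example-tms-skill` value. The key names themselves are the ones listed in the finding.

```python
from typing import Any, Optional

# Fallback chain for the TMS vendor binding, in precedence order.
TMS_KEYS = (
    "tms_mcp_collection_skill",
    "tms_collection_skill",
    "test_case_management.mcp_collection_skill",
)


def lookup(config: dict[str, Any], dotted_key: str) -> Any:
    """Resolve a key, treating dots as nesting (an assumption of this sketch)."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def first_non_empty(config: dict[str, Any], keys: tuple[str, ...]) -> Optional[str]:
    """Return the first non-blank string value among keys, stopping at the first hit."""
    for key in keys:
        value = lookup(config, key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


# Hypothetical config: the first key is present but empty, so resolution falls through.
config = {
    "tms_mcp_collection_skill": "",
    "test_case_management": {"mcp_collection_skill": "example-tms-skill"},
}
print(first_non_empty(config, TMS_KEYS))  # example-tms-skill
```

With the chain declared once like this, each collection step only needs to name the chain it resolves, which is the deduplication the finding asks for.
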
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/aqa-flow-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The workflow-phases and skip-rules blocks repeat the same ownership-attribution boilerplate many times, e.g. "is owned by the orchestrator-contract skill (per <references>)", "not restated here", "(Generic verify-before-advance is owned by orchestrator-contract.)", "Gate-execution mechanics ... are owned by USE SKILL hitl — defer to it; not restated here.". The same defer-to-skill clarification recurs in <phase_template>, <skip_rules>, the phase-output-gate bullet, and <failure_handling>. Reason: pa-hardening core_principles flag DRY and 'Avoid filler text / Remove non-operational clarifications'; the repeated parenthetical attributions add cognitive load without adding behavior. Solution: State the ownership boundary once (e.g., a single line: cadence + gate mechanics owned by orchestrator-contract/hitl) and drop the per-block re-statements; rely on the reader to carry it forward. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-project-config-loading.md
Error: Prompt too large for reliable evaluation: instructions/r2/core/workflows/qa-flow-project-config-loading.md
📄 instructions/r2/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: Phase 2 (api-spec-analysis) auto-advances to Phase 3 with no HITL gate and is not marked type="HITL", unlike the analogous gated testgen/aqa data phases. The only guard is a weak 'file present, non-placeholder' check, so the agent can proceed on a thin/incorrect api-analysis.md. Reason: Without a confirmation gate the agent silently builds downstream test cases on unreviewed API analysis, breaking parity with the gated sibling flows and weakening HITL coverage. Solution: Either add a lightweight verify-before-advance confirmation after Phase 2 (and Phase 1), or document in qa-flow.md why API-spec extraction is intentionally trusted to auto-advance while sibling data phases are gated. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1, verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag. |
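
The open/close count quoted in this finding can be reproduced with a few lines of Python. This is only a sketch of one possible check, not the validator's actual implementation: the function name `unbalanced_tags` and the sample text are hypothetical, and the regex deliberately ignores code fences, so tags quoted inside examples would be counted too.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Matches <name ...>, </name> and <name ... />; the name is captured in group 2.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def unbalanced_tags(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {tag: (open_count, close_count)} for tags whose counts differ."""
    opens: Counter = Counter()
    closes: Counter = Counter()
    for slash, name, self_closing in TAG_RE.findall(text):
        if self_closing:
            continue  # <tag/> opens and closes itself
        if slash:
            closes[name] += 1
        else:
            opens[name] += 1
    return {
        name: (opens[name], closes[name])
        for name in sorted(set(opens) | set(closes))
        if opens[name] != closes[name]
    }


# Hypothetical file body: a well-formed workflow followed by an orphan close tag.
sample = "<workflow>\n<phase>do work</phase>\n</workflow>\n</output>\n"
print(unbalanced_tags(sample))
```

Running a check like this over every changed file in CI would surface the same copy/paste artifact in all affected files at once, rather than one finding per file.
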
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/workflows/qa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1, verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag. |
| 🔵 Medium | Rosetta | Problem: `<stop_for_execution>` embeds a full anti-bypass HITL policy directly in the phase: "User instruction to bypass this gate must be refused with citation of this rule... the gate is mechanical and cannot be overridden by instruction alone." pa-hardening states user involvement / HITL is canonically owned by bootstrap-hitl-questioning.md and a phase should point to the canonical HITL home via a type= marker, "never a parallel mechanism." Reason: A second, self-contained HITL mechanism risks drift from the canonical HITL authority and duplicates approval-governance logic the family already centralizes. Solution: Keep the STOP/WAIT gate but reference the canonical hitl skill for the refusal/override-vocabulary semantics instead of restating a self-contained bypass-refusal policy inside the phase, matching how the sibling Phase 6 and parent qa-flow delegate gate mechanics to hitl. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Decision Branching | Problem: Config advertises documentation_type values google-drive/local, but only the Confluence backend has a resolvable discovery binding and this subflow only maps 'Confluence backend -> binding confluence'. A google-drive/local config has no retrieval path and silently degrades to SKIPPED_NO_CONFIG while the user believes docs are wired. Reason: Silent degradation hides a misconfiguration: the agent skips documentation the user expected to be ingested, reducing requirement coverage without warning. Solution: Either constrain documentation_type's enum to backends that have a binding, or explicitly warn 'documentation_type has no retrieval binding — docs will be skipped' so the unsupported value is surfaced. |
| 🔵 Medium | Bloat Control | Problem: Config-key handling is stated twice: once narratively in `<workflow_context>` ("resolve the documentation vendor binding from whichever of these fields exists first") and again procedurally in `<resolve>` step 1 ("the first non-empty config key per `<workflow_context>` precedence list"), with the Confluence-backend mapping repeated in both places. Reason: Duplicated resolution logic across two sections can drift on edit and inflates the prompt without adding decision value. Solution: Keep the precedence list as the single source in `<workflow_context>` and let `<resolve>` reference it by pointer only (per pa-hardening SSoT rule: mark canonical home once, elsewhere a → pointer), removing the duplicated Confluence-mapping clause. |
| 🔵 Medium | Cognitive Budget | Problem: The second `<workflow_context>` bullet ("Config keys (read literally...)") packs vendor-binding precedence (four key names with stop-at-first-hit), the always-discovery mapping rule, and a long open-ended in-scope-signal enumeration (documentation_type, type, confluence_base_url, confluence_space, documentation_base_url, documentation_mcp_server, "or any field your qa-project-config template documents") into a single run-on bullet. Reason: Bundling multiple independent decision inputs into one dense sentence raises the per-step cognitive search space and risks the agent missing the stop-at-first-hit precedence or the absent-means-absent caveat. Solution: Split the resolution precedence list, the discovery-mapping rule, and the in-scope signal set into separate sub-bullets or a small table so each decision input is atomic; the `<resolve>` steps already reference them, so a structured form would not add length. |
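
To make the Decision Branching and Cognitive Budget findings concrete, the stop-at-first-hit resolution with an explicit warning for unsupported values might look roughly like the sketch below. Everything here is an assumption for illustration: the two-entry `PRECEDENCE` list stands in for the real four-key list in `<workflow_context>`, `KNOWN_BINDINGS` encodes only the "Confluence backend -> binding confluence" mapping quoted above, and `resolve_binding` is a hypothetical name.

```python
from typing import Optional, Tuple

# Assumption: only the Confluence backend has a retrieval binding, per the finding.
KNOWN_BINDINGS = {"confluence": "confluence"}

# Assumption: illustrative order only; the real precedence list lives in the subflow.
PRECEDENCE = ["documentation_type", "type"]


def resolve_binding(config: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (binding, warning) using stop-at-first-hit precedence."""
    for key in PRECEDENCE:
        value = config.get(key)
        if not value:
            continue  # absent or empty: fall through to the next key
        binding = KNOWN_BINDINGS.get(str(value).lower())
        if binding is None:
            # Surface the misconfiguration instead of degrading silently.
            return None, f"{key}={value!r} has no retrieval binding; docs will be skipped"
        return binding, None
    return None, None  # nothing configured: a legitimate skip, no warning


print(resolve_binding({"documentation_type": "Confluence"}))
print(resolve_binding({"documentation_type": "google-drive"}))
```

The point of returning the warning separately is that a configured but unsupported value stays distinguishable from a genuinely absent configuration, which is the distinction the silent SKIPPED_NO_CONFIG path loses.
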
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/coding/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-authoring/references/authoring-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 4 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-use/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/requirements-use/references/gap-analysis-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/reverse-engineering/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/orchestrator-contract/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟠 Very High | Reference Integrity | Problem: The added `<core_concepts>` states "WORKFLOW LOADING is a separate canonical concern owned by load-workflow" and `<resources>` lists "skill load-workflow — canonical workflow loading", and `<prerequisites>` requires "OPERATION_MANAGER is active". No load-workflow or operation-manager skill exists anywhere under instructions/r2 (verified: not in instructions/r2/core/skills/ and no name: load-workflow/name: operation-manager frontmatter in r2). These are dangling canonical references. Reason: Per pa-rosetta.md, Rosetta prompts must reference prompts by logical name from the canonical `docs/definitions/*.md` lists and a missing name requires an explicit user question. An agent following the `<prerequisites>` gate ("OPERATION_MANAGER is active") or attempting USE SKILL load-workflow will fail to resolve them, breaking the dispatch/phase-drive chain. Solution: Point these references to skills that actually exist in r2 (e.g. the established plan-manager/planning for drive-loop concerns, or define and add the load-workflow/operation-manager skills to the canonical skills list before referencing them as authorities), or inline the loading/operation-manager responsibility instead of delegating to a non-existent owner. |
| 🟡 High | Rosetta | Problem: The same dangling load-workflow/operation-manager references violate the Rosetta definitions policy ('Use names from `docs/definitions/*.md`', 'Missing name: ask explicit user question', 'Do not auto-add out-of-list items'). Reason: pa-rosetta.md mandates referencing only canonical Rosetta prompt names; inventing authority owners that do not exist degrades the Rosetta gate. Solution: Reconcile referenced skill names against the canonical Rosetta skills definitions before merge. |
| 🟡 High | Dependency Management | Problem: `<prerequisites>` adds a hard gate "OPERATION_MANAGER is active" and `<core_concepts>`/`<resources>` make the skill depend on load-workflow, but neither dependency is provided or resolvable in r2. The skill now cannot satisfy its own stated prerequisites. Reason: pa-hardening.md requires no gaps/ambiguity and logical consistency within a prompt and its DIRECT dependencies; an unresolvable hard prerequisite makes the contract impossible to honor. Solution: Either declare these as optional/soft dependencies with a fallback when the owning skill is absent, or wire them to existing canonical skills so the dependency graph closes. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Reference Integrity | 2 | ⬇️ Slightly worse |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 3 | ⬇️ Slightly worse |
| Rosetta | 3 | ⬇️ Slightly worse |
📄 instructions/r2/core/skills/orchestrator-contract/references/dispatch-template.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Rosetta | Problem: The template hard-codes MUST USE SKILL ... operation-manager into every dispatch prompt, but the operation-manager skill does not exist anywhere under instructions/r2 (it exists only in r3). Every subagent spawned via this template is told to load a missing skill. Reason: pa-rosetta requires references to resolve within the release. A baked-in missing-skill load makes every r2 subagent dispatch start with a failed ACQUIRE, undermining the dispatch chain. Solution: Remove operation-manager from the r2 dispatch template (or add the skill to r2). Keep references limited to skills that exist in r2. |
| 🟡 High | Reference Integrity | Problem: The new template hard-codes "MUST USE SKILL subagent-contract, operation-manager." into every dispatch prompt, but operation-manager is not a skill that exists in instructions/r2 (only subagent-contract resolves). Every subagent dispatched with this template will be told to load a non-existent skill. Reason: pa-rosetta.md requires referencing only canonical Rosetta prompt names; a MUST directive to load a missing skill propagates a broken instruction to every spawned subagent. Solution: Reference only resolvable canonical skills (subagent-contract) and remove or replace operation-manager with the actual skill name once it exists in the canonical skills list. |
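
A pre-merge reconciliation of skill references, as both findings suggest, could be as small as the following sketch. It assumes references take the literal form `USE SKILL name[, name...]` seen in the quoted template, and that the caller supplies the set of skill names that exist in the release; `dangling_skills` and the example name set are hypothetical, not part of Rosetta.

```python
import re
from typing import Iterable, List

# Captures one or more comma-separated skill names after the literal "USE SKILL".
USE_SKILL_RE = re.compile(r"USE SKILL\s+([a-z0-9-]+(?:\s*,\s*[a-z0-9-]+)*)")


def dangling_skills(prompt_text: str, known_skills: Iterable[str]) -> List[str]:
    """Return referenced skill names that are missing from known_skills, sorted."""
    referenced = set()
    for match in USE_SKILL_RE.finditer(prompt_text):
        referenced.update(name.strip() for name in match.group(1).split(","))
    return sorted(referenced - set(known_skills))


# Hypothetical r2 skill set: only subagent-contract and hitl resolve.
template = "MUST USE SKILL subagent-contract, operation-manager."
print(dangling_skills(template, {"subagent-contract", "hitl"}))
```

In practice the known set would be derived from the release's skills directory or the canonical definitions list, so an r3-only skill referenced from an r2 template fails the check before it reaches any subagent.
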
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/scenarios-generation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/scenarios-generation/references/gwt-spec.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/scenarios-generation/references/testrail-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: The vendor MCP tool names are hardcoded as bare identifiers throughout the process, e.g. call `mcp_testrail_get_project(project_id)` (step 1) and `mcp_testrail_add_case(section_id, title, priority_id, type_id, refs, custom_steps_separated)` (step 8). pa-rosetta.md requires Rosetta prompts to be coding-agent-agnostic and pa-hardening.md says no hardcoded tool names; here the concrete TestRail MCP signatures are baked into the binding file. Reason: Vendor names live in a config-resolved binding file by design and the fork table makes them swappable, so this is a minor, intentional containment rather than a portability regression. Solution: This is acceptable as the lowest layer (a vendor-specific binding explicitly named testrail-export.md whose whole purpose is to hold the TestRail specifics, and the SKILL keeps the vendor abstraction), and the 'Swapping to another TMS vendor' table parameterizes every tool name for forks. Keep, but ensure the SKILL/PHASE never reaches these names except through the resolved EXPORT binding, so vendor-agnosticism is preserved at the skill boundary. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/scenarios-generation/references/testrail-format.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/testing/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/testing/references/implementation-examples.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Load-bearing rules (ATC traceability, assertion-priority, exact UI/selector/API code shapes) now live ONLY in this lazy-loaded reference, and the aqa/qa phase files that consume the testing skill do not name implementation-examples.md directly (chain: phase -> testing SKILL.md -> reference). Reason: An agent that under-loads the lazy reference loses the exact output shape and the fragile-selector approval rule, producing ungrounded test code. Solution: Keep the load instruction imperative and restate the 1-2 truly load-bearing invariants (ATC traceability, no-silent-fragile-selector) as a short inline guard in testing/SKILL.md so they survive if the reference is not loaded. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r2/core/skills/discovery/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r2/core/skills/discovery/references/jira-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: The file ends with a dangling, unmatched closing XML tag `</content>` (last line) with no corresponding opening `<content>` tag anywhere in the file; the file is otherwise pure markdown (# headers, tables, fenced blocks). Reason: An orphan closing tag is a structural artifact that gets loaded verbatim into agent context on ACQUIRE; it can confuse XML-aware parsing and signals a copy/paste leftover, undermining the clean section structure the binding otherwise has. Solution: Remove the stray closing `</content>` line so the file is consistently markdown with no orphan XML tag. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/discovery/references/confluence-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: The file ends with a dangling, unmatched closing XML tag `</content>` (last line) with no opening `<content>` tag; the body is otherwise pure markdown. Reason: The stray tag is loaded into agent context verbatim on ACQUIRE and is a copy/paste leftover that breaks the otherwise-clean markdown structure and may mislead XML-aware parsing. Solution: Delete the orphan `</content>` final line. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r2/core/skills/discovery/references/testrail-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: The file ends with a dangling, unmatched closing XML tag `</content>` (last line) with no opening `<content>` tag; the file is otherwise pure markdown. Reason: The orphan tag is ACQUIRE'd into agent context as-is, is a copy/paste leftover, and breaks the clean markdown structure the binding otherwise maintains. Solution: Remove the orphan `</content>` final line. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ✅ Much better |
| Single Responsibility | 5 | ✅ Much better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 4 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The inline TestRail-compatible worked example and the concrete BEFORE/AFTER merged-case example were deleted in favor of a reference to the scenarios-generation FORMAT binding plus a generic vendor-neutral fallback <tc_schema>. The fallback path (used exactly when the skill/binding is unavailable) no longer demonstrates the parameterized merged-role example. Reason: When scenarios-generation is unavailable or returns an incompatible shape, merge behavior is less grounded than BASE, risking malformed test cases. Solution: Retain one minimal concrete merged-case example inline in the fallback <tc_schema> path so the agent keeps a grounded shape when the format binding cannot be loaded. |
| 🔵 Medium | Reference Integrity | Problem: Step 5.3 (added) instructs loading references/<vendor>-format.md via the resolved vendor binding, but the scenarios-generation skill ships only testrail-format.md / testrail-export.md. For any resolved vendor other than testrail, the referenced <vendor>-format.md does not exist, so the ACQUIRE would return zero documents. Reason: A reference that resolves only for one vendor while the prompt implies an open vendor set is a latent dangling reference; the inline <tc_schema> fallback mitigates breakage but the path is still mis-advertised. Solution: Note that only the testrail vendor binding currently has reference assets, or constrain the resolvable vendor set to those with shipped reference files, so the parameterized reference always resolves. |
| 🔵 Medium | Precision & Explicitness | Problem: After the refactor parameterizes the test format to a config-resolved vendor binding (scenarios-generation with references/<vendor>-format.md, the resolved FORMAT-binding case format), several changed lines still hardcode the term "TestRail format": <phase_steps> line 3 'Generate test cases in TestRail format' and the <create_test_document> placeholder [TC entries in the resolved FORMAT-binding case format] coexisting with the residual title. One concept (the case format) is now named two ways, weakening the one-term-per-concept discipline the rest of the file establishes. Reason: Mixed naming for the same concept can make an agent treat 'TestRail' as a hard requirement even when the config resolves a different TMS vendor, contradicting the file's own config-resolution rule. Solution: Make the format term consistent with the vendor-binding abstraction the file otherwise uses (refer to the resolved FORMAT binding rather than naming TestRail) in the <phase_steps> step-3 line so the parameterization reads uniformly. |
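The Reference Integrity finding above can be illustrated with a small resolver. This is a minimal sketch, not the workflow's actual loader: the function names `resolve_format_reference` and `shipped_vendors`, the on-disk `references/` layout, and the convention that `None` means "use the inline fallback schema" are assumptions made for illustration. Only the `<vendor>-format.md` naming pattern and the `testrail-format.md` / `testrail-export.md` file names come from the report.

```python
from pathlib import Path
from typing import List, Optional

FORMAT_SUFFIX = "-format.md"


def resolve_format_reference(skill_dir: Path, vendor: str) -> Optional[Path]:
    """Return the vendor's format reference if it ships, else None.

    None tells the caller to use the inline fallback schema instead of
    trying to load a reference file that does not exist (hypothetical
    convention, not taken from the reviewed prompts).
    """
    candidate = skill_dir / "references" / f"{vendor}{FORMAT_SUFFIX}"
    return candidate if candidate.is_file() else None


def shipped_vendors(skill_dir: Path) -> List[str]:
    """List vendors that actually have a '<vendor>-format.md' file on disk."""
    refs = skill_dir / "references"
    if not refs.is_dir():
        return []
    return sorted(
        p.name[: -len(FORMAT_SUFFIX)] for p in refs.glob(f"*{FORMAT_SUFFIX}")
    )
```

Constraining the resolvable vendor set to `shipped_vendors(...)` is one way to make the parameterized reference always resolve, which is the second option the Solution describes.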
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The added <failure_handling> block carries a multi-paragraph meta-justification section, Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn): with three bulleted sub-points (skill-is-hard-dependency, deployment guarantee, section-contract-is-phase-owned) plus a closing paragraph contrasting the sibling testgen-flow-test-case-generation.md. This is rationale/provenance prose explaining why a design decision was made rather than an operational instruction the agent must execute, which pa-hardening flags as removable ('Remove non-operational clarifications (history, rationale, ...), provenance, or explanatory meta-notes'). Reason: Per-turn re-sent rationale consumes cognitive budget and context window without altering execution, and the same operational outcome is already stated in the 'Skill execution failure' bullet above it. Solution: Reduce the tradeoff explanation to the one operational rule the agent needs (skill failure blocks the phase; no inline fallback exists; re-invoke once then halt) and drop the design-rationale paragraphs that do not change agent behavior. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 3 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ✅ Much better |
| Cognitive Budget | 4 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: The BASE file carried concrete detection examples inline (e.g. value-mismatch: 'Priority: Jira says "High", Confluence says "Low priority"'; logic-conflict: '"Must be fast" AND "Must show detailed calculations"'). The NEW file deletes all of these and delegates them to the requirements-use gap_analysis mode's catalogs, keeping only one vague-vs-specific example row in the document-contract. A reader of this phase alone now sees the taxonomy names without grounded probes. Reason: The lost examples are recoverable via the resolvable requirements-use reference, so this is a minor standalone-readability loss, not a behavioral regression. Solution: Acceptable as delegation since requirements-use/references/gap-analysis-catalogs.md resolves and owns the catalogs; if any standalone usability is desired, keep one short illustrative probe per category as an inline pointer to the catalog. No prompt rewrite needed if delegation is the intended boundary. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Example Grounding | Problem: BASE contained concrete procedural detail now removed and delegated: the CQL template (type=page AND space={PROJECT_KEY} AND (text ~ "{term1}" ...)), pseudo-call signatures, and the explicit child-page traversal loop. NEW delegates all of this to discovery's confluence-binding.md ('owns URL parsing, direct-URL-vs-search precedence, child-page traversal...'). A reader of this phase alone no longer sees the search/traversal mechanics. Reason: The deleted procedural examples are recoverable via the resolvable discovery confluence/jira bindings, and the hardcoded mcp_Jira_MCP_* names were intentionally removed to satisfy Dependency Management (config-resolved vendors), which is a net improvement. Solution: Acceptable delegation: discovery/references/confluence-binding.md and jira-binding.md both resolve, and <pitfalls> still names child-page traversal as a MUST. No rewrite needed; the mechanics correctly live in the binding the phase invokes. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: The <state_file_template> ## Phase Details example shows only ### Phase 1 (with 'Add sections for each completed phase' hook) even though this is Phase 0 and the validation_checklist requires 'testgen-state.md created with Phase 0 marked complete'. The canonical template the file itself authors illustrates a downstream phase rather than the Phase 0 detail row it must write here, a small self-consistency gap in the owned template. Reason: The template otherwise resolves correctly and is referenced by sibling phases; the example-vs-required-output mismatch is cosmetic and unlikely to break execution, hence low severity. Solution: Show a ### Phase 0 example detail row in the <state_file_template> ## Phase Details block (or relabel the placeholder generically) so the template demonstrates the row this phase actually appends. No behavioral logic change required. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ✅ Much better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The file ends with a stray, unmatched </output> closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching <output> opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing </output> line so the document terminates at its root close tag. |
| 🔵 Medium | Example Grounding | Problem: The rewrite deletes ALL concrete code examples that grounded the abstract instructions in BASE (e.g. the full test('should display correct welcome message after login', async ({ page }) => {...}) setup/action/assertion snippets and the import-structure template). NEW gives one state-file markdown example but zero example of an authored test or of a ### Uncovered Assertions entry, even though it mandates 'every Phase 2 assertion implemented OR recorded'. Reason: Example Grounding gate requires abstract instructions be grounded with concrete examples; the contract rules ('no raw selectors', 'Uncovered Assertions reason format') are now stated only abstractly. This is an intentional portability/bloat trade-off, hence net-positive elsewhere, but the grounding gate specifically regressed. Solution: Add one small, framework-neutral worked example of an Uncovered-Assertions entry and/or a minimal page-object-method-based test snippet illustrating the 'no raw selectors in test code' rule, without re-baking Playwright specifics. |
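As a rough illustration of the kind of snippet the Example Grounding Solution asks for, the sketch below shows a page-object-based test in which the test body contains no selector strings. It is not taken from the reviewed workflow: `FakeDriver`, `WelcomePage`, `logged_in_driver`, and the `data-testid` selector are hypothetical names invented here so the example runs without any browser framework.

```python
class FakeDriver:
    """Stand-in for a browser driver: maps selectors to visible text."""

    def __init__(self, dom):
        self._dom = dict(dom)

    def text_of(self, selector):
        return self._dom[selector]


class WelcomePage:
    """Page object: the only layer that knows about selectors."""

    WELCOME_BANNER = "[data-testid='welcome-banner']"  # hypothetical selector

    def __init__(self, driver):
        self._driver = driver

    def welcome_message(self):
        return self._driver.text_of(self.WELCOME_BANNER)


def logged_in_driver(user_name):
    """Fixture: a fake DOM as it might look after a successful login."""
    return FakeDriver({WelcomePage.WELCOME_BANNER: f"Welcome, {user_name}"})


def test_welcome_message_after_login():
    page = WelcomePage(logged_in_driver("Ada"))
    # The test body calls page-object methods only; no selector strings.
    assert page.welcome_message() == "Welcome, Ada"
```

Keeping the selector behind `WelcomePage` means a markup change touches one constant rather than every test, which is the usual motivation for a "no raw selectors in test code" rule.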
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 3 | ⬇️ Slightly worse |
| Example Grounding | 3 | ⬇️ Slightly worse |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-code-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-selector-identification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The file ends with a stray, unmatched </output> closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching <output> opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing </output> line so the document terminates at its root close tag. |
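The open/close count that the finding above reports (open=0, close=1) can be reproduced with a short script. The following is a minimal sketch under stated assumptions: `tag_balance` and `TAG_RE` are hypothetical names, and the check is count-based only, so it does not validate nesting order and would also report bare placeholders such as `<vendor>` as unmatched openers.

```python
import re
from collections import Counter

# Matches <name ...>, </name>, and <name ... /> without parsing attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w-]*)[^<>]*?(/?)>")


def tag_balance(text):
    """Return {tag_name: opens - closes} for tags whose counts differ."""
    opens, closes = Counter(), Counter()
    for match in TAG_RE.finditer(text):
        closing_slash, name, self_closing = match.groups()
        if self_closing:
            continue  # <name/> opens and closes itself
        if closing_slash:
            closes[name] += 1
        else:
            opens[name] += 1
    return {
        name: opens[name] - closes[name]
        for name in set(opens) | set(closes)
        if opens[name] != closes[name]
    }
```

A negative value indicates more closers than openers, so a stray trailing `</output>` would show up as `{"output": -1}`.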
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 3 | ⬇️ Slightly worse |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ✅ Much better |
📄 instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The PR introduces a stray, unmatched closing tag </output> as the final line of the file (line 86), AFTER the root element close </aqa_flow_selector_implementation> (line 84). The tag has no opening counterpart anywhere in the file. It is NOT present in the BASE version and is unique to this file among the four (the other three sibling phase files have no </output> tag). Reason: A dangling XML close tag with no opener corrupts the structural integrity of the prompt. When the phase is ACQUIRE'd and injected into an agent's context, the orphan tag can confuse XML-aware parsing, mislead the agent about where the phase body ends, and is exactly the kind of unmatched-tag defect the audit was told to look for at the end of this file. Solution: Delete the trailing </output> line so the file ends cleanly at the root close </aqa_flow_selector_implementation>. Most likely a transcription/paste artifact from a tool-output wrapper that leaked into the saved file. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 3 | ⬇️ Slightly worse |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-data-collection.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ✅ Much better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ✅ Much better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ✅ Much better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ✅ Much better |
| Example Grounding | 5 | ✅ Much better |
| Safety Boundaries | 4 | ⬆️ Slightly better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: Several blocks carry meta-narration about ownership boundaries rather than operational instruction, e.g. (Source-system + tool enumeration owned by the frontmatter `description` field — not restated here.) and The ACQUIRE / execute / state-update cadence is the `orchestrator-contract` skill's contract, not restated per-phase. These provenance/ownership annotations repeat across <workflow_phases>, <phase_template>, <skip_rules>, and <state_file>. Reason: pa-hardening.md flags non-operational clarifications and provenance/meta-notes for removal; the repeated ownership prose inflates the prompt without changing agent behavior. Solution: Compress the repeated 'owned by X, not restated here' disclaimers into a single one-line ownership note at the top of the file instead of repeating the pattern in each block. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-project-config-loading.md
Error: Prompt too large for reliable evaluation: instructions/r3/core/workflows/qa-flow-project-config-loading.md
📄 instructions/r3/core/workflows/qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Workflow Completeness | Problem: Phase 2 (api-spec-analysis) auto-advances to Phase 3 with no HITL gate and is not marked type="HITL", unlike the analogous gated testgen/aqa data phases. The only guard is a weak 'file present, non-placeholder' check, so the agent can proceed on a thin/incorrect api-analysis.md. Reason: Without a confirmation gate the agent silently builds downstream test cases on unreviewed API analysis, breaking parity with the gated sibling flows and weakening HITL coverage. Solution: Either add a lightweight verify-before-advance confirmation after Phase 2 (and Phase 1), or document in qa-flow.md why API-spec extraction is intentionally trusted to auto-advance while sibling data phases are gated. |
| 🔵 Medium | Bloat Control | Problem: The redaction guidance is largely restated in two places: <redaction_contract> defines targets and a re-scan grep list, and <validation_checklist> re-asserts Redaction scan ran per <redaction_contract> while the per-endpoint Notes / Discrepancies field also says 'record each applied redaction here.' The same redaction obligation is expressed three times. Reason: pa-hardening.md DRY/compress guidance: repeating the same obligation across contract, template, and checklist adds words without adding control. Solution: State the redaction obligation once in <redaction_contract> and have the checklist and template fields reference it by name without re-describing what to record. |
| 🔵 Medium | Cognitive Budget | Problem: The phase carries two large inline worked examples in one file: the full <endpoint_contract_template> blank template AND a complete 'Worked entry' (GET /api/v1/orders/{orderId} with 4 response rows, citations, discrepancies), plus the <redaction_contract> with a full re-scan grep list. At 14.4K chars this single phase loads a heavy template-plus-example surface for the discoverer subagent to hold while extracting contracts. Reason: pa-hardening.md sets a <300-line ideal / 500 acceptable size target and warns AI feels overloaded past ~5 directives; duplicating a full template as both blank and filled doubles the cognitive surface. Solution: Keep the blank <endpoint_contract_template> and trim the worked entry to the minimal discriminating fields (one response row + the discrepancy note), since the discrepancy is the only thing the example uniquely teaches. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-case-specification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🟡 High | Structural Coherence | Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag. |
| 🔵 Medium | Bloat Control | Problem: The iteration-cap + escalation rule is stated three times: `<correction_contract>` ('cap in-phase apply retries at 3 cycles per failing change... record Phase 7 blocked: in-phase apply retry cap reached'), `<present_for_approval>` step 3a (re-prompt cap), and `<apply_changes>` step 4 ('Max retries: cap step 7.3 in-phase retries at 3 cycles... record Phase 7 blocked: in-phase apply retry cap reached'). Same cap, threshold, and state string duplicated verbatim. Reason: pa-hardening.md DRY/compress: the identical cap and Phase 7 blocked string appearing in three blocks is redundancy that can drift if one copy is edited. Solution: Define the 3-cycle apply-retry cap and its exact state-note string once in `<correction_contract>` and reference it from `<apply_changes>` step 4 instead of restating it. |
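
The open/close count behind the stray `</output>` finding above can be reproduced without grep. This is a rough sketch, not the validator's actual check: the helper name `tag_balance` and the sample text, including the `<workflow>` root tag name, are assumptions made for illustration.

```python
import re

def tag_balance(text: str, tag: str) -> tuple[int, int]:
    """Return (open_count, close_count) for one XML-style tag name."""
    name = re.escape(tag)
    # Opening tags may carry attributes: <output> or <output type="x">.
    opens = len(re.findall(rf"<{name}(?:\s[^>]*)?>", text))
    closes = len(re.findall(rf"</{name}\s*>", text))
    return opens, closes

# Hypothetical file tail: a balanced root tag followed by an orphaned closer.
sample = "<workflow>\n  steps...\n</workflow>\n</output>\n"
print(tag_balance(sample, "output"))    # (0, 1): stray closing tag
print(tag_balance(sample, "workflow"))  # (1, 1): balanced
```

Running the same count over every file in the PR would surface all occurrences of the copy/paste artifact at once.
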
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-gap-and-requirements-clarification.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Bloat Control | Problem: The 'phase OWNS the contract, skills EMIT into it' ownership statement is restated three times across the file: `<workflow_context>` ('This phase OWNS the raw-data aggregation contract... EMIT into the sections this phase asserts'), `<raw_data_contract>` ('discovery and reverse-engineering emit into these, they do not define them'), and `<phase_steps>`. The emit/own framing repeats without adding new control. Reason: pa-hardening.md DRY/compress and 'remove non-operational clarifications': the repeated ownership meta-framing is provenance prose, not an actionable directive. Solution: State the own/emit relationship once in `<raw_data_contract>` and drop the duplicated framing from `<workflow_context>` and `<phase_steps>`. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Cognitive Budget | 4 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/workflows/qa-flow-test-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Structural Coherence | Problem: The file ends with a stray, unmatched `</output>` closing tag after the workflow root close tag (open=0, close=1 verified by grep). It has no matching `<output>` opener and is not part of the prompt's structure. Reason: The orphaned tag is injected verbatim into the agent's context when the workflow is loaded, adding confusing junk that can mislead XML-structure parsing; it is a systematic copy/paste artifact across 10 files in this PR. Solution: Delete the trailing `</output>` line so the document terminates at its root close tag. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-execution-and-report-analysis.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/workflows/qa-flow-documentation-mcp-subflow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Decision Branching | Problem: Config advertises documentation_type values google-drive/local, but only the Confluence backend has a resolvable discovery binding and this subflow only maps 'Confluence backend -> binding confluence'. A google-drive/local config has no retrieval path and silently degrades to SKIPPED_NO_CONFIG while the user believes docs are wired. Reason: Silent degradation hides a misconfiguration: the agent skips documentation the user expected to be ingested, reducing requirement coverage without warning. Solution: Either constrain documentation_type's enum to backends that have a binding, or explicitly warn 'documentation_type has no retrieval binding — docs will be skipped' so the unsupported value is surfaced. |
| 🔵 Medium | Bloat Control | Problem: The single `<workflow_context>` Config keys bullet (line 17) packs vendor-binding precedence (4 keys), the discovery-skill mapping, AND a 7+ item in-scope-signal enumeration into one dense run-on sentence spanning ~7 lines. Reason: pa-hardening targets short phrases and progressive layering; a single multi-clause bullet raises cognitive load and obscures the two distinct decisions (which vendor vs is-it-in-scope). Solution: Split the config-key resolution from the in-scope-signal detection into two short bullets/sub-lists so each carries one decision; no content need be lost. |
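
The second option in the Decision Branching finding above, surfacing an unsupported `documentation_type` instead of silently skipping, could look roughly like this. It is a sketch under stated assumptions: the `BINDINGS` table and the function name `resolve_doc_binding` are hypothetical, and the only binding listed is the Confluence one the finding says exists.

```python
import warnings

# Assumption: only the Confluence backend has a discovery binding,
# as stated in the finding; the table itself is hypothetical.
BINDINGS = {"confluence": "confluence"}

def resolve_doc_binding(documentation_type):
    """Return the binding name, or None (docs skipped) with an explicit warning."""
    if not documentation_type:
        return None  # nothing configured: skipping is expected here
    binding = BINDINGS.get(documentation_type.strip().lower())
    if binding is None:
        # Configured but unsupported: say so rather than degrade silently.
        warnings.warn(
            f"documentation_type={documentation_type!r} has no retrieval "
            "binding; docs will be skipped"
        )
    return binding
```

With this shape, a `google-drive` or `local` value still ends in a skip, but the user sees why.
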
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/coding/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/debugging/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/coding-agents-prompt-authoring/references/pa-hardening.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/operation-manager/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: Line 10 changes the frontmatter to `model: claude-sonnet-4-6, gpt-5.5, gemini-3.1-pro`, a comma-joined list of vendor model ids in a field the skill schema treats as a single model id, and these literal ids are baked into the prompt rather than parameterized. Reason: pa-rosetta requires agent-agnostic prompts and pa-hardening forbids hardcoded tool/vendor names; pinning three specific vendor model ids in frontmatter risks contract breakage and reduces portability, even though the intent (broaden model support) is sound. Solution: Confirm the skill-schema `model:` field accepts a list; if it expects a scalar, express multi-agent support another way (e.g., a documented capability note) rather than a comma list of hardcoded vendor model ids. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 5 | ✅ Much better |
| Conflict Resolution | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ✅ Much better |
| Dependency Management | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-authoring/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 4 | ⬆️ Slightly better |
| Instruction Ordering | 4 | ⬆️ Slightly better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 4 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 4 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 4 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ✅ Much better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/requirements-use/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Input Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 4 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/requirements-use/references/gap-analysis-catalogs.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/reverse-engineering/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Conflict Resolution | 4 | ⬆️ Slightly better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 4 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Example Grounding | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 5 | ✅ Much better |
| Self-Validation | 4 | ⬆️ Slightly better |
| Bloat Control | 5 | ✅ Much better |
| Cognitive Budget | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/orchestrator-contract/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Single Responsibility | 4 | ⬆️ Slightly better |
| Input Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ✅ Much better |
| Conflict Resolution | 5 | ✅ Much better |
| Decision Branching | 5 | ✅ Much better |
| Workflow Completeness | 5 | ✅ Much better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ✅ Much better |
| Failure Handling | 5 | ✅ Much better |
| Epistemic Honesty | 4 | ⬆️ Slightly better |
| Self-Validation | 5 | ✅ Much better |
| Bloat Control | 4 | ⬆️ Slightly better |
| Cognitive Budget | 5 | ✅ Much better |
| Dependency Management | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/orchestrator-contract/references/dispatch-template.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/scenarios-generation/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/scenarios-generation/references/gwt-spec.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/scenarios-generation/references/testrail-export.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/scenarios-generation/references/testrail-format.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/testing/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Goal Specification | 5 | ⬆️ Slightly better |
| Input Contract | 5 | ⬆️ Slightly better |
| Output Contract | 4 | ⬆️ Slightly better |
| Success Criteria | 5 | ⬆️ Slightly better |
| Decision Branching | 5 | ⬆️ Slightly better |
| Workflow Completeness | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Reference Integrity | 5 | ⬆️ Slightly better |
| Example Grounding | 4 | ⬆️ Slightly better |
| Safety Boundaries | 5 | ⬆️ Slightly better |
| Failure Handling | 5 | ⬆️ Slightly better |
| Epistemic Honesty | 5 | ⬆️ Slightly better |
| Self-Validation | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
📄 instructions/r3/core/skills/testing/references/implementation-examples.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Reference Integrity | Problem: Load-bearing rules (ATC traceability, assertion-priority, exact UI/selector/API code shapes) now live ONLY in this lazy-loaded reference, and the aqa/qa phase files that consume the testing skill do not name implementation-examples.md directly (chain: phase -> testing SKILL.md -> reference). Reason: An agent that under-loads the lazy reference loses the exact output shape and the fragile-selector approval rule, producing ungrounded test code. Solution: Keep the load instruction imperative and restate the 1-2 truly load-bearing invariants (ATC traceability, no-silent-fragile-selector) as a short inline guard in testing/SKILL.md so they survive if the reference is not loaded. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/discovery/SKILL.md
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/discovery/references/confluence-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: MCP call names are hardcoded inline (`confluence_get_page`, `confluence_get_page_children`, `confluence_search`, and write-guard names `confluence_create_page`/`confluence_update_page`/`confluence_add_comment`) rather than parameterized, against the Rosetta agent-agnostic / no-hardcoded-tool-names principle. Reason: pa-hardening and pa-rosetta require coding-agent-agnostic prompts with no baked tool names; a binding that names exact MCP functions ties the skill to one specific MCP server implementation. Solution: Keep the literal names as illustrative call shapes (acceptable for a vendor binding) but frame them as the expected MCP capability per the resolved binding rather than as the only valid tool identifiers, so a differently-named Confluence MCP still resolves. |
| 🔵 Medium | Structural Coherence | Problem: The file ends with a stray, unmatched closing tag `</content>` on its last line, with no opening `<content>` anywhere in the document. The file is otherwise plain markdown (it opens with `# Vendor binding: Confluence`), so this dangling XML tag is a copy/paste or template-extraction artifact. Reason: When this reference is lazy-loaded into agent context the literal `</content>` renders as visible junk text and can confuse XML-aware parsing of the surrounding skill, signalling a malformed asset. Solution: Delete the trailing `</content>` line so the markdown reference ends cleanly on its last validation bullet. |
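
One way to read the Dependency Management suggestion above is to resolve a capability to whichever tool name the connected MCP server exposes, with the names quoted in the binding as the expected shapes. This is a sketch only: the capability keys, the alias names (`getConfluencePage`, `searchConfluence`), and the function `resolve_tool` are invented for illustration and do not come from any real Confluence MCP server.

```python
# Hypothetical capability map: the first name in each list is the one quoted
# in the binding; the second is an invented alias standing in for a
# differently-named server.
CAPABILITY_CANDIDATES = {
    "get_page": ["confluence_get_page", "getConfluencePage"],
    "search": ["confluence_search", "searchConfluence"],
}

def resolve_tool(capability, available_tools):
    """Return the first candidate tool name the MCP server actually exposes."""
    for name in CAPABILITY_CANDIDATES.get(capability, []):
        if name in available_tools:
            return name
    return None  # capability not available on this server
```

The same shape would apply to the Jira and TestRail bindings, which carry the equivalent finding.
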
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/discovery/references/jira-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: Exact MCP tool identifiers are baked in (`jira_get_issue`, `jira_search_fields`, and write-guard names `jira_create_issue`/`jira_update_issue`/`jira_transition_issue`/`jira_add_comment`) rather than parameterized capabilities, against the no-hardcoded-tool-names principle. Reason: pa-rosetta/pa-hardening require agent-agnostic prompts; hardcoded function names couple the binding to one MCP server's exact API. Solution: Present the names as the expected Jira MCP call shapes for the resolved binding rather than as the sole valid identifiers, so an alternately-named Jira MCP still maps. |
| 🔵 Medium | Structural Coherence | Problem: The file ends with a stray, unmatched `</content>` closing tag on its last line; there is no opening `<content>` tag anywhere and the document is plain markdown opening with `# Vendor binding: Jira`. It is a leftover extraction/template artifact. Reason: The dangling tag is emitted verbatim into agent context when the binding is lazy-loaded, producing junk output and a malformed-asset signal. Solution: Remove the trailing `</content>` line so the file ends on its last Read-only validation bullet. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/discovery/references/testrail-binding.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| 🔵 Medium | Dependency Management | Problem: MCP identifiers are hardcoded (`get_case`, `get_case_fields`, and write-guard names `update_case`/`add_case`/`delete_case`) rather than parameterized, against the agent-agnostic / no-hardcoded-tool-names principle. Reason: pa-rosetta/pa-hardening require coding-agent-agnostic prompts; exact tool names couple the binding to one MCP implementation. Solution: Frame the names as the expected TestRail MCP call shapes for the resolved binding rather than the only valid identifiers. |
| 🔵 Medium | Structural Coherence | Problem: The file ends with a stray, unmatched `</content>` closing tag on its last line; no opening `<content>` exists and the document is plain markdown opening with `# Vendor binding: TestRail`. It is a leftover extraction/template artifact (identical defect to the Jira and Confluence bindings). Reason: The dangling tag is rendered verbatim when the binding is lazy-loaded into context and signals a malformed asset; the repeated occurrence across all three bindings confirms a systematic copy/extraction error. Solution: Delete the trailing `</content>` line so the file ends on its last Read-only validation bullet. |
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
📄 instructions/r3/core/skills/requirements-authoring/assets/ra-requirement-unit.xml
✅ No Issues Found
📊 Gates Comparison
| Gate | Score | Comparison |
|---|---|---|
| Output Contract | 5 | ⬆️ Slightly better |
| Precision & Explicitness | 5 | ⬆️ Slightly better |
| Structural Coherence | 5 | ⬆️ Slightly better |
| Rosetta | 5 | ⬆️ Slightly better |
The QA, AQA, and Testgen workflows were tested and bugfixed/hardened in several places. Some of their common parts, along with the skills created previously by Maksym, are merged into 4 pre-existing skills and 2 new skills foreseen by `docs/definitions/skills.md`.

(The previous problem with multiple scattered skills comes from the fact that `docs/definitions/skills.md`, along with the idea of prototypic pre-defined skills, is not mentioned anywhere in the docs or instructions. The only mention I found is inside a reference file of a skill I didn't touch in this PR.)

I also added instructions on how to test the workflows: `docs/manual-tests`.

❗️ A 'Cognitive Budget' problem I don't know how to solve (`qa-flow-project-config-loading` and `aqa-flow-data-collection`): the files are too long, and the correct fix would be to move templates to a reference file or an asset file, but workflow phase files do not have anything similar to references/assets.