Wire tier-1 typed-event conformance fixtures by chris-colinsky · Pull Request #184 · LunarCommand/openarmature-python

chris-colinsky · 2026-06-24T02:36:59Z

Summary

First tier of the conformance-harness fixture catch-up: wire the typed
LLM-event observability fixtures that python currently satisfies through its own
unit tests into the YAML conformance harness, so the harness runs the spec's own
fixtures rather than python-authored equivalents. Test-only, no library change,
no pin bump; this strengthens the cross-impl conformance signal.

Wired (10)

Moved from _UNIT_TESTED_FIXTURES to _SUPPORTED_FIXTURES:

- 060-065, 068: LlmCompletionEvent field population (input messages, output
  content, request params and extras, active prompt plus the null case,
  response model)
- 067, 071: call_id distinctness across a multi-node chain
- 072: LlmFailedEvent / LlmCompletionEvent mutual exclusion
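
The two invariants named above (distinct call_id values across a chain, and a call producing a completion or a failure event but not both) can be sketched as plain checks over captured events. This is a minimal illustration rather than the harness code: the `Event` dataclass and both helper names are hypothetical, and keying mutual exclusion by call_id is an assumption. Only the `call_id` field and the two event type names come from this PR.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a captured typed event."""

    kind: str  # "LlmCompletionEvent" or "LlmFailedEvent"
    call_id: str


def assert_call_ids_distinct(events: list[Event]) -> None:
    """Every event carries a non-empty call_id and no two events share one."""
    ids = [e.call_id for e in events]
    assert all(isinstance(i, str) and i for i in ids), "missing call_id"
    dupes = sorted(i for i, n in Counter(ids).items() if n > 1)
    assert not dupes, f"duplicate call_id values: {dupes}"


def assert_mutually_exclusive(events: list[Event]) -> None:
    """A call_id has a completion event or a failure event, never both."""
    completed = {e.call_id for e in events if e.kind == "LlmCompletionEvent"}
    failed = {e.call_id for e in events if e.kind == "LlmFailedEvent"}
    both = completed & failed
    assert not both, f"call_ids with both outcomes: {sorted(both)}"
```

Under these assumptions, a chained fixture would pass its per-node completion events to `assert_call_ids_distinct`, and a failure-path fixture would pass the full event list to `assert_mutually_exclusive`.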

Harness machinery added

- A multi-node-chain runner (_build_chain_llm_graph) for the chained-call
  fixtures, alongside the existing single-node typed-event runner.
- New assertion shapes: fields_absent_keys, the event_counts list form, the
  non-empty <any-string> matcher, and a recursive value matcher so
  active_prompt (a PromptResult) compares against the fixture's identity
  mapping.
- renders_prompt active-prompt binding, request-model binding from
  calls_llm.model, and RuntimeConfig construction from calls_llm.config.
- expected_error now asserts both the error category and the originating node;
  the call_id presence/distinctness invariants for 067/071 are machine-checked.
- Consolidated the mock-transport and RuntimeConfig construction into shared
  helpers.
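
A recursive value matcher with a non-empty `<any-string>` wildcard could look roughly like the sketch below. It is not the harness implementation: `matches` and `PromptStub` are hypothetical names, the structured value is assumed to be a dataclass, and treating an expected mapping as a subset of the actual fields is an assumption about how the fixture's identity mapping is compared.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any

ANY_STRING = "<any-string>"  # wildcard: any non-empty string


def matches(expected: Any, actual: Any) -> bool:
    """Recursively compare a fixture value against an observed value."""
    if isinstance(expected, str) and expected == ANY_STRING:
        return isinstance(actual, str) and actual != ""
    # Assumption: structured values are dataclass instances; flatten to dicts.
    if is_dataclass(actual) and not isinstance(actual, type):
        actual = asdict(actual)
    if isinstance(expected, dict):
        # Assumption: only the keys the fixture names are checked.
        return isinstance(actual, dict) and all(
            k in actual and matches(v, actual[k]) for k, v in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(matches(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


@dataclass
class PromptStub:
    """Hypothetical stand-in for a structured value such as a PromptResult."""

    name: str
    version: str
    rendered_hash: str


ok = matches(
    {"name": "greet", "rendered_hash": ANY_STRING},
    PromptStub(name="greet", version="v1", rendered_hash="abc123"),
)
```

The wildcard is checked before the equality fallback so that an empty string in the observed value fails the match instead of being compared literally against `<any-string>`.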

Deferred (4)

Four fixtures stay in _UNIT_TESTED_FIXTURES with documented reasons, each
blocked on an upstream spec fixture change to be picked up at the next pin bump.
The behavior is already covered by the unit suite, and the coverage guard keeps
the accounting honest:

- 066: the prompt group has a single member at the current pin; python's
  PromptGroup requires at least two. The corrected fixture is upstream.
- 069: asserts a request model the fixture does not declare.
- 070: the malformed tool message (no tool_call_id) is non-constructible here,
  since ToolMessage.tool_call_id is required and validated at construction
  rather than the call boundary.
- 073: asserts the vendor body error.type verbatim, where python surfaces the
  exception class name (a contract-permitted style).
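
The PR does not show the coverage guard itself. As a rough sketch of the kind of accounting it describes, the check below requires every discovered fixture to be either supported or deferred with a reason, and flags overlaps and stale entries. The function name, the idea of passing in a discovered set, and modelling the deferred set as an id-to-reason mapping are all assumptions made for illustration.

```python
def accounting_problems(
    discovered: set[str],
    supported: set[str],
    unit_tested: dict[str, str],
) -> list[str]:
    """Return human-readable problems; an empty list means the books balance."""
    deferred = set(unit_tested)
    problems: list[str] = []
    for fid in sorted(supported & deferred):
        problems.append(f"{fid}: listed as both supported and unit-tested")
    for fid in sorted(discovered - supported - deferred):
        problems.append(f"{fid}: not accounted for")
    for fid in sorted((supported | deferred) - discovered):
        problems.append(f"{fid}: listed but no such fixture exists")
    for fid, reason in sorted(unit_tested.items()):
        if not reason.strip():
            problems.append(f"{fid}: deferred without a documented reason")
    return problems


# Illustrative data only: a subset of the fixture ids named in this PR.
discovered = {"060", "066", "067", "069"}
supported = {"060", "067"}
unit_tested = {
    "066": "single-member prompt group at the current pin",
    "069": "asserts an undeclared request model",
}
problems = accounting_problems(discovered, supported, unit_tested)
```

A guard of this shape fails as soon as a new spec fixture appears without being classified, which is one way the deferred list could be kept from silently growing stale.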

Testing

- tests/conformance/test_observability.py: 64 passed, 48 skipped.
- Full tests/: 1456 passed, 414 skipped.
- ruff and pyright clean.

Move 10 LlmCompletionEvent/LlmFailedEvent fixtures (060-065, 067, 068, 071, 072) from _UNIT_TESTED_FIXTURES into _SUPPORTED_FIXTURES so the conformance harness runs the spec's own YAML fixtures rather than python's hand-written equivalents. First tier of the fixture-harness catch-up; strengthens the cross-impl conformance signal with no library change. Four of the family stay unit-tested, each blocked on a spec-side fixture change to be picked up at the next pin bump: 066 (single-member prompt group, corrected upstream), 069 (asserts an undeclared request model), 070 (missing tool_call_id is non-constructible here, validated at construction not the call boundary), 073 (asserts the vendor error.type where python surfaces the exception class name).

Copilot

Pull request overview

Wires tier-1 typed LLM-event observability fixtures (proposal 0057/0058 family) into the YAML conformance harness so the Python implementation is validated against the spec-owned fixtures rather than Python-authored unit-test equivalents.

Changes:

- Moves a set of typed-event fixtures (060–065, 067, 068, 071, 072) into _SUPPORTED_FIXTURES and routes them through the conformance runner.
- Adds new harness machinery: multi-node chain runner, expanded assertion shapes (event_counts, fields_absent_keys, <any-string>), and recursive value matching for structured fields like active_prompt.
- Consolidates shared helpers for mock transport + RuntimeConfig construction and strengthens failure-path assertions (expected_error, call_id invariants).

Address review feedback on the tier-1 fixture wiring:

- _render_prompt_result derives rendered_hash via the canonical compute_rendered_hash(messages) helper instead of a bespoke truncated SHA, dropping the hashlib import.
- _build_runtime_config is annotated RuntimeConfig | None (via a TYPE_CHECKING import) instead of Any, restoring type information at the provider.complete call sites.
- _materialize_typed_messages asserts system/user content is a present, non-empty string instead of coercing to empty, so a fixture mistake fails on the real field rather than as a downstream model ValueError.
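
The third point, failing on the fixture field itself rather than coercing to an empty string, can be illustrated with a small helper. This is a sketch under assumptions, not the body of _materialize_typed_messages: `require_text` is a hypothetical name, and fixture messages are assumed to be plain mappings with `role` and `content` keys.

```python
from typing import Any


def require_text(message: dict[str, Any], index: int) -> str:
    """Return the message content, failing loudly if the fixture omitted it."""
    content = message.get("content")
    role = message.get("role")
    # Coercing with `content or ""` would defer the failure to model
    # validation, far from the fixture field that is actually wrong.
    assert isinstance(content, str) and content, (
        f"fixture message #{index} (role={role!r}) needs non-empty string "
        f"content, got {content!r}"
    )
    return content
```

With this shape, a fixture that drops `content` from a system message fails with a message naming the offending entry instead of surfacing later as a generic ValueError.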

Copilot AI review requested due to automatic review settings June 24, 2026 02:37

Copilot started reviewing on behalf of chris-colinsky June 24, 2026 02:37

Copilot AI reviewed Jun 24, 2026

chris-colinsky merged commit ec73dc5 into main Jun 24, 2026
5 checks passed

chris-colinsky deleted the chore/fixture-harness-tier-1-typed-events branch June 24, 2026 03:08

chris-colinsky merged 2 commits into main from chore/fixture-harness-tier-1-typed-events
