Skip to content

Wire tier-1 typed-event conformance fixtures#184

Merged
chris-colinsky merged 2 commits into
mainfrom
chore/fixture-harness-tier-1-typed-events
Jun 24, 2026
Merged

Wire tier-1 typed-event conformance fixtures#184
chris-colinsky merged 2 commits into
mainfrom
chore/fixture-harness-tier-1-typed-events

Conversation

@chris-colinsky

Copy link
Copy Markdown
Member

Summary

First tier of the conformance-harness fixture catch-up: wire the typed
LLM-event observability fixtures that python currently satisfies through its own
unit tests into the YAML conformance harness, so the harness runs the spec's own
fixtures rather than python-authored equivalents. Test-only, no library change,
no pin bump; this strengthens the cross-impl conformance signal.

Wired (10)

Moved from _UNIT_TESTED_FIXTURES to _SUPPORTED_FIXTURES:

  • 060-065, 068: LlmCompletionEvent field population (input messages, output
    content, request params and extras, active prompt plus the null case,
    response model)
  • 067, 071: call_id distinctness across a multi-node chain
  • 072: LlmFailedEvent / LlmCompletionEvent mutual exclusion

Harness machinery added

  • A multi-node-chain runner (_build_chain_llm_graph) for the chained-call
    fixtures, alongside the existing single-node typed-event runner.
  • New assertion shapes: fields_absent_keys, the event_counts list form, the
    non-empty <any-string> matcher, and a recursive value matcher so
    active_prompt (a PromptResult) compares against the fixture's identity
    mapping.
  • renders_prompt active-prompt binding, request-model binding from
    calls_llm.model, and RuntimeConfig construction from calls_llm.config.
  • expected_error now asserts both the error category and the originating node;
    the call_id presence/distinctness invariants for 067/071 are machine-checked.
  • Consolidated the mock-transport and RuntimeConfig construction into shared
    helpers.

Deferred (4)

Four fixtures stay in _UNIT_TESTED_FIXTURES with documented reasons, each
blocked on an upstream spec fixture change to be picked up at the next pin bump.
The behavior is already covered by the unit suite, and the coverage guard keeps
the accounting honest:

  • 066: the prompt group has a single member at the current pin; python's
    PromptGroup requires at least two. The corrected fixture is upstream.
  • 069: asserts a request model the fixture does not declare.
  • 070: the malformed tool message (no tool_call_id) is non-constructible here,
    since ToolMessage.tool_call_id is required and validated at construction
    rather than the call boundary.
  • 073: asserts the vendor body error.type verbatim, where python surfaces the
    exception class name (a contract-permitted style).

Testing

  • tests/conformance/test_observability.py: 64 passed, 48 skipped.
  • Full tests/: 1456 passed, 414 skipped.
  • ruff and pyright clean.

Move 10 LlmCompletionEvent/LlmFailedEvent fixtures (060-065, 067, 068,
071, 072) from _UNIT_TESTED_FIXTURES into _SUPPORTED_FIXTURES so the
conformance harness runs the spec's own YAML fixtures rather than
python's hand-written equivalents. First tier of the fixture-harness
catch-up; strengthens the cross-impl conformance signal with no library
change.

Four of the family stay unit-tested, each blocked on a spec-side fixture
change to be picked up at the next pin bump: 066 (single-member prompt
group, corrected upstream), 069 (asserts an undeclared request model),
070 (missing tool_call_id is non-constructible here, validated at
construction not the call boundary), 073 (asserts the vendor error.type
where python surfaces the exception class name).
Copilot AI review requested due to automatic review settings June 24, 2026 02:37

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Wires tier-1 typed LLM-event observability fixtures (proposal 0057/0058 family) into the YAML conformance harness so the Python implementation is validated against the spec-owned fixtures rather than Python-authored unit-test equivalents.

Changes:

  • Moves a set of typed-event fixtures (060–065, 067, 068, 071, 072) into _SUPPORTED_FIXTURES and routes them through the conformance runner.
  • Adds new harness machinery: multi-node chain runner, expanded assertion shapes (event_counts, fields_absent_keys, <any-string>), and recursive value matching for structured fields like active_prompt.
  • Consolidates shared helpers for mock transport + RuntimeConfig construction and strengthens failure-path assertions (expected_error, call_id invariants).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/conformance/test_observability.py
Comment thread tests/conformance/test_observability.py Outdated
Comment thread tests/conformance/test_observability.py Outdated
Comment thread tests/conformance/test_observability.py
Address review feedback on the tier-1 fixture wiring:

- _render_prompt_result derives rendered_hash via the canonical
  compute_rendered_hash(messages) helper instead of a bespoke truncated
  SHA, dropping the hashlib import.
- _build_runtime_config is annotated RuntimeConfig | None (via a
  TYPE_CHECKING import) instead of Any, restoring type information at the
  provider.complete call sites.
- _materialize_typed_messages asserts system/user content is a present,
  non-empty string instead of coercing to empty, so a fixture mistake
  fails on the real field rather than as a downstream model ValueError.
@chris-colinsky chris-colinsky merged commit ec73dc5 into main Jun 24, 2026
5 checks passed
@chris-colinsky chris-colinsky deleted the chore/fixture-harness-tier-1-typed-events branch June 24, 2026 03:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants