Wire tier-1 typed-event conformance fixtures #184
Merged
Move 10 LlmCompletionEvent/LlmFailedEvent fixtures (060-065, 067, 068, 071, 072) from _UNIT_TESTED_FIXTURES into _SUPPORTED_FIXTURES so the conformance harness runs the spec's own YAML fixtures rather than python's hand-written equivalents. First tier of the fixture-harness catch-up; strengthens the cross-impl conformance signal with no library change. Four of the family stay unit-tested, each blocked on a spec-side fixture change to be picked up at the next pin bump: 066 (single-member prompt group, corrected upstream), 069 (asserts an undeclared request model), 070 (missing tool_call_id is non-constructible here, validated at construction not the call boundary), 073 (asserts the vendor error.type where python surfaces the exception class name).
Pull request overview
Wires tier-1 typed LLM-event observability fixtures (proposal 0057/0058 family) into the YAML conformance harness so the Python implementation is validated against the spec-owned fixtures rather than Python-authored unit-test equivalents.
Changes:
- Moves a set of typed-event fixtures (060–065, 067, 068, 071, 072) into _SUPPORTED_FIXTURES and routes them through the conformance runner.
- Adds new harness machinery: multi-node chain runner, expanded assertion shapes (event_counts, fields_absent_keys, <any-string>), and recursive value matching for structured fields like active_prompt.
- Consolidates shared helpers for mock transport + RuntimeConfig construction and strengthens failure-path assertions (expected_error, call_id invariants).
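The call_id invariants mentioned in the overview can be pictured as a small check over the captured events. This is only a sketch: the helper name `assert_call_id_invariants` and the plain-dict event shape are assumptions made for illustration, and it assumes one event per LLM call. The harness's actual code is not shown in this PR description.

```python
def assert_call_id_invariants(events: list[dict]) -> None:
    """Fail if any event lacks a call_id or if two events share one.

    Hypothetical helper: assumes events are plain dicts, one per LLM call.
    """
    call_ids = []
    for index, event in enumerate(events):
        call_id = event.get("call_id")
        # Presence: the id must be a non-empty string.
        assert isinstance(call_id, str) and call_id, f"event {index} has no call_id"
        call_ids.append(call_id)
    # Distinctness: chained calls must not reuse an id.
    assert len(set(call_ids)) == len(call_ids), f"duplicate call_id in {call_ids}"
```

Checking presence before distinctness means a missing id is reported as such, rather than surfacing as a confusing duplicate of `None`.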
Address review feedback on the tier-1 fixture wiring:
- _render_prompt_result derives rendered_hash via the canonical compute_rendered_hash(messages) helper instead of a bespoke truncated SHA, dropping the hashlib import.
- _build_runtime_config is annotated RuntimeConfig | None (via a TYPE_CHECKING import) instead of Any, restoring type information at the provider.complete call sites.
- _materialize_typed_messages asserts system/user content is a present, non-empty string instead of coercing to empty, so a fixture mistake fails on the real field rather than as a downstream model ValueError.
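The last point, asserting on content instead of coercing it to an empty string, might look roughly like the sketch below. The function name `require_text_content` and the dict-shaped fixture message are hypothetical. The real `_materialize_typed_messages` is not shown here and may differ.

```python
def require_text_content(message: dict, index: int) -> str:
    """Return the message content, failing loudly if it is missing or empty.

    Hypothetical helper illustrating the "assert, don't coerce" approach.
    """
    content = message.get("content")
    assert isinstance(content, str) and content, (
        f"fixture message {index} (role={message.get('role')!r}) "
        f"needs non-empty string content, got {content!r}"
    )
    return content
```

The alternative, `message.get("content") or ""`, would let a fixture typo through and fail later inside model validation, far from the field that was actually wrong.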
Summary
First tier of the conformance-harness fixture catch-up: wire the typed
LLM-event observability fixtures that python currently satisfies through its own
unit tests into the YAML conformance harness, so the harness runs the spec's own
fixtures rather than python-authored equivalents. Test-only, no library change,
no pin bump; this strengthens the cross-impl conformance signal.
Wired (10)
Moved from _UNIT_TESTED_FIXTURES to _SUPPORTED_FIXTURES:
- LlmCompletionEvent field population (input messages, output content, request params and extras, active prompt plus the null case, response model)
- LlmFailedEvent/LlmCompletionEvent mutual exclusion
Harness machinery added
- A multi-node chain runner (_build_chain_llm_graph) for the chained-call fixtures, alongside the existing single-node typed-event runner.
- New assertion shapes: fields_absent_keys, the event_counts list form, the non-empty <any-string> matcher, and a recursive value matcher so active_prompt (a PromptResult) compares against the fixture's identity mapping.
- renders_prompt active-prompt binding, request-model binding from calls_llm.model, and RuntimeConfig construction from calls_llm.config.
- expected_error now asserts both the error category and the originating node; the call_id presence/distinctness invariants for 067/071 are machine-checked.
- Mock transport and RuntimeConfig construction consolidated into shared helpers.
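As an illustration of the recursive value matcher and the `<any-string>` placeholder, here is a minimal sketch. Several things in it are assumptions rather than the harness's actual behavior: the names `matches` and `keys_absent`, subset semantics for mappings (only keys named by the fixture are checked), and reading structured objects such as a `PromptResult` through their `__dict__`.

```python
from typing import Any

ANY_STRING = "<any-string>"  # placeholder taken from the fixture vocabulary


def matches(expected: Any, actual: Any) -> bool:
    """Recursively compare a fixture value against an observed value."""
    if isinstance(expected, str) and expected == ANY_STRING:
        # Non-empty string matcher.
        return isinstance(actual, str) and actual != ""
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            # Assumption: structured objects expose their fields via __dict__.
            actual = getattr(actual, "__dict__", None)
            if actual is None:
                return False
        # Assumption: subset semantics, extra observed keys are ignored.
        return all(
            key in actual and matches(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, (list, tuple))
            and len(expected) == len(actual)
            and all(matches(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


def keys_absent(fields: dict, absent_keys: list[str]) -> bool:
    """Assumed meaning of fields_absent_keys: none of the keys may appear."""
    return all(key not in fields for key in absent_keys)
```

With this shape, a fixture can pin the identity fields of a structured value while leaving a generated field, such as a hash, as `<any-string>`.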
Deferred (4)
Four fixtures stay in _UNIT_TESTED_FIXTURES with documented reasons, each
blocked on an upstream spec fixture change to be picked up at the next pin bump.
The behavior is already covered by the unit suite, and the coverage guard keeps
the accounting honest:
- 066: single-member prompt group, where PromptGroup requires at least two. The corrected fixture is upstream.
- 069: asserts an undeclared request model.
- 070: a missing tool_call_id is non-constructible here, since ToolMessage.tool_call_id is required and validated at construction rather than the call boundary.
- 073: asserts the vendor error.type verbatim, where python surfaces the exception class name (a contract-permitted style).
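The coverage guard's accounting can be thought of as a set check: every spec fixture is listed in exactly one of the two sets, and neither set names a fixture that no longer exists. The sketch below is a guess at that idea, not the repository's guard. The function name `check_fixture_accounting` and the use of bare fixture ids as strings are assumptions.

```python
def check_fixture_accounting(
    all_fixtures: set[str],
    supported: set[str],
    unit_tested: set[str],
) -> list[str]:
    """Return human-readable accounting problems, or an empty list if clean.

    Hypothetical sketch of a coverage guard over the two fixture sets.
    """
    problems = []
    both = supported & unit_tested
    if both:
        problems.append(f"listed in both sets: {sorted(both)}")
    unaccounted = all_fixtures - supported - unit_tested
    if unaccounted:
        problems.append(f"not listed in either set: {sorted(unaccounted)}")
    stale = (supported | unit_tested) - all_fixtures
    if stale:
        problems.append(f"listed but not present in the spec: {sorted(stale)}")
    return problems
```

Under this model, moving a fixture from one set to the other keeps the guard green, while dropping it from both, or leaving it in both, is reported.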
Testing
- tests/conformance/test_observability.py: 64 passed, 48 skipped.
- tests/: 1456 passed, 414 skipped.