Relocate Langfuse fixtures to their own runner#187
Merged
Conversation
The Langfuse-mapping observability fixtures are fixture-tested by the dedicated sibling runner tests/conformance/test_observability_langfuse.py, not by unit tests. The _UNIT_TESTED_FIXTURES bucket in test_observability.py mislabeled them, so the fixture-harness catch-up wrongly wired 022/023/024/031/032 into test_observability.py -- duplicating coverage that already existed in the sibling runner -- and homed the genuinely-new 035/036/059 there too. Revert all Langfuse wiring from test_observability.py (back to its post-tier-1 state) and add a _LANGFUSE_HARNESS_FIXTURES bucket that the coverage guard counts and the dispatcher skips, recording that the sibling runner owns them. Relocate 035/036/059 into test_observability_langfuse.py, extending its runner for caller_invocation_id, the derived-trace.id bridge, 036's raw invocation_id, and the harness_parameterized / non_empty_string metadata matcher. Test-only; no behavior or pin change.
There was a problem hiding this comment.
Pull request overview
This PR refactors the conformance test harness to consolidate all Langfuse-mapping observability fixtures under the dedicated Langfuse runner, avoiding duplicated coverage and clarifying fixture ownership between runners.
Changes:
- Removed Langfuse fixture execution and assertion helpers from
test_observability.py, replacing them with a_LANGFUSE_HARNESS_FIXTURESaccounting bucket and an earlypytest.skipfor those fixture IDs. - Extended
test_observability_langfuse.pyto run fixtures035/036/059, including support for caller-suppliedinvocation_id, derivedtrace.idbridging, and metadata matcher sub-keys for fixture059.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/conformance/test_observability.py | Stops running Langfuse fixtures in the OTel-focused harness; accounts for them via _LANGFUSE_HARNESS_FIXTURES and skips them during parametrized execution. |
| tests/conformance/test_observability_langfuse.py | Adds execution + assertions for Langfuse fixtures 035/036/059, including invoke-time invocation_id wiring and new metadata matcher handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
CodeQL py/side-effect-in-assert flagged the expected_metadata.pop() inside the assert: under python -O the assertion (and its pop) is stripped, leaving the invocation_id key in expected_metadata for the later _assert_metadata_subset to wrongly check. Bind the pop to a local before the assert, and include the expected value in the failure message.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
The Langfuse-mapping observability conformance fixtures are fixture-tested by the dedicated sibling runner
tests/conformance/test_observability_langfuse.py, not by unit tests. The_UNIT_TESTED_FIXTURESbucket intest_observability.pymislabeled them ("covered by test_observability_langfuse.py" referred to that conformance runner, not a unit test), so the in-progress fixture-harness catch-up wrongly wired022/023/024/031/032intotest_observability.py, duplicating coverage that already existed in the sibling runner, and homed the genuinely-new035/036/059there as well.This PR consolidates all Langfuse fixtures in their proper file.
Changes
test_observability.py: reverted all Langfuse wiring (back to its post-tier-1 state). Added a_LANGFUSE_HARNESS_FIXTURESbucket that the coverage guard counts and the per-fixture dispatcher skips, recording that the sibling runner owns those 12 fixtures._UNIT_TESTED_FIXTURESnow holds only the genuinely-unit-only fixtures.test_observability_langfuse.py: added035/036/059to_LANGFUSE_FIXTURES, with four small runner extensions:caller_invocation_idat invoke (035/036), thelangfuse_trace_idderived-id bridge for literal trace ids, 036's rawinvocation_idrecovered fromtrace.id, and theharness_parameterized/non_empty_stringmetadata matcher (059).Net
-322lines as the duplication comes out.Test plan
tests/conformance/test_observability.py+test_observability_langfuse.py: 76 passed, 48 skipped (Langfuse fixtures skip in the first, run in the second).tests/: 1459 passed (5 fewer than before, exactly the de-duplicated double-runs).Test-only; no behavior or pin change.