Wire tier-2b Langfuse Generation fixtures #186

Merged: chris-colinsky merged 3 commits into main from chore/fixture-harness-tier-2b-generation on Jun 24, 2026

Conversation

@chris-colinsky (Member)

Summary

Completes the Langfuse tier of the conformance-harness fixture catch-up: wire
the two Langfuse Generation fixtures into the YAML harness. Test-only, no
library change, no pin bump.

Wired (2)

Moved from _UNIT_TESTED_FIXTURES to _SUPPORTED_FIXTURES:

  • 023: Generation rendering (model / modelParameters / usage / input-output
    metadata) plus the payload-truncation fallthrough (input becomes the raw
    marker-bearing string once it exceeds the byte cap).
  • 024: Prompt-entity linkage, both the present case (a backend exposing a
    Langfuse Prompt reference) and the absent case.

Harness machinery added

  • _run_langfuse_generation_fixture builds a calls_llm graph, records into an
    InMemoryLangfuseClient under the fixture's disable_provider_payload /
    payload_byte_cap config, and asserts the Generation observation nested under
    the node span.
  • _assert_langfuse_generation_fields covers model / modelParameters / usage /
    prompt_entity_link and the two input shapes (native message list under the
    cap, raw truncated string with the marker over it). The placeholder-capable
    fields run through the value matcher.
  • The value matcher gained nested-dict recursion so 024's metadata.prompt
    (with an inner <any-string> rendered_hash) matches.
  • _materialize_typed_messages gained content_repeat synthesis (023), and
    _render_prompt_result carries a backend's Langfuse prompt reference into
    PromptResult.observability_entities, which the observer resolves into the
    Generation's prompt-entity link (024).
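
The nested-dict recursion described above can be sketched as follows. This is a minimal illustration, not the harness's actual matcher: the function name `matches` and the constant `ANY_STRING` are hypothetical, and only the `<any-string>` token is modeled. It assumes subset semantics over keys for non-empty expected mappings and exact equality for everything else, including an empty expected dict.

```python
from collections.abc import Mapping
from typing import Any

# Placeholder token used by fixture 024 (the constant name is hypothetical).
ANY_STRING = "<any-string>"


def matches(expected: Any, actual: Any) -> bool:
    """Return True when `actual` satisfies the fixture's `expected` value."""
    # A placeholder token accepts any string value.
    if isinstance(expected, str) and expected == ANY_STRING:
        return isinstance(actual, str)
    # Non-empty expected mapping: recurse per key with subset semantics,
    # so `actual` may carry extra keys but must satisfy every expected one.
    if isinstance(expected, Mapping) and expected:
        if not isinstance(actual, Mapping):
            return False
        return all(
            key in actual and matches(value, actual[key])
            for key, value in expected.items()
        )
    # Scalars, lists, and the empty dict fall through to exact equality,
    # so an empty expected dict does not vacuously match any mapping.
    return expected == actual
```

With this shape, an expected `metadata.prompt` holding `rendered_hash: <any-string>` matches any actual mapping whose `rendered_hash` is a string, regardless of additional keys.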

Testing

  • tests/conformance/test_observability.py: 72 passed, 40 skipped.
  • Full tests/: 1464 passed, 406 skipped.
  • ruff and pyright clean.

Move 023 (generation rendering + payload truncation) and 024 (prompt
linkage) from _UNIT_TESTED_FIXTURES into _SUPPORTED_FIXTURES, driven
through a LangfuseObserver + InMemoryLangfuseClient recorder. Completes
the Langfuse tier of the fixture-harness catch-up; test-only, no library
change, no pin bump.

Adds a generation runner that asserts the Generation observation
(model / modelParameters / usage / input-output payload + prompt-entity
link) nested under the node span, plus content_repeat synthesis +
payload_byte_cap truncation (023) and a prompt-backend Langfuse
reference carried via PromptResult.observability_entities (024). The
value-matcher gained nested-dict recursion for metadata.prompt. No deferrals.
Copilot AI review requested due to automatic review settings June 24, 2026 19:36

Copilot AI left a comment

Pull request overview

This PR completes the YAML conformance-harness wiring for the Tier 2b Langfuse “Generation” fixtures (023/024), moving them from unit-tested-only coverage into the main fixture runner. The changes are test-only and extend the harness to validate Langfuse Generation rendering, truncation behavior, and prompt-entity linkage per the spec mapping.

Changes:

  • Added a new Langfuse Generation fixture driver (_run_langfuse_generation_fixture) and Generation-field assertions integrated into the Langfuse observation-tree matcher.
  • Extended the Langfuse value matcher to recurse into nested mappings so placeholder tokens can match inside nested objects (needed for fixture 024).
  • Added content_repeat synthesis for typed messages and carried Langfuse prompt references via PromptResult.observability_entities.

Comment thread tests/conformance/test_observability.py
Address review feedback on PR #186: the input_is_raw_string_with_marker
check matched a bare "[truncated" substring, which could false-positive
on arbitrary content. Tighten it to a regex for the full marker shape,
mirroring the observer's _TRUNCATION_MARKER_TEMPLATE and consistent
with the OTel marker_pattern approach.
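
As an illustration of why the full-shape regex is safer than a substring test, here is a small sketch. The marker text is an assumption: the real `_TRUNCATION_MARKER_TEMPLATE` is not shown in this PR, so `MARKER_TEMPLATE`, `MARKER_PATTERN`, and `is_raw_string_with_marker` below are hypothetical stand-ins.

```python
import re

# Hypothetical marker shape; the observer's real template may differ.
MARKER_TEMPLATE = "[truncated: {omitted} bytes omitted]"

# Regex for the complete marker, not just its "[truncated" prefix.
MARKER_PATTERN = re.compile(r"\[truncated: \d+ bytes omitted\]")


def is_raw_string_with_marker(value: object) -> bool:
    """True only for a string that contains a fully formed marker."""
    return isinstance(value, str) and MARKER_PATTERN.search(value) is not None


def loose_check(value: object) -> bool:
    """The looser substring check, kept here only for comparison."""
    return isinstance(value, str) and "[truncated" in value
```

A message whose content merely mentions "[truncated" passes `loose_check` but fails `is_raw_string_with_marker`, which is the false positive the review comment pointed out.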
Fold in the Python-side nuances from the spec's Tier 2 review:

- _assert_langfuse_observation_tree now disambiguates same-(type, name)
  sibling observations (032's per-instance "process" spans) by their
  scalar metadata rather than emission order, so the assertions can't
  bind the wrong sibling if the observer's emission order shifts.
- _run_invocation_id_case now asserts the fixture's top-level verbatim
  invocation_id clause (035/036) against the in-memory recorder's raw
  trace.id, so it isn't half-asserted across the OTel and Langfuse
  runners.
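
The sibling disambiguation in the first item can be sketched roughly as below. This is an assumption-laden illustration rather than the code in `_assert_langfuse_observation_tree`: observations are modeled as plain dicts with `type`, `name`, and `metadata` keys, and the helper names `scalar_metadata` and `bind_siblings` are hypothetical.

```python
from typing import Any

_SCALARS = (str, int, float, bool)


def scalar_metadata(observation: dict[str, Any]) -> dict[str, Any]:
    """Keep only the scalar metadata entries of an observation."""
    metadata = observation.get("metadata") or {}
    return {k: v for k, v in metadata.items() if isinstance(v, _SCALARS)}


def bind_siblings(
    expected: list[dict[str, Any]], actual: list[dict[str, Any]]
) -> list[tuple[dict[str, Any], dict[str, Any]]]:
    """Pair each expected sibling with the one actual sibling that shares
    its (type, name) and agrees on every expected scalar metadata key."""
    remaining = list(actual)
    pairs = []
    for exp in expected:
        wanted = scalar_metadata(exp)
        candidates = [
            obs
            for obs in remaining
            if (obs["type"], obs["name"]) == (exp["type"], exp["name"])
            and all(scalar_metadata(obs).get(k) == v for k, v in wanted.items())
        ]
        # Zero or several candidates means the binding is missing or ambiguous.
        if len(candidates) != 1:
            raise AssertionError(
                f"{len(candidates)} candidates for {exp['type']}/{exp['name']}"
            )
        remaining.remove(candidates[0])
        pairs.append((exp, candidates[0]))
    return pairs
```

Because the pairing never consults list position, reordering `actual` leaves the result unchanged, which is the property the commit message describes.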
Copilot AI review requested due to automatic review settings June 24, 2026 20:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment on lines +2658 to +2662
# A regular NON-empty nested mapping (e.g. 024 metadata.prompt): recurse per
# key so inner tokens (rendered_hash: <any-string>) still apply. Subset over
# keys -- every expected key must be present and match; actual MAY carry
# extras. An empty expected dict falls through to exact equality below
# (rather than vacuously matching any mapping).
Comment on lines +2871 to +2874
graph, state_cls, provider = _build_simple_llm_graph(case, populate_caller_metadata=False)
client = InMemoryLangfuseClient()
cfg = cast("dict[str, Any]", case.get("langfuse_observer") or {})
lf_kwargs: dict[str, Any] = {"client": client}
@chris-colinsky chris-colinsky merged commit 122dcd2 into main Jun 24, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the chore/fixture-harness-tier-2b-generation branch June 24, 2026 20:25