
fix: nest root LLM runs under the flow trace in Langfuse (#13429) #13539

Open
erichare wants to merge 1 commit into release-1.10.1 from fix/brave-ride-efa163

@erichare erichare commented Jun 8, 2026

Collaborator

Summary

Fixes #13429 (follow-up to #13319).

When a flow runs a model as the root LangChain run — no wrapping chain, reproduced with Ollama — the langfuse v3 CallbackHandler emitted the LLM generation as a separate, orphan trace instead of nesting it under the flow trace:

| Trace | userId / sessionId | Contents |
| --- | --- | --- |
| Flow trace (<flow_id>) | set correctly | root span + component spans (Chat Input, Ollama, Chat Output); no generation |
| Orphan trace ("Ollama") | None / None | the real generation (model=llama3.2, parent=None, tokens 32→2) with metadata is_langchain_root: true |

Because the orphan trace has no userId/sessionId, the token-usage metrics could not be attributed to a session or user, and the generation was invisible inside the flow trace.

Root cause

The langfuse v3 LangChain CallbackHandler only applies its constructor trace_context on the chain path (on_chain_start → start_observation(trace_context=...)). The generation path (__on_llm_action, used by on_chat_model_start / on_llm_start) calls start_observation(as_type="generation", ...) without trace_context. When the model is the root run (parent_run_id is None) there is no active OpenTelemetry span in context, so the SDK starts a brand-new root trace for the generation. This is the is_langchain_root: true condition the issue identified.

This is purely an SDK behavior; the prior tracing fixes (#13266, #13341, #13344) do not address it.

The fix

LangFuseTracer.get_langchain_callback now returns a small CallbackHandler subclass that, for root LLM runs only (parent_run_id is None), activates the flow's component (or root) span as the current OpenTelemetry span while the SDK creates the generation span. The generation then:

  • inherits the flow trace_id (no longer orphaned), and
  • nests under the corresponding component span,

so it shares the parent trace's userId / sessionId and its token usage is attributed correctly.

Mechanics:

  • _build_otel_parent_span(trace_id, parent_span_id) builds a non-recording OTel parent from values we already hold — mirroring the SDK's own _create_remote_parent_span used on the chain path — using only public OTel API (no langfuse private attributes). It degrades to None (default SDK behavior) if the ids aren't valid hex.
  • The subclass wraps super().on_chat_model_start / super().on_llm_start in opentelemetry.trace.use_span(...). The handler sets run_inline = True, so these callbacks run synchronously inside the model invocation and the activation reliably wraps span creation. end_on_exit=False / record_exception=False ensure the parent span is never closed or mutated.

Non-root runs are untouched — when a wrapping chain/agent is present the LLM fires with a non-None parent_run_id, and the SDK already nests it correctly under the chain span. Only the bare-model case changes.

Test plan

  • test_langfuse_orphan_generation.py: end-to-end test driving the real langfuse SDK with an in-memory OpenTelemetry exporter (a pure mock cannot catch this, because the orphaning happens inside the SDK). Asserts the root LLM generation:
    • shares the flow trace_id (the core of #13429, "Langfuse trace contents / Hierarchy issues"),
    • parents under the component span (not a root of its own trace), and
    • is recorded as a generation (carries token usage).
    • Manually verified this test fails on the pre-fix behavior (SAME TRACE: False, gen.parent: None) and passes with the fix.
  • Unit tests for _build_otel_parent_span (hex / non-hex / missing ids) and the re-parenting handler (activates parent for root chat-model & llm runs; does not activate for non-root runs; safe when the parent is unresolvable).
  • Updated the two existing get_langchain_callback tests to subclass a real fake base (the handler is now subclassed, so a bare MagicMock base no longer works).
  • Full tracing suite green: 298 passed, 5 skipped.
  • ruff check / ruff format clean; pre-commit hooks pass.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed orphan span generation preventing proper trace attribution in distributed tracing systems.
    • Improved trace nesting to ensure language model operations are correctly attributed to parent flow spans.
  • Tests

    • Added comprehensive regression test suite for trace hierarchy and span parenting validation.
    • Enhanced distributed tracing system compatibility test coverage.

A model invoked as the root LangChain run (no wrapping chain) — reproduced
with Ollama — was emitted by the langfuse v3 CallbackHandler as a separate
orphan trace: parent=None, userId=None, sessionId=None, with the token usage
detached from the flow trace, breaking cost/usage attribution.

Root cause: the SDK only applies the constructor `trace_context` on the chain
path (`on_chain_start`); the generation path calls `start_observation` without
it, so with no active OpenTelemetry span the generation starts a brand-new
trace (metadata `is_langchain_root: true`).

`get_langchain_callback` now returns a `CallbackHandler` subclass that, for
root LLM runs only (`parent_run_id is None`), activates the flow's component
(or root) span as the current OTel span while the SDK creates the generation.
The generation then inherits the flow `trace_id` and nests under the component
span, restoring user/session attribution and token metrics. Non-root runs
(wrapping chain/agent present) are left untouched.

Adds focused unit tests plus an end-to-end test that drives the real langfuse
SDK with an in-memory OpenTelemetry exporter and asserts the generation shares
the flow trace_id and parents under the component span.

coderabbitai Bot commented Jun 8, 2026

Contributor


Walkthrough

The PR fixes Langfuse orphan traces by introducing OpenTelemetry span re-parenting. New utilities compute and activate the flow's component span as the OTel parent context when instantiating the Langfuse LangChain callback, ensuring root LLM runs are nested under the flow trace and remain linked to sessions and users.

Changes

Langfuse OTel Span Re-parenting for Orphan Traces

| Layer / File(s) | Summary |
| --- | --- |
| OTel span re-parenting implementation: src/backend/base/langflow/services/tracing/langfuse.py | _build_otel_parent_span constructs a sampled NonRecordingSpan from trace and span hex IDs with validation; _root_run_reparenting_handler_cls creates a cached CallbackHandler subclass that re-activates the flow/component span as OTel parent for root LLM runs; get_langchain_callback() computes the parent span and passes both trace_context and otel_parent to the handler. |
| Unit tests for span builder and fixtures: src/backend/tests/unit/services/tracing/test_langfuse_orphan_generation.py | Pytest fixtures configure Langfuse credentials and reset the shared client to prevent test leakage. Unit tests verify _build_otel_parent_span returns None for invalid/missing IDs and correctly builds sampled spans for valid hex inputs. |
| Re-parenting handler behavior tests: src/backend/tests/unit/services/tracing/test_langfuse_orphan_generation.py | _RecordingBase helper captures OTel span context on callback start. TestRootRunReparentingHandler verifies the handler activates the provided parent span only for root runs and safely handles missing parent span info. |
| End-to-end trace nesting validation: src/backend/tests/unit/services/tracing/test_langfuse_orphan_generation.py | _build_real_langfuse_client_or_skip initializes a real Langfuse client with in-memory OTel exporter and suppresses network export. TestRootGenerationNestsUnderFlowTrace constructs a live tracer with component span and runs a root chat model generation, asserting the result is nested under the component rather than orphaned. |
| Compatibility test updates with test double: src/backend/tests/unit/services/tracing/test_langfuse_v3_compatibility.py | _FakeLangchainCallbackHandler replaces mocked CallbackHandler as a subclassable test double. Updated tests patch the Langfuse CallbackHandler and assert the returned handler contains expected trace_id and parent_span_id in trace_context. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

bug, lgtm

Suggested reviewers

  • Cristhianzl
🚥 Pre-merge checks | ✅ 8 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.35% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (8 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title concisely and accurately describes the main fix: nesting root LLM runs under the flow trace in Langfuse, directly addressing issue #13429. |
| Linked Issues check | ✅ Passed | The PR fully addresses the requirements from #13429 by implementing proper nesting of root LLM runs under the flow trace, ensuring tokens/metrics are linked to sessions/users. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the Langfuse orphan trace issue: core implementation in langfuse.py, comprehensive regression tests, and updated existing tests. |
| Test Coverage For New Implementations | ✅ Passed | New test file (276 lines, 8 tests) covers orphan-generation fix with unit and end-to-end tests. Updated existing tests for changed functions. Tests verify new helper functions and edge cases. |
| Test Quality And Coverage | ✅ Passed | Tests comprehensively cover implementations with 8 unit tests and 25+ assertions validating behavior beyond smoke tests, proper pytest patterns, edge cases, and end-to-end validation. |
| Test File Naming And Structure | ✅ Passed | Test files follow pytest standards: test_*.py naming, proper class/method structure, descriptive names, fixtures, setup/teardown, comprehensive coverage, docstrings, and error handling. |
| Excessive Mock Usage Warning | ✅ Passed | New test file minimizes mock usage with unit tests, test doubles, and real SDK end-to-end test. Compatibility tests appropriately use mocks for external dependencies. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


github-actions Bot added the bug (Something isn't working) label Jun 8, 2026

github-actions Bot commented Jun 8, 2026

Contributor

✅ Test Coverage Advisor

No source changes detected without accompanying tests. Thanks for keeping coverage up! 🎉

Advisory check only — never blocks merge.

@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jun 8, 2026
@codecov

codecov Bot commented Jun 8, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.45%. Comparing base (60afa18) to head (018040f).
⚠️ Report is 52 commits behind head on release-1.10.1.

❌ Your project check has failed because the head coverage (54.19%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files


@@                Coverage Diff                 @@
##           release-1.10.1   #13539      +/-   ##
==================================================
+ Coverage           58.42%   58.45%   +0.03%     
==================================================
  Files                2289     2289              
  Lines              219033   219063      +30     
  Branches            31120    32923    +1803     
==================================================
+ Hits               127961   128051      +90     
+ Misses              89616    89556      -60     
  Partials             1456     1456              

| Flag | Coverage Δ |
| --- | --- |
| backend | 65.23% <100.00%> (+0.10%) ⬆️ |
| frontend | 57.79% <ø> (+0.02%) ⬆️ |
| lfx | 54.19% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...backend/base/langflow/services/tracing/langfuse.py | 86.59% <100.00%> (+3.06%) ⬆️ |

... and 36 files with indirect coverage changes


@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| Coverage: 43% | 43.28% (57621/133123) | 69.21% (7828/11309) | 41.49% (1291/3111) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 4940 | 0 💤 | 0 ❌ | 0 🔥 | 14m 56s ⏱️ |

@erichare erichare changed the base branch from release-1.10.0 to release-1.10.1 June 10, 2026 18:48
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jun 10, 2026
