fix: nest root LLM runs under the flow trace in Langfuse (#13429) by erichare · Pull Request #13539 · langflow-ai/langflow

erichare · 2026-06-08T19:04:07Z

Summary

Fixes #13429 (follow-up to #13319).

When a flow runs a model as the root LangChain run — no wrapping chain, reproduced with Ollama — the langfuse v3 CallbackHandler emitted the LLM generation as a separate, orphan trace instead of nesting it under the flow trace:

Trace	`userId` / `sessionId`	Contents
Flow trace (`<flow_id>`)	set correctly	root span + component spans (Chat Input, Ollama, Chat Output) — no generation
Orphan trace (`"Ollama"`)	`None` / `None`	the real generation (`model=llama3.2`, `parent=None`, tokens 32→2) with metadata `is_langchain_root: true`

Because the orphan trace has no userId/sessionId, the token-usage metrics could not be attributed to a session or user, and the generation was invisible inside the flow trace.

Root cause

The langfuse v3 LangChain CallbackHandler only applies its constructor trace_context on the chain path (on_chain_start → start_observation(trace_context=...)). The generation path (__on_llm_action, used by on_chat_model_start / on_llm_start) calls start_observation(as_type="generation", ...) without trace_context. When the model is the root run (parent_run_id is None) there is no active OpenTelemetry span in context, so the SDK starts a brand-new root trace for the generation — the is_langchain_root: true condition the issue identified.

This is purely an SDK behavior; the prior tracing fixes (#13266, #13341, #13344) do not address it.

The fix

LangFuseTracer.get_langchain_callback now returns a small CallbackHandler subclass that, for root LLM runs only (parent_run_id is None), activates the flow's component (or root) span as the current OpenTelemetry span while the SDK creates the generation span. The generation then:

- inherits the flow trace_id (no longer orphaned), and
- nests under the corresponding component span,

so it shares the parent trace's userId / sessionId and its token usage is attributed correctly.

Mechanics:

- _build_otel_parent_span(trace_id, parent_span_id) builds a non-recording OTel parent from values we already hold, mirroring the SDK's own _create_remote_parent_span used on the chain path, and uses only the public OTel API (no langfuse private attributes). It degrades to None (default SDK behavior) if the ids aren't valid hex.
- The subclass wraps super().on_chat_model_start / super().on_llm_start in opentelemetry.trace.use_span(...). The handler sets run_inline = True, so these callbacks run synchronously inside the model invocation and the activation reliably wraps span creation. end_on_exit=False / record_exception=False ensure the parent span is never closed or mutated.
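The "degrade to None on invalid hex" step can be sketched with the standard library alone. The helper name `parse_otel_ids` is hypothetical and is not the PR's `_build_otel_parent_span`; the length checks (32 hex characters for a trace id, 16 for a span id) and the rejection of all-zero ids are assumptions based on OpenTelemetry's 128-bit and 64-bit id sizes. In the actual fix the resulting integers would feed a non-recording OTel span.

```python
import string
from typing import Optional, Tuple

_HEX_DIGITS = set(string.hexdigits)


def _hex_to_int(value: object, expected_len: int) -> Optional[int]:
    """Return the integer for a fixed-length hex string, or None if invalid."""
    if not isinstance(value, str) or len(value) != expected_len:
        return None
    if not all(ch in _HEX_DIGITS for ch in value):
        return None
    number = int(value, 16)
    # All-zero ids are treated as invalid (assumption, matching OTel rules).
    return number or None


def parse_otel_ids(
    trace_id: Optional[str], span_id: Optional[str]
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: both ids as ints, or None to keep default behavior."""
    trace_int = _hex_to_int(trace_id, 32)  # 128-bit trace id
    span_int = _hex_to_int(span_id, 16)    # 64-bit span id
    if trace_int is None or span_int is None:
        return None
    return trace_int, span_int


valid = parse_otel_ids("ab" * 16, "cd" * 8)
invalid = parse_otel_ids("not-a-hex-trace-id", "cd" * 8)
missing = parse_otel_ids(None, None)
```

Returning `None` rather than raising keeps the caller on the default SDK path, which is the fallback the description calls for.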

Non-root runs are untouched — when a wrapping chain/agent is present the LLM fires with a non-None parent_run_id, and the SDK already nests it correctly under the chain span. Only the bare-model case changes.

Test plan

- test_langfuse_orphan_generation.py: end-to-end test driving the real langfuse SDK with an in-memory OpenTelemetry exporter (a pure mock cannot catch this, because the orphaning happens inside the SDK). Asserts the root LLM generation:
  - shares the flow trace_id (the core of issue #13429, "Langfuse trace contents / Hierarchy issues"), and
  - parents under the component span (not a root of its own trace), and
  - is recorded as a generation (carries token usage).
- Manually verified this test fails on the pre-fix behavior (SAME TRACE: False, gen.parent: None) and passes with the fix.
- Unit tests for _build_otel_parent_span (hex / non-hex / missing ids) and the re-parenting handler (activates parent for root chat-model & llm runs; does not activate for non-root runs; safe when the parent is unresolvable).
- Updated the two existing get_langchain_callback tests to subclass a real fake base (the handler is now subclassed, so a bare MagicMock base no longer works).
- Full tracing suite green: 298 passed, 5 skipped.
- ruff check / ruff format clean; pre-commit hooks pass.
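The three end-to-end assertions can be expressed as a small check over exported span records. This is a sketch under assumptions: the dictionary keys, the `nesting_problems` helper, and the sample ids are hypothetical and do not reflect the real exporter's span objects or the test file's code.

```python
from typing import Dict, List, Optional


def nesting_problems(
    generation: Dict[str, Optional[str]],
    flow_trace_id: str,
    component_span_id: str,
) -> List[str]:
    """Return a list of hierarchy problems for one generation span record."""
    problems = []
    if generation["trace_id"] != flow_trace_id:
        problems.append("generation is in a different trace")
    if generation["parent_span_id"] != component_span_id:
        problems.append("generation is not parented under the component span")
    if generation.get("type") != "generation":
        problems.append("span is not recorded as a generation")
    return problems


# Hypothetical ids, for illustration only.
FLOW_TRACE = "ab" * 16
COMPONENT_SPAN = "cd" * 8

fixed = {
    "trace_id": FLOW_TRACE,
    "parent_span_id": COMPONENT_SPAN,
    "type": "generation",
}
orphaned = {
    "trace_id": "ef" * 16,   # separate trace
    "parent_span_id": None,  # root of its own trace
    "type": "generation",
}

fixed_problems = nesting_problems(fixed, FLOW_TRACE, COMPONENT_SPAN)
orphan_problems = nesting_problems(orphaned, FLOW_TRACE, COMPONENT_SPAN)
```

Collecting problems into a list, rather than failing on the first mismatch, reports both the wrong trace and the missing parent for the pre-fix case in one pass.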

Summary by CodeRabbit

Bug Fixes
- Fixed orphan span generation preventing proper trace attribution in distributed tracing systems.
- Improved trace nesting to ensure language model operations are correctly attributed to parent flow spans.
Tests
- Added comprehensive regression test suite for trace hierarchy and span parenting validation.
- Enhanced distributed tracing system compatibility test coverage.

A model invoked as the root LangChain run (no wrapping chain) — reproduced with Ollama — was emitted by the langfuse v3 CallbackHandler as a separate orphan trace: parent=None, userId=None, sessionId=None, with the token usage detached from the flow trace, breaking cost/usage attribution. Root cause: the SDK only applies the constructor `trace_context` on the chain path (`on_chain_start`); the generation path calls `start_observation` without it, so with no active OpenTelemetry span the generation starts a brand-new trace (metadata `is_langchain_root: true`). `get_langchain_callback` now returns a `CallbackHandler` subclass that, for root LLM runs only (`parent_run_id is None`), activates the flow's component (or root) span as the current OTel span while the SDK creates the generation. The generation then inherits the flow `trace_id` and nests under the component span, restoring user/session attribution and token metrics. Non-root runs (wrapping chain/agent present) are left untouched. Adds focused unit tests plus an end-to-end test that drives the real langfuse SDK with an in-memory OpenTelemetry exporter and asserts the generation shares the flow trace_id and parents under the component span.

coderabbitai · 2026-06-08T19:04:35Z

Walkthrough

The PR fixes Langfuse orphan traces by introducing OpenTelemetry span re-parenting. New utilities compute and activate the flow's component span as the OTel parent context when instantiating the Langfuse LangChain callback, ensuring root LLM runs are nested under the flow trace and remain linked to sessions and users.

Changes

Langfuse OTel Span Re-parenting for Orphan Traces

Layer / File(s)	Summary
OTel span re-parenting implementation `src/backend/base/langflow/services/tracing/langfuse.py`	`_build_otel_parent_span` constructs a sampled NonRecordingSpan from trace and span hex IDs with validation; `_root_run_reparenting_handler_cls` creates a cached CallbackHandler subclass that re-activates the flow/component span as OTEL parent for root LLM runs; `get_langchain_callback()` computes the parent span and passes both `trace_context` and `otel_parent` to the handler.
Unit tests for span builder and fixtures `src/backend/tests/unit/services/tracing/test_langfuse_orphan_generation.py`	Pytest fixtures configure Langfuse credentials and reset the shared client to prevent test leakage. Unit tests verify `_build_otel_parent_span` returns `None` for invalid/missing IDs and correctly builds sampled spans for valid hex inputs.
Re-parenting handler behavior tests `src/backend/tests/unit/services/tracing/test_langfuse_orphan_generation.py`	`_RecordingBase` helper captures OTel span context on callback start. `TestRootRunReparentingHandler` verifies the handler activates the provided parent span only for root runs and safely handles missing parent span info.
End-to-end trace nesting validation `src/backend/tests/unit/services/tracing/test_langfuse_orphan_generation.py`	`_build_real_langfuse_client_or_skip` initializes a real Langfuse client with in-memory OTEL exporter and suppresses network export. `TestRootGenerationNestsUnderFlowTrace` constructs a live tracer with component span and runs a root chat model generation, asserting the result is nested under the component rather than orphaned.
Compatibility test updates with test double `src/backend/tests/unit/services/tracing/test_langfuse_v3_compatibility.py`	`_FakeLangchainCallbackHandler` replaces mocked CallbackHandler as a subclassable test double. Updated tests patch the Langfuse CallbackHandler and assert the returned handler contains expected `trace_id` and `parent_span_id` in `trace_context`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

bug, lgtm

Suggested reviewers

Cristhianzl

🚥 Pre-merge checks | ✅ 8 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 32.35% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (8 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title concisely and accurately describes the main fix: nesting root LLM runs under the flow trace in Langfuse, directly addressing issue `#13429`.
Linked Issues check	✅ Passed	The PR fully addresses the requirements from `#13429` by implementing proper nesting of root LLM runs under the flow trace, ensuring tokens/metrics are linked to sessions/users.
Out of Scope Changes check	✅ Passed	All changes are directly related to fixing the Langfuse orphan trace issue: core implementation in langfuse.py, comprehensive regression tests, and updated existing tests.
Test Coverage For New Implementations	✅ Passed	New test file (276 lines, 8 tests) covers orphan-generation fix with unit and end-to-end tests. Updated existing tests for changed functions. Tests verify new helper functions and edge cases.
Test Quality And Coverage	✅ Passed	Tests comprehensively cover implementations with 8 unit tests and 25+ assertions validating behavior beyond smoke tests, proper pytest patterns, edge cases, and end-to-end validation.
Test File Naming And Structure	✅ Passed	Test files follow pytest standards: test_*.py naming, proper class/method structure, descriptive names, fixtures, setup/teardown, comprehensive coverage, docstrings, and error handling.
Excessive Mock Usage Warning	✅ Passed	New test file minimizes mock usage with unit tests, test doubles, and real SDK end-to-end test. Compatibility tests appropriately use mocks for external dependencies.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


github-actions · 2026-06-08T19:05:22Z

✅ Test Coverage Advisor

No source changes detected without accompanying tests. Thanks for keeping coverage up! 🎉

Advisory check only — never blocks merge.

codecov · 2026-06-08T19:15:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.45%. Comparing base (60afa18) to head (018040f).
⚠️ Report is 52 commits behind head on release-1.10.1.

❌ Your project check has failed because the head coverage (54.19%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                Coverage Diff                 @@
##           release-1.10.1   #13539      +/-   ##
==================================================
+ Coverage           58.42%   58.45%   +0.03%     
==================================================
  Files                2289     2289              
  Lines              219033   219063      +30     
  Branches            31120    32923    +1803     
==================================================
+ Hits               127961   128051      +90     
+ Misses              89616    89556      -60     
  Partials             1456     1456

Flag	Coverage Δ
backend	`65.23% <100.00%> (+0.10%)`	⬆️
frontend	`57.79% <ø> (+0.02%)`	⬆️
lfx	`54.19% <ø> (ø)`


Files with missing lines	Coverage Δ
...backend/base/langflow/services/tracing/langfuse.py	`86.59% <100.00%> (+3.06%)`	⬆️

... and 36 files with indirect coverage changes


github-actions · 2026-06-08T19:20:18Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	43.28% (57621/133123)	69.21% (7828/11309)	41.49% (1291/3111)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
4940	0 💤	0 ❌	0 🔥	14m 56s ⏱️

github-actions Bot added the bug Something isn't working label Jun 8, 2026


erichare changed the base branch from release-1.10.0 to release-1.10.1 June 10, 2026 18:48


erichare wants to merge 1 commit into release-1.10.1 from fix/brave-ride-efa163
