feat: enhance tracing system with OpenTelemetry semantic conventions #1331

Open
wants to merge 5 commits into
base: develop
from

Conversation

Pouyanpi
Collaborator

PR Description

This PR introduces a comprehensive enhancement to the NeMo Guardrails tracing and telemetry infrastructure, providing improved observability, standardized telemetry formats, and privacy controls for production deployments.

High-Level Impact

Strategic Goals

  • Standardization: Align with OpenTelemetry semantic conventions for GenAI applications
  • Flexibility: Support multiple span formats to maintain backward compatibility
  • Privacy: Implement controls to protect sensitive prompt/response data
  • Observability: Enhanced tracking of LLM interactions and guardrail executions, plus fixes to span timestamps and span relationships

Key Features

1. Configurable Span Formats

  • Flat Format: Lightweight, backward-compatible format with minimal overhead
  • OpenTelemetry Format: Full compliance with OTel semantic conventions for GenAI
  • Configuration via span_format field in TracingConfig

2. Privacy-First Content Capture

  • New enable_content_capture flag to control prompt/response recording
  • Defaults to disabled for privacy protection
  • Granular control over what gets recorded in telemetry
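
As a rough illustration of the opt-in behaviour described above, the sketch below only attaches prompt/response text to span attributes when a flag is set. The function name and the `example.*` attribute keys are hypothetical and are not taken from this PR; only the `enable_content_capture` flag name and its default of disabled come from the description.

```python
from typing import Any, Dict


def build_llm_span_attributes(
    model: str,
    prompt: str,
    completion: str,
    enable_content_capture: bool = False,
) -> Dict[str, Any]:
    """Return span attributes; content is included only when explicitly enabled."""
    # Metadata such as the model name is always recorded.
    attributes: Dict[str, Any] = {"gen_ai.request.model": model}
    if enable_content_capture:
        # Hypothetical attribute keys, chosen for illustration only.
        attributes["example.prompt"] = prompt
        attributes["example.completion"] = completion
    return attributes


default_attrs = build_llm_span_attributes("some-model", "hi", "hello")
captured_attrs = build_llm_span_attributes(
    "some-model", "hi", "hello", enable_content_capture=True
)
```

With the default, `default_attrs` carries only the model name, so sensitive text never reaches the telemetry backend unless a deployment opts in.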

3. Enhanced LLM Provider Tracking

  • Added model_name and model_provider parameters to LLM calls
  • Improved attribution and debugging capabilities
  • Better support for multi-provider deployments

Technical Implementation

Architecture Changes

New Core Components

  • SpanFormat Enum: Type-safe span format definitions
  • SpanExtractor Hierarchy: V1 (flat) and V2 (OTel) extractors
  • Span Models: Typed span representations (InteractionSpan, RailSpan, ActionSpan, LLMSpan)
  • OTel Constants: Comprehensive semantic convention attributes

Enhanced Components

  • Tracer: Now accepts span format and content capture configuration
  • OpenTelemetryAdapter: Refactored for better v2 span handling

Data Flow

Generation Log → SpanExtractor → Typed Spans → Adapter → Telemetry Backend
                     ↓               ↓
                 (v1 or v2)    (flat or OTel)
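
The flow above can be sketched roughly as follows. This is a minimal, self-contained approximation and not the code in this PR: the class names `FlatExtractor`, `OTelExtractor`, `InMemoryAdapter`, the `Span` dataclass, and the helper functions are all assumptions made for illustration; only the `SpanFormat` name and the two format values (`flat`, `opentelemetry`) come from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class SpanFormat(str, Enum):
    FLAT = "flat"
    OPENTELEMETRY = "opentelemetry"


@dataclass
class Span:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


class FlatExtractor:
    """Stand-in for the v1 extractor: spans without semantic attributes."""

    def extract(self, log: List[Dict[str, Any]]) -> List[Span]:
        return [Span(name=entry["name"]) for entry in log]


class OTelExtractor:
    """Stand-in for the v2 extractor: spans with GenAI attributes."""

    def extract(self, log: List[Dict[str, Any]]) -> List[Span]:
        return [
            Span(
                name=entry["name"],
                attributes={"gen_ai.request.model": entry.get("model", "unknown")},
            )
            for entry in log
        ]


def create_extractor(span_format: str):
    # SpanFormat(...) raises ValueError for an unknown format string.
    fmt = SpanFormat(span_format)
    return OTelExtractor() if fmt is SpanFormat.OPENTELEMETRY else FlatExtractor()


class InMemoryAdapter:
    """Stand-in for a telemetry backend adapter."""

    def __init__(self) -> None:
        self.exported: List[Span] = []

    def export(self, spans: List[Span]) -> None:
        self.exported.extend(spans)


def trace(log: List[Dict[str, Any]], span_format: str, adapter: InMemoryAdapter) -> None:
    # Generation log -> extractor -> spans -> adapter
    adapter.export(create_extractor(span_format).extract(log))
```

Selecting the extractor from a single enum value keeps the choice of format in one place, so adapters do not need to know which format was configured.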

OpenTelemetry Semantic Conventions

Implements GenAI semantic conventions including:

  • gen_ai.request.model
  • gen_ai.request.max_tokens
  • gen_ai.response.finish_reasons
  • gen_ai.usage.input_tokens
  • gen_ai.usage.output_tokens
  • Custom guardrails attributes under guardrails.* namespace
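
For orientation, the attribute set for a single LLM call might be assembled along these lines. This is a sketch under stated assumptions: the helper name is invented, and `guardrails.rail.name` is a hypothetical key under the `guardrails.*` namespace; the `gen_ai.*` keys are the ones listed above.

```python
from typing import Any, Dict, Optional, Sequence


def genai_attributes(
    model: str,
    max_tokens: int,
    finish_reasons: Sequence[str],
    input_tokens: int,
    output_tokens: int,
    rail_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect GenAI semantic-convention attributes for one LLM call."""
    attributes: Dict[str, Any] = {
        "gen_ai.request.model": model,
        "gen_ai.request.max_tokens": max_tokens,
        "gen_ai.response.finish_reasons": list(finish_reasons),
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    if rail_name is not None:
        # Hypothetical custom key; the PR only states the guardrails.* namespace.
        attributes["guardrails.rail.name"] = rail_name
    return attributes


example = genai_attributes("some-model", 256, ["stop"], 12, 34, rail_name="input")
```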

Breaking Changes

None - Full backward compatibility maintained

Migration Guide

Existing deployments will continue to work without changes. To adopt new features:

from nemoguardrails import RailsConfig

# Enable OpenTelemetry format with content capture.
# Other RailsConfig fields are omitted here.
config = RailsConfig(
    tracing={
        "span_format": "opentelemetry",  # or "flat" for legacy
        "enable_content_capture": True,  # defaults to False
    }
)

Testing

  • ✅ 7 new test files with comprehensive coverage
  • ✅ Tests for both v1 and v2 span formats
  • ✅ OTel semantic convention compliance validation
  • ✅ Mixed v1/v2 span handling
  • ✅ Format conversion and validation

…and configurable span formats

Introduces a major enhancement to the NeMo Guardrails tracing and telemetry infrastructure with support for multiple span formats, OpenTelemetry semantic convention compliance, and privacy-focused content capture controls. The system now supports both flat (legacy) and OpenTelemetry-compliant span formats while maintaining backward compatibility.

Key changes:
- Add configurable span format support (flat/opentelemetry)
- Implement OpenTelemetry semantic conventions for GenAI
- Add privacy controls for prompt/response content capture
- Enhance LLM call tracking with model provider information
- Improve span extraction and modeling architecture
- Add comprehensive test coverage for new functionality
@Pouyanpi Pouyanpi changed the title feat: enhance tracing system with OpenTelemetry semantic conventions … feat: enhance tracing system with OpenTelemetry semantic conventions for GenAI Aug 14, 2025
@Pouyanpi Pouyanpi changed the title feat: enhance tracing system with OpenTelemetry semantic conventions for GenAI feat: enhance tracing system with OpenTelemetry semantic conventions Aug 14, 2025
@Pouyanpi Pouyanpi self-assigned this Aug 14, 2025
@Pouyanpi Pouyanpi added the bug (Something isn't working) and enhancement (New feature or request) labels Aug 14, 2025
@Pouyanpi Pouyanpi added this to the v0.16.0 milestone Aug 14, 2025
@codecov-commenter
codecov-commenter commented Aug 14, 2025

Codecov Report

❌ Patch coverage is 95.49356% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.09%. Comparing base (6ba7832) to head (7b7f0e5).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines | Patch % | Lines
nemoguardrails/tracing/spans.py | 92.90% | 10 Missing ⚠️
nemoguardrails/tracing/span_extractors.py | 93.47% | 9 Missing ⚠️
nemoguardrails/actions/llm/utils.py | 85.71% | 1 Missing ⚠️
nemoguardrails/rails/llm/llmrails.py | 66.66% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1331      +/-   ##
===========================================
+ Coverage    70.58%   71.09%   +0.50%     
===========================================
  Files          161      166       +5     
  Lines        16291    16722     +431     
===========================================
+ Hits         11499    11888     +389     
- Misses        4792     4834      +42     
Flag Coverage Δ
python 71.09% <95.49%> (+0.50%) ⬆️

Files with missing lines | Coverage Δ
nemoguardrails/logging/explain.py | 60.00% <100.00%> (+1.17%) ⬆️
nemoguardrails/rails/llm/config.py | 90.37% <100.00%> (+0.03%) ⬆️
nemoguardrails/tracing/__init__.py | 100.00% <100.00%> (ø)
nemoguardrails/tracing/adapters/base.py | 70.58% <100.00%> (ø)
nemoguardrails/tracing/adapters/opentelemetry.py | 95.83% <100.00%> (+1.95%) ⬆️
nemoguardrails/tracing/constants.py | 100.00% <100.00%> (ø)
nemoguardrails/tracing/interaction_types.py | 100.00% <100.00%> (ø)
nemoguardrails/tracing/span_format.py | 100.00% <100.00%> (ø)
nemoguardrails/tracing/tracer.py | 100.00% <100.00%> (ø)
nemoguardrails/actions/llm/utils.py | 80.00% <85.71%> (+0.06%) ⬆️
... and 3 more

... and 1 file with indirect coverage changes

- Replace isinstance(span, TypedSpan) with explicit tuple of types
- TypedSpan is a Union type which cannot be used with isinstance in Python 3.9
- Update test to check for specific LLMSpan type instead of Union
- Fixes TypeError: Subscripted generics cannot be used with class and instance checks
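
The fix described in this commit can be illustrated with a small stand-alone sketch. The span classes here are empty placeholders rather than the real models, and `is_typed_span` is a hypothetical helper; the point is only that `isinstance` accepts a tuple of classes on every supported Python version, whereas passing a `typing.Union` alias raises `TypeError` on Python 3.9.

```python
from typing import Union, get_args


class LLMSpan:
    """Placeholder for the real span model."""


class RailSpan:
    """Placeholder for the real span model."""


# A Union alias is useful for type hints, but not for runtime checks on 3.9.
TypedSpan = Union[LLMSpan, RailSpan]

# Explicit tuple of concrete classes, safe to pass to isinstance().
TYPED_SPAN_CLASSES = (LLMSpan, RailSpan)


def is_typed_span(obj: object) -> bool:
    return isinstance(obj, TYPED_SPAN_CLASSES)
```

If keeping the tuple in sync with the alias by hand is a concern, `typing.get_args(TypedSpan)` yields the same tuple of classes and could be used to derive it.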
@Pouyanpi Pouyanpi requested a review from tgasser-nv August 15, 2025 14:35
@Pouyanpi Pouyanpi marked this pull request as ready for review August 15, 2025 15:02
Labels
bug Something isn't working enhancement New feature or request
2 participants