fix: Explore how to achieve telemetry suppression with OTLP by cijothomas · Pull Request #3084 · open-telemetry/opentelemetry-rust

cijothomas · 2025-07-25T16:37:15Z

One way of addressing #2877

This PR does not introduce a “fix” inside the OTLP Exporters themselves, but instead demonstrates how users can address the issue without requiring changes in OpenTelemetry.

Background

OpenTelemetry provides a mechanism to suppress telemetry based on the current Context. However, this suppression only works if every component involved properly propagates OpenTelemetry’s Context. Libraries like tonic and hyper are not aware of OTel’s Context and therefore do not propagate it across threads.

As a result, OTel’s suppression can fail, leading to telemetry-induced-telemetry—where the act of exporting telemetry (e.g., sending data via tonic/hyper) itself generates additional telemetry. This newly generated telemetry is then exported again, triggering yet more telemetry in a loop, potentially overwhelming the system.

What this PR does

OTLP/gRPC exporters rely on the tonic client, which captures the current runtime at creation time and uses it to drive futures. Instead of reusing the application’s existing runtime, this PR creates a dedicated Tokio runtime exclusively for the OTLP Exporter.

In this dedicated runtime:
1. We intercept on_start / on_stop events.
2. Sets OTel’s suppression flag in the context.

This ensures that telemetry generated by libraries such as hyper/tonic will be suppressed only within the exporter’s dedicated runtime. If those same libraries are used elsewhere for application logic, they continue to function normally and emit telemetry as expected.

Depending on the feedback, we could either address this purely through documentation and examples, or we could enhance the OTLP Exporter itself to expose a feature flag that, when enabled, would automatically create the tonic client within its own dedicated runtime.

cijothomas · 2025-07-25T16:41:45Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

+    static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) };
+}
+
+// #[tokio::main]


If user's application relies on tokio, then they can create a Rt themselves, and wrap their main method inside rt.blockon({...app code..}), ensuring telemetry initialization is outside of it.

Copilot

Pull Request Overview

This PR demonstrates how to suppress telemetry-induced-telemetry loops in OTLP exporters by creating a dedicated Tokio runtime with telemetry suppression enabled. The approach prevents the infinite telemetry generation that occurs when OpenTelemetry exporters using tonic/hyper generate their own telemetry data, which then gets exported again in a loop.

Creates a dedicated Tokio runtime with thread start/stop hooks that enable telemetry suppression
Removes manual log filtering that was previously suppressing hyper/tonic logs globally
Moves all OpenTelemetry initialization to use the dedicated runtime

Copilot · 2025-07-25T16:42:41Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

+    static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) };
+}
+
+// #[tokio::main]


[nitpick] Remove the commented out #[tokio::main] attribute as it's no longer needed and adds unnecessary clutter to the code.

Suggested change

// #[tokio::main]

Copilot · 2025-07-25T16:42:41Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

+        .worker_threads(1) // Don't think this matters as no matter how many threads
+        // are created, we intercept the thread start to set suppress guard.


[nitpick] The comment spans multiple lines but uses single-line comment syntax. Consider using proper multi-line comment format or clarify the reasoning more concisely in a single line.

Suggested change

.worker_threads(1) // Don't think this matters as no matter how many threads

// are created, we intercept the thread start to set suppress guard.

.worker_threads(1) /* Don't think this matters as no matter how many threads

are created, we intercept the thread start to set suppress guard. */

Copilot · 2025-07-25T16:42:41Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

+        })
+        .build()
+        .expect("Failed to create tokio runtime");
+    let logger_provider = rt.block_on(async { init_logs() });


The init_logs() function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let logger_provider = init_logs();

Suggested change

let logger_provider = rt.block_on(async { init_logs() });

let logger_provider = init_logs();

Copilot · 2025-07-25T16:42:42Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

    // allow internal-logs from Tracing/Metrics initializer to be captured.

-    let tracer_provider = init_traces();
+    let tracer_provider = rt.block_on(async { init_traces() });


The init_traces() function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let tracer_provider = init_traces();

Suggested change

let tracer_provider = rt.block_on(async { init_traces() });

let tracer_provider = init_traces();

Copilot · 2025-07-25T16:42:42Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

    global::set_tracer_provider(tracer_provider.clone());

-    let meter_provider = init_metrics();
+    let meter_provider = rt.block_on(async { init_metrics() });


The init_metrics() function is not async but is being wrapped in an async block unnecessarily. Consider calling it directly: let meter_provider = init_metrics();

Suggested change

let meter_provider = rt.block_on(async { init_metrics() });

let meter_provider = init_metrics();

codecov · 2025-07-25T16:43:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.1%. Comparing base (e9ca158) to head (54e05d5).

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #3084   +/-   ##
=====================================
  Coverage   80.1%   80.1%           
=====================================
  Files        126     126           
  Lines      21957   21957           
=====================================
  Hits       17603   17603           
  Misses      4354    4354

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

scottgerring · 2025-07-29T11:17:53Z

Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like withTelemetrySuppression(_ => { /* setup otel here */ } ) ?

I expect this would still require the user to not use a tokio_main but rather explicitly create their runtime after setting up OTel using this helper to wrap, but this would still go a ways to make it feel a bit less leaky.

My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?

scottgerring · 2025-07-29T11:42:45Z

It feels like our first suggestion to users should be:

If you're not using tonic/hyper as a HTTP client in your app, simply tune your tracing subscriber to suppress their telemetry

Then this is is only necessary for a subset of users

cijothomas · 2025-08-02T02:14:41Z

Although I can see that this works, it feels like a big leak of impl details into the user's domain. What would it look like as a helper in OTel itself? E.g. something like withTelemetrySuppression(_ => { /* setup otel here */ } ) ?

I expect this would still require the user to not use a tokio_main but rather explicitly create their runtime after setting up OTel using this helper to wrap, but this would still go a ways to make it feel a bit less leaky.

My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?

Good point. This is already the case even without this PR! See https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/src/lib.rs#L112-L113
OTLP/gRPC Exporter already requires a tokio runtime - either it captures current one if user has tokio::main, or we explicitly ask users to create an RT and do OTLP instantiation inside it.

Exposing a helper/feature in OTLP Exporter bloats public API, and it'll be less flexible than users giving a runtime to us. (they can do other things inside thread_start/stop apart from just the suppression etc.)

At some point in the future, we could work with tokio-tracing maintainers and see if we can agree on a mutual Context field for suppression, but this requires a lot of research and co-ordination. The approach shown in this PR is just a way for users to unblock themselves right now, without OTel/OTLP doing anything extra.

cijothomas · 2025-08-02T02:19:07Z

My other question is - do we impact any of our users by requiring a separate tokio runtime in this case, for instance, folks in resource-constrained environments?

Quite valid point! It is not mandatory to use separate tokio runtime - it is only required if users are not okay with the filtering the logs from hyper/tonic etc, and want to do it only when originating from otlp export context. But if user needs that capability, then asking them to create another runtime will strain resources, but not too much - it's just one thread, which is sitting idle 99% of time. We do have such concerns already with our BatchProcessor/PeriodicReader - they all by default creates a separate thread instead of plugging into user's existing runtime, though users can avoid it by opting into currently experimental features.

cijothomas · 2025-08-12T16:29:14Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

+    static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) };
+}
+
+// #[tokio::main]


can we ask tokio::main macro's to provide on_start/on_end callback..similar to the way they offer on_panic..?

cijothomas · 2025-08-12T16:34:58Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

+    static SUPPRESS_GUARD: RefCell<Option<opentelemetry::ContextGuard>> = const { RefCell::new(None) };
+}
+
+// #[tokio::main]


todo: cleanup http one and confirm it does not need this technique by default.
todo: see if the client authors offer a way to opt-out of telemetry.

cijothomas · 2025-08-12T16:36:00Z

opentelemetry-otlp/examples/basic-otlp/src/main.rs

+
+// #[tokio::main]
+fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
+    let rt = tokio::runtime::Builder::new_multi_thread()


check if this can be wrapped inside std::thread

scottgerring · 2025-08-13T06:15:27Z

Exposing a helper/feature in OTLP Exporter bloats public API, and it'll be less flexible than users giving a runtime to us. (they can do other things inside thread_start/stop apart from just the suppression etc.)
...
At some point in the future, we could work with tokio-tracing maintainers and see if we can agree on a mutual Context field for suppression, but this requires a lot of research and co-ordination. The approach shown in this PR is just a way for users to unblock themselves right now, without OTel/OTLP doing anything extra.

I reckon if we reasonably expect to be able to agree on a suppression mechanism in the future it makes sense to not extend the public API for now, although I have no concept of how big this effort would be!

It is not mandatory to use separate tokio runtime - it is only required if users are not okay with the filtering the logs from hyper/tonic etc, and want to do it only when originating from otlp export context

Good point - regular filtering is the "default" and this is an opt-in thing for folks who want to selectively keep some http client logging.

cijothomas · 2025-08-13T15:34:09Z

I reckon if we reasonably expect to be able to agree on a suppression mechanism in the future it makes sense to not extend the public API for now, although I have no concept of how big this effort would be!

A more universally agreed concept of Context would be nice, but it'll require lot of work to drive something like that.
Another alternative is for the clients we use (tonic/hyper etc) to expose a way to opt-out to their usual logging, and then OTLP opting out this way. That'd also require some efforts to drive this across the clients we use!

davidhewitt · 2025-08-20T16:20:27Z

For what it's worth, I have been able to use this approach downstream in logfire to successfully suppress all export telemetry.

To avoid reqwest spawning a background thread outside of my control, I had to switch to use the reqwest-client (async client) in opentelemetry-otlp.
Due to that client needing a tokio runtime, I decided to just spawn a background tokio runtime inside the logfire SDK for the exporters. Using the approach here I suppress all telemetry on that runtime's threads.
... and similarly I needed to use the experimental async batch exporters, because the async reqwest client doesn't work in the background thread of the sync BatchSpanExporter (etc) because those threads don't have a tokio context.

pydantic/logfire-rust#95

bryantbiggs · 2026-02-20T17:39:16Z

While reviewing the telemetry suppression approach here, I noticed that SimpleConcurrentLogProcessor::emit() in opentelemetry-sdk/src/logs/concurrent_log_processor.rs does not enter a telemetry-suppressed scope before calling the exporter, unlike every other processor in the SDK:

SimpleLogProcessor::emit() has let _suppress_guard = Context::enter_telemetry_suppressed_scope();
BatchLogProcessor suppresses in both its export and batch-processing paths
SimpleConcurrentLogProcessor::emit() calls futures_executor::block_on(self.exporter.export(...)) with no suppression at all

If an exporter (or its underlying transport) emits logs during export, and those logs flow back through a SimpleConcurrentLogProcessor, there is nothing to break the recursion.

This is likely a straightforward fix -- adding let _suppress_guard = Context::enter_telemetry_suppressed_scope(); at the top of SimpleConcurrentLogProcessor::emit(), consistent with how SimpleLogProcessor does it. Might be worth folding into this PR or a follow-up.

michaelvanstraten · 2026-02-21T15:12:18Z

I tried what was suggested in this PR; here are my notes.

Why this won't work

This won’t work if your application installs/uses its own futures runtime, because the standard processor ultimately calls futures_executor::block_on

opentelemetry-rust/opentelemetry-sdk/src/trace/span_processor.rs

Line 181 in 09b85b5

.and_then(|exporter| futures_executor::block_on(exporter.export(vec![span])));

That ends up using whatever runtime is currently active, which is exactly what you don’t want, because the suppression guard is not set there.

You can address the block_on issue by switching to the experimental processors that accept an async runtime, like the ones defined here:

opentelemetry-rust/opentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs

Line 85 in 09b85b5

pub struct BatchSpanProcessor<R: RuntimeChannel> {

However, you still need to provide your own runtime implementation (e.g., a thin wrapper around a Tokio runtime).

The deeper issue

The bigger issue I ran into is that h2 creates a parentless span when it starts a new connection here

https://github.com/hyperium/h2/blob/5634dddea8ff9ed4e8df327a64765738f3e997d8/src/proto/connection.rs#L129

That defeats the global suppression guard because it gets ignored by tracing-opentelemetry here:

https://github.com/tokio-rs/tracing-opentelemetry/blob/91c4faa0082a3bfe2b3e9e59ed1db295830f55ec/src/layer.rs#L949

when is_contextual returns false.

Workaround I ended up with

I ended up creating a dedicated “telemetry” Tokio runtime and adding a filter to the OTel layer that drops spans/events that originate from telemetry runtime threads. The idea is: just set a threadlocal that tells you if you are in a telemetry export.

Below is what I have.

use std::cell::Cell;
use std::fmt::Debug;
use std::ops::Deref;
use std::sync::LazyLock;

thread_local! {
    static IS_TELEMETRY_THREAD: Cell<bool> = const { Cell::new(false) };
}

static TELEMETRY_RUNTIME: LazyLock<tokio::runtime::Runtime> = LazyLock::new(|| {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .thread_name("telemetry-runtime")
        .enable_all()
        .on_thread_start(|| IS_TELEMETRY_THREAD.set(true))
        .on_thread_stop(|| IS_TELEMETRY_THREAD.set(false))
        .build()
        .expect("Failed to create tokio runtime")
});

#[derive(Debug, Clone)]
pub struct TelemetryRuntime;

impl opentelemetry_sdk::runtime::Runtime for TelemetryRuntime {
    fn spawn<F>(&self, future: F)
    where
        F: Future<Output = ()> + Send + 'static,
    {
        let _ = TELEMETRY_RUNTIME.spawn(future);
    }

    fn delay(&self, duration: std::time::Duration) -> impl Future<Output = ()> + Send + 'static {
        let _guard = TELEMETRY_RUNTIME.enter();
        tokio::time::sleep(duration)
    }
}

impl opentelemetry_sdk::runtime::RuntimeChannel for TelemetryRuntime {
    type Receiver<T: Debug + Send> = tokio_stream::wrappers::ReceiverStream<T>;
    type Sender<T: Debug + Send> = tokio::sync::mpsc::Sender<T>;

    fn batch_message_channel<T: std::fmt::Debug + Send>(
        &self,
        capacity: usize,
    ) -> (Self::Sender<T>, Self::Receiver<T>) {
        let _guard = TELEMETRY_RUNTIME.enter();
        let (sender, receiver) = tokio::sync::mpsc::channel(capacity);
        (
            sender,
            tokio_stream::wrappers::ReceiverStream::new(receiver),
        )
    }
}

impl<S> tracing_subscriber::layer::Filter<S> for TelemetryRuntime {
    fn enabled(
        &self,
        _: &tracing::Metadata<'_>,
        _: &tracing_subscriber::layer::Context<'_, S>,
    ) -> bool {
        !IS_TELEMETRY_THREAD.get()
    }

    fn event_enabled(
        &self,
        _: &tracing::Event<'_>,
        _: &tracing_subscriber::layer::Context<'_, S>,
    ) -> bool {
        !IS_TELEMETRY_THREAD.get()
    }
}

impl Deref for TelemetryRuntime {
    type Target = tokio::runtime::Runtime;

    fn deref(&self) -> &Self::Target {
        &*TELEMETRY_RUNTIME
    }
}

Example: wiring the telemetry runtime into the SDK + applying the filter

let _guard = TelemetryRuntime.enter();

let processor = BatchLogProcessor::builder(exporter, TelemetryRuntime).build();

let provider = SdkLoggerProvider::builder()
    .with_log_processor(processor)
    .build();

OpenTelemetryTracingBridge::new(&provider)
    .with_filter(TelemetryRuntime);

Fix: Explore how to achieve telemetry suppression with OTLP

34f8be8

cijothomas requested a review from a team as a code owner July 25, 2025 16:37

clippy

54e05d5

cijothomas changed the title ~~Fix: Explore how to achieve telemetry suppression with OTLP~~ fix: Explore how to achieve telemetry suppression with OTLP Jul 25, 2025

cijothomas commented Jul 25, 2025

View reviewed changes

cijothomas requested a review from Copilot July 25, 2025 16:41

Copilot AI reviewed Jul 25, 2025

View reviewed changes

cijothomas commented Aug 12, 2025

View reviewed changes

scottgerring mentioned this pull request Sep 17, 2025

[Bug]: Stack overflow when calling unfiltered tracing::event! after SdkLogProvider is shutdown. #3161

Closed

cijothomas mentioned this pull request Jan 30, 2026

Fix telemetry-induced-telemetry in OTLP Exporter #2877

Open

		.worker_threads(1) // Don't think this matters as no matter how many threads
		// are created, we intercept the thread start to set suppress guard.

	let logger_provider = rt.block_on(async { init_logs() });
	let logger_provider = init_logs();

	let tracer_provider = rt.block_on(async { init_traces() });
	let tracer_provider = init_traces();

	let meter_provider = rt.block_on(async { init_metrics() });
	let meter_provider = init_metrics();

Conversation

cijothomas commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cijothomas Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Copilot AI Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jul 25, 2025

Codecov Report

Uh oh!

scottgerring commented Jul 29, 2025

Uh oh!

scottgerring commented Jul 29, 2025

Uh oh!

cijothomas commented Aug 2, 2025

Uh oh!

cijothomas commented Aug 2, 2025

Uh oh!

cijothomas Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

cijothomas Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

cijothomas Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

scottgerring commented Aug 13, 2025

Uh oh!

cijothomas commented Aug 13, 2025

Uh oh!

davidhewitt commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bryantbiggs commented Feb 20, 2026

Uh oh!

michaelvanstraten commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why this won't work

The deeper issue

Workaround I ended up with

Example: wiring the telemetry runtime into the SDK + applying the filter

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

cijothomas commented Jul 25, 2025 •

edited

Loading

davidhewitt commented Aug 20, 2025 •

edited

Loading

michaelvanstraten commented Feb 21, 2026 •

edited

Loading