Feature request: native OpenTelemetry instrumentation (OTLP tracing + metrics push) #1194

@tnucera

Problem

Mercure currently exposes observability through Caddy's built-in capabilities:

  • Metrics: Prometheus scrape endpoint (/metrics) with 3 Mercure-specific counters/gauges
  • Tracing: Caddy's tracing directive provides HTTP request-level spans

While this is useful for basic monitoring, it falls short for production environments that rely on OpenTelemetry-native observability stacks:

  1. No application-level tracing: There are no spans for Mercure's internal operations — publishing a message, dispatching to subscribers, transport/storage interactions (Bolt, Redis, etc.). The only spans available are generic HTTP request spans from Caddy, which don't provide insight into Mercure's business logic.

  2. No OTLP push for metrics: Metrics are only available via Prometheus scrape. In environments using an OTLP collector (e.g., Grafana Alloy, OpenTelemetry Collector), this requires running an additional scrape job rather than having Mercure push metrics directly via OTLP (gRPC or HTTP).

Caddy's tracing directive already exports those HTTP request spans over OTLP, but the Mercure module does not hook into it to instrument its own internal operations. Operators therefore see that a request happened and how long it took, but nothing about what Mercure's publish/subscribe logic did while handling it.

Proposed solution

1. Application-level tracing (OTLP export)

Instrument Mercure's core operations with OpenTelemetry spans:

  • Publish flow: span covering message validation, authorization, and dispatch to the transport layer
  • Subscribe flow: span for subscription setup, authorization, and event delivery
  • Transport operations: spans for Bolt/Redis/Postgres read/write operations

This would allow operators to trace a message from publication through to subscriber delivery, which is critical for debugging latency issues and understanding system behavior under load.
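
To make the intended span hierarchy concrete, here is a minimal sketch of the publish flow. To stay dependency-free it uses a tiny stand-in `span` type rather than the OpenTelemetry Go API; in a real patch `startSpan` and `end` would roughly map onto `Tracer.Start` and `Span.End` from `go.opentelemetry.io/otel/trace`. The span names and the `mercure.topic` attribute are illustrative assumptions, not existing Mercure conventions.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// span is a hypothetical stand-in for an OpenTelemetry span. It records only
// what is needed to show the parent/child structure of a trace.
type span struct {
	name     string
	parent   *span
	children []*span
	attrs    map[string]string
	start    time.Time
	duration time.Duration
}

// startSpan opens a child of parent, or a root span when parent is nil.
func startSpan(parent *span, name string) *span {
	s := &span{name: name, parent: parent, attrs: map[string]string{}, start: time.Now()}
	if parent != nil {
		parent.children = append(parent.children, s)
	}
	return s
}

// end closes the span and records how long it was open.
func (s *span) end() { s.duration = time.Since(s.start) }

// tree renders the span and its descendants, one per line, indented by depth.
func (s *span) tree(depth int) string {
	var b strings.Builder
	b.WriteString(strings.Repeat("  ", depth) + s.name + "\n")
	for _, c := range s.children {
		b.WriteString(c.tree(depth + 1))
	}
	return b.String()
}

// publish mimics the publish flow: validate, authorize, then hand the update
// to the transport. Each step gets its own child span (names are assumptions).
func publish(topic string) *span {
	root := startSpan(nil, "mercure.publish")
	defer root.end()
	root.attrs["mercure.topic"] = topic

	steps := []string{"mercure.validate", "mercure.authorize", "mercure.transport.dispatch"}
	for _, step := range steps {
		child := startSpan(root, step)
		// The real work for this step would run here.
		child.end()
	}
	return root
}

func main() {
	fmt.Print(publish("https://example.com/books/1").tree(0))
	// Prints:
	// mercure.publish
	//   mercure.validate
	//   mercure.authorize
	//   mercure.transport.dispatch
}
```

With this shape, a slow Bolt or Redis write would show up as a long `mercure.transport.dispatch` child instead of being hidden inside one opaque HTTP span.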

Configuration could follow the pattern of Caddy's existing tracing directive, rely on the standard OTEL_* environment variables (OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_SERVICE_NAME, etc.), or both.
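
As a rough illustration of the environment-variable route, the sketch below resolves an OTLP/HTTP endpoint using the precedence in the OpenTelemetry exporter specification: the signal-specific variable is used verbatim, otherwise `/v1/<signal>` is appended to OTEL_EXPORTER_OTLP_ENDPOINT, otherwise the default `http://localhost:4318` applies. The `resolveEndpoint` helper is hypothetical; in practice the OTel Go exporters perform this lookup themselves.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveEndpoint returns the OTLP/HTTP URL for one signal ("traces" or
// "metrics"). getenv is injected so the lookup can be exercised without
// touching the process environment. Hypothetical helper, not a Mercure API.
func resolveEndpoint(signal string, getenv func(string) string) string {
	// 1. Signal-specific variable, used as-is.
	specific := "OTEL_EXPORTER_OTLP_" + strings.ToUpper(signal) + "_ENDPOINT"
	if v := getenv(specific); v != "" {
		return v
	}
	// 2. Generic base URL, with the per-signal path appended.
	base := getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
	if base == "" {
		// 3. Default OTLP/HTTP base from the specification.
		base = "http://localhost:4318"
	}
	return strings.TrimSuffix(base, "/") + "/v1/" + signal
}

func main() {
	fmt.Println("traces  ->", resolveEndpoint("traces", os.Getenv))
	fmt.Println("metrics ->", resolveEndpoint("metrics", os.Getenv))
}
```

Relying on these variables means an operator who already configures other services for a collector would not need any Mercure-specific settings.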

2. OTLP push for metrics

Allow Mercure's existing metrics (mercure_subscribers_total, mercure_subscribers_connected, mercure_updates_total) to be exported via OTLP push (gRPC/HTTP) in addition to the current Prometheus scrape endpoint.

This would enable push-based metric collection without requiring a Prometheus scrape job, which simplifies deployment in OTLP-native environments.

Again, configuration via standard OTEL_* environment variables would be ideal (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT, OTEL_METRICS_EXPORTER, etc.).
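
A possible way to keep the Prometheus endpoint and add OTLP push side by side is to honor OTEL_METRICS_EXPORTER, which the OpenTelemetry specification defines as a comma-separated list (known values include `otlp`, `prometheus`, `console` and `none`, defaulting to `otlp`). The `metricExporters` helper below is a hypothetical sketch of that parsing only; whether Mercure should adopt the spec default or keep Prometheus-only when the variable is unset is an open design choice.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// metricExporters parses an OTEL_METRICS_EXPORTER value into the ordered,
// de-duplicated list of exporters to enable. Hypothetical helper.
func metricExporters(value string) []string {
	if strings.TrimSpace(value) == "" {
		// Unset or empty: fall back to the specification default.
		return []string{"otlp"}
	}
	seen := map[string]bool{}
	var out []string
	for _, part := range strings.Split(value, ",") {
		name := strings.TrimSpace(part)
		if name == "none" {
			// "none" disables metric export entirely.
			return nil
		}
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
	}
	return out
}

func main() {
	// For example OTEL_METRICS_EXPORTER="otlp,prometheus" would keep the
	// scrape endpoint and also push to a collector.
	fmt.Println(metricExporters(os.Getenv("OTEL_METRICS_EXPORTER")))
}
```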

Additional context

  • The OTel Go SDK dependencies are already present in go.mod (via Caddy), so this wouldn't introduce major new dependencies
  • I saw that #901 was closed, but it didn't articulate the specific gap between HTTP-level Caddy observability and application-level Mercure instrumentation, which is what this request focuses on
  • Caddy already supports OTLP tracing natively — extending this instrumentation into the Mercure module would be a natural evolution rather than a paradigm shift
