feat(telemetry): expose OTel metrics via Prometheus /v1/metrics endpoint #6034

Draft
cdoern wants to merge 1 commit into ogx-ai:main from cdoern:rhaieng-5156-prometheus-metrics

@cdoern cdoern commented Jun 4, 2026

Collaborator

What

Adds an optional Prometheus scrape endpoint at /v1/metrics that exposes the existing OTel metrics in Prometheus exposition format, alongside the current OTLP push path. This unblocks scrape-based monitoring systems that need a Prometheus-compatible endpoint rather than OTLP push.

Resolves RHAIENG-5156.

How

  • Dependency: add opentelemetry-exporter-prometheus (pulls in prometheus-client).
  • setup_telemetry() (src/ogx/telemetry/__init__.py): now builds a list of metric readers — the existing OTLP PeriodicExportingMetricReader when OTEL_EXPORTER_OTLP_ENDPOINT is set, plus a PrometheusMetricReader when OGX_PROMETHEUS_ENABLED is truthy. Both attach to a single global MeterProvider, so the two export paths run independently.
  • Endpoint (src/ogx_api/inspect_api/fastapi_routes.py): GET /v1/metrics is declared on the Inspect API router alongside /v1/health and /v1/version. It serves prometheus_client.generate_latest() with the Prometheus content type, opts out of auth via PUBLIC_ROUTE_KEY, and returns 404 when the feature is disabled.
  • Middleware exclusions: /v1/metrics is added to _EXCLUDED_PATHS in RequestMetricsMiddleware so scrapes don't inflate request counters. Auth is handled automatically by PUBLIC_ROUTE_KEY (no manual bypass needed, since the route is a registered API route).
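The reader selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names (prometheus_enabled, build_metric_readers, install_meter_provider), the set of values treated as truthy, and the choice of the HTTP OTLP exporter are all hypothetical. The OpenTelemetry imports are deferred so each export path only needs its package when it is switched on.

```python
import os
from typing import Mapping

# Assumption: the PR does not say which values count as truthy.
_TRUTHY = {"1", "true", "yes", "on"}


def prometheus_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when OGX_PROMETHEUS_ENABLED holds a truthy value."""
    return env.get("OGX_PROMETHEUS_ENABLED", "").strip().lower() in _TRUTHY


def build_metric_readers(env: Mapping[str, str] = os.environ) -> list:
    """Collect the metric readers selected by the environment."""
    readers = []
    if env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        # Push path. The exporter itself reads its endpoint from the process
        # environment. Assumption: HTTP transport; the PR may use gRPC.
        from opentelemetry.exporter.otlp.proto.http.metric_exporter import (
            OTLPMetricExporter,
        )
        from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

        readers.append(PeriodicExportingMetricReader(OTLPMetricExporter()))
    if prometheus_enabled(env):
        # Pull path. The reader registers a collector with prometheus_client's
        # default registry, which generate_latest() later serializes.
        from opentelemetry.exporter.prometheus import PrometheusMetricReader

        readers.append(PrometheusMetricReader())
    return readers


def install_meter_provider(readers: list) -> None:
    """Attach every reader to one global MeterProvider."""
    from opentelemetry import metrics
    from opentelemetry.sdk.metrics import MeterProvider

    metrics.set_meter_provider(MeterProvider(metric_readers=readers))
```

With neither variable set, build_metric_readers({}) returns an empty list, so the provider is created without any export path.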

Acceptance criteria

  • When enabled, OGX exposes all existing OTel metrics at /v1/metrics in Prometheus exposition format
  • The endpoint does not require authentication
  • Enabled/disabled via OGX_PROMETHEUS_ENABLED
  • OTLP push continues to work independently (both paths active simultaneously)
  • Existing unit and integration tests continue to pass
  • New unit test validates Prometheus format output
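The enabled/disabled criterion reduces to a small decision that can be shown without any web framework. The function below is a hypothetical, framework-neutral sketch (metrics_response is not a name from this PR). In the real route, generate would presumably be prometheus_client.generate_latest and content_type would be prometheus_client.CONTENT_TYPE_LATEST.

```python
from typing import Callable, Dict, Tuple


def metrics_response(
    enabled: bool,
    generate: Callable[[], bytes],
    content_type: str,
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on the metrics route."""
    if not enabled:
        # Feature flag off: behave as if the route did not exist.
        return 404, {}, b""
    # Feature flag on: serialize the registry only when a scrape arrives.
    return 200, {"Content-Type": content_type}, generate()
```

Passing the serializer in as a callable keeps the disabled path from touching the Prometheus registry at all.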

Test plan

Unit tests (tests/unit/telemetry/test_prometheus_metrics.py, 17 cases): env-flag parsing, Prometheus-format exposition with labels/values, the route via TestClient (200 + text/plain when enabled, 404 when disabled), PUBLIC_ROUTE_KEY presence, and the _EXCLUDED_PATHS guard.

uv run pytest tests/unit/telemetry/ -q
# 17 passed
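The _EXCLUDED_PATHS guard covered by the unit tests can be illustrated with a minimal pure-ASGI middleware. This is a sketch under assumptions: the class name RequestCounterMiddleware is hypothetical (it is not the RequestMetricsMiddleware from the PR), the contents of the excluded set are guessed, and exact-match comparison is assumed rather than prefix matching.

```python
# Assumption: exact contents and matching rule of the real set are not shown in the PR.
_EXCLUDED_PATHS = frozenset({"/v1/health", "/v1/metrics"})


class RequestCounterMiddleware:
    """Count HTTP requests, skipping paths that should not be measured."""

    def __init__(self, app, excluded_paths=_EXCLUDED_PATHS):
        self.app = app
        self.excluded_paths = excluded_paths
        self.request_count = 0

    async def __call__(self, scope, receive, send):
        # Only HTTP requests to non-excluded paths are counted; everything
        # is still forwarded to the wrapped application unchanged.
        if scope["type"] == "http" and scope["path"] not in self.excluded_paths:
            self.request_count += 1
        await self.app(scope, receive, send)
```

Because the check happens before the request is counted, a scraper polling /v1/metrics every few seconds leaves the counter untouched.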

Integration tests (tests/integration/inspect/test_metrics_endpoint.py, server mode): scrape the live /v1/metrics over raw HTTP and assert that the output is in Prometheus format and includes ogx_requests_total, that no authentication is required, and that scrapes of the metrics route are not themselves counted. scripts/integration-tests.sh sets OGX_PROMETHEUS_ENABLED for native server-mode runs; the tests skip otherwise.

uv run --no-sync ./scripts/integration-tests.sh \
  --stack-config server:ci-tests --setup gpt \
  --file tests/integration/inspect/test_metrics_endpoint.py
# 3 passed
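A scrape check of this kind can be approximated with the standard library alone. The helpers below are hypothetical (fetch_metrics and metric_names are not names from the test file), and the base URL is whatever address the server under test listens on.

```python
import re
import urllib.request

# Metric names in the Prometheus text format start each sample line.
_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def fetch_metrics(base_url: str, timeout: float = 5.0) -> tuple[int, str, str]:
    """GET /v1/metrics without credentials; return (status, content type, body).

    urllib raises urllib.error.HTTPError for a 404, which is the expected
    outcome when the feature flag is unset.
    """
    with urllib.request.urlopen(f"{base_url}/v1/metrics", timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        return resp.status, content_type, resp.read().decode("utf-8")


def metric_names(exposition: str) -> set[str]:
    """Collect sample names, ignoring blank lines and # HELP / # TYPE comments."""
    names = set()
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue
        match = _NAME_RE.match(line)
        if match:
            names.add(match.group(0))
    return names
```

A test could then assert that "ogx_requests_total" is in metric_names(body) after issuing at least one request to a counted route.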

A manual run against a live server confirmed a 200 response with Content-Type: text/plain; version=1.0.0; charset=utf-8, the presence of ogx_requests_total and ogx_request_duration_seconds, the exclusion of scrapes from the request counters, and a 404 when the flag is unset.

🤖 Generated with Claude Code

OGX previously exported metrics only through OTLP push to an OTel Collector.
This adds an optional Prometheus scrape endpoint so scrape-based monitoring
systems can collect the existing metrics.

When OGX_PROMETHEUS_ENABLED is set, setup_telemetry() attaches a
PrometheusMetricReader to the MeterProvider alongside the existing OTLP reader,
and the Inspect API serves all metrics at /v1/metrics in Prometheus exposition
format. The endpoint opts out of authentication via PUBLIC_ROUTE_KEY, returns
404 when disabled, and is excluded from RequestMetricsMiddleware. The OTLP push
path continues to work independently, so both export paths can run at once.

Adds unit tests covering the Prometheus format and endpoint behavior, and a
server-mode integration test that scrapes the live endpoint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
@cdoern cdoern force-pushed the rhaieng-5156-prometheus-metrics branch from 838dca0 to e451304 on June 4, 2026 19:59

@rhdedgar rhdedgar left a comment

Contributor

Nice, this covers all the criteria from RHAIENG-5156. +1

mergify Bot commented Jun 9, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be merged. @cdoern please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 9, 2026