feat: add metrics to the api crate #257

Merged: 8 commits merged into main from rs/api-metrics on Aug 14, 2025

Conversation

@imor (Contributor) commented Aug 12, 2025

This PR returns metrics from the api crate on the /metrics endpoint. These metrics will be scraped from this endpoint by VictoriaMetrics. The following metrics are currently returned:

  • http_requests_total (labels: endpoint, method, status): the total number of HTTP requests handled by the api.
  • http_requests_duration_seconds (labels: endpoint, method, status): the request duration for all HTTP requests handled by the api.

More metrics will be added later.
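For reference, a scrape of the /metrics endpoint would return these two metrics in the Prometheus text exposition format, roughly like the fragment below (the values and labels here are illustrative, not taken from the PR; the duration metric is shown as a histogram for the sake of the example):

```text
# TYPE http_requests_total counter
http_requests_total{endpoint="/metrics",method="GET",status="200"} 42
# TYPE http_requests_duration_seconds histogram
http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.005"} 40
http_requests_duration_seconds_sum{endpoint="/metrics",method="GET",status="200"} 0.153
http_requests_duration_seconds_count{endpoint="/metrics",method="GET",status="200"} 42
```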

@imor changed the title Rs/api metrics → feat: add metrics to the api crate on Aug 14, 2025
@imor marked this pull request as ready for review August 14, 2025 06:25
@imor requested a review from a team as a code owner August 14, 2025 06:25
@iambriccardo (Contributor) left a comment:

Left some minor comments, but not needed for merge.

How are we planning on extracting metrics from replicator instances? Will we add a /metrics endpoint to them?

.expect("Failed to execute request.");

// Assert
assert!(response.status().is_success());
Contributor:
Can you also validate that there are some metrics in here (if doable)?

@imor (Contributor Author):

I might need to parse the metrics for validation. The timing values in the output are non-deterministic, so they'd need to be skipped. I'll see if it's easy enough and add it; otherwise I'll keep it as is.
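One way to validate the body without tripping over the non-deterministic values is to parse out only the metric names and assert on their presence. This is just a sketch, not the PR's code; `metric_names` is a hypothetical test helper:

```rust
use std::collections::HashSet;

/// Hypothetical helper: collect metric names from a Prometheus
/// text-format body, skipping comment lines and ignoring the
/// sample values (which contain non-deterministic timing data).
fn metric_names(body: &str) -> HashSet<String> {
    body.lines()
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            // A sample line looks like: name{labels} value
            let end = line.find(|c: char| c == '{' || c == ' ')?;
            Some(line[..end].to_string())
        })
        .collect()
}

fn main() {
    let body = "\
# TYPE http_requests_total counter
http_requests_total{endpoint=\"/metrics\",method=\"GET\",status=\"200\"} 3
";
    let names = metric_names(body);
    // Assert on the presence of the metric, not on its value.
    assert!(names.contains("http_requests_total"));
    println!("ok");
}
```

The test then only checks which metrics exist, leaving the flaky counts and durations out of the assertion entirely.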

Contributor:

Yeah, I guessed the non-determinism. If you can do it, great; otherwise no problem!

@imor (Contributor Author):

Just took a look at this, and the non-determinism is not limited to the times returned in the metrics. Since the metrics recorder is installed globally and the tests run in parallel, the recorder keeps recording every endpoint being hit, which makes it extremely challenging to predict the expected metrics output. This could probably be done by installing a per-thread recorder just for the tests, but that is extra complexity for very limited benefit, so I'm leaving it as is for now.
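The global-recorder problem described above can be illustrated with a stdlib-only sketch (this is not the project's code; `GLOBAL_HITS`, `LOCAL_HITS`, and `record_request` are made-up stand-ins for the recorder): a globally shared counter mixes traffic from every parallel test, while a per-thread counter only sees its own test's requests.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Stand-in for a globally installed recorder: shared by all tests.
static GLOBAL_HITS: AtomicU64 = AtomicU64::new(0);

thread_local! {
    // Stand-in for a per-thread recorder: each test sees only its own hits.
    static LOCAL_HITS: Cell<u64> = Cell::new(0);
}

fn record_request() {
    GLOBAL_HITS.fetch_add(1, Ordering::Relaxed);
    LOCAL_HITS.with(|c| c.set(c.get() + 1));
}

fn main() {
    // Two "tests" running in parallel, each making 3 requests.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..3 {
                    record_request();
                }
                LOCAL_HITS.with(|c| c.get())
            })
        })
        .collect();

    for h in handles {
        // Deterministic: each thread observed exactly its own 3 hits.
        assert_eq!(h.join().unwrap(), 3);
    }
    // The global counter mixes both tests' traffic. A single test cannot
    // know how much of this total belongs to it, which is exactly what
    // makes asserting on globally recorded metrics non-deterministic.
    assert_eq!(GLOBAL_HITS.load(Ordering::Relaxed), 6);
    println!("ok");
}
```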

.wrap(tracing_logger)
.service(health_check)
.service(metrics)
Contributor:

Are we sure that creating an endpoint manually doesn't conflict with the default endpoint created?

Just asking since I saw these docs: https://crates.io/crates/actix-web-metrics

@imor (Contributor Author):

The default endpoint is created only when we call the PrometheusBuilder::install method. We are calling the PrometheusBuilder::install_recorder method instead, which installs the recorder without spawning an HTTP listener. See the docs: https://docs.rs/metrics-exporter-prometheus/latest/metrics_exporter_prometheus/.

@imor (Contributor Author) commented Aug 14, 2025

How are we planning on extracting metrics from replicator instances? Will we add a /metrics endpoint to them?

We have two options here:

  • Have a /metrics endpoint. A challenge with this approach is that replicator instances are not as stable as API pods, which makes discovering their network addresses tricky. A custom k8s controller would have helped here, but for now we'd need an alternative without one; e.g., an API endpoint could return the currently running replicators for service discovery.
  • Push metrics to VictoriaMetrics. This way we don't need to worry about service discovery, and every replicator pushes metrics to a stable VictoriaMetrics address.

@imor merged commit ac6099d into main Aug 14, 2025
5 checks passed
@imor deleted the rs/api-metrics branch August 14, 2025 08:04
Contributor:

Yeah, I would go for option 2.

2 participants