@kushudai kushudai commented Oct 16, 2023

This is mostly a reimplementation of #469 updated to address the main comment.

The previous use case was fine with i64 because it tracked queue sizes, which are integral. However, we've noticed noise in our latency histograms. Since a histogram_quantile of 1 is merely an approximation (or an extrapolation, when there are not enough data points), we've seen "max" metrics skew heavily towards the largest bucket.
We know these values are not real because the server-side timeouts are much smaller than the largest bucket, though a bit bigger than the second-largest bucket.
A similar but separate problem is not having enough 9s for systems that serve hundreds of thousands of RPS: P99 does not accurately reflect tail latencies for these, and adjusting the charts on a per-use-case basis is painful busywork.
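
To illustrate the skew described above, here is a minimal sketch of a bucket-interpolating quantile estimate. It is a simplified stand-in for what `histogram_quantile` does, not its actual implementation: the function name `estimate_quantile`, the bucket layout, and the counts are assumptions made for illustration, and special cases such as the `+Inf` bucket are ignored.

```rust
/// Simplified quantile estimate over cumulative histogram buckets.
/// Assumption: `buckets` holds (upper_bound, cumulative_count) pairs,
/// sorted by upper bound, with finite bounds and the first bucket
/// starting at 0. This is an illustration only.
pub fn estimate_quantile(q: f64, buckets: &[(f64, u64)]) -> Option<f64> {
    let total = buckets.last()?.1;
    if total == 0 {
        return None;
    }
    // Rank of the requested quantile among all observations.
    let rank = q * total as f64;
    let mut lower_bound = 0.0;
    let mut lower_count = 0u64;
    for &(upper_bound, cumulative) in buckets {
        if cumulative as f64 >= rank {
            let in_bucket = (cumulative - lower_count) as f64;
            if in_bucket == 0.0 {
                return Some(lower_bound);
            }
            // Linear interpolation inside the bucket that contains the rank.
            let fraction = (rank - lower_count as f64) / in_bucket;
            return Some(lower_bound + (upper_bound - lower_bound) * fraction);
        }
        lower_bound = upper_bound;
        lower_count = cumulative;
    }
    None
}
```

With hypothetical buckets of 1 s, 5 s and 10 s and cumulative counts of 990, 999 and 1000, a single request that took slightly more than 5 s lands in the 10 s bucket, so `estimate_quantile(1.0, &[(1.0, 990), (5.0, 999), (10.0, 1000)])` yields `Some(10.0)` even if the server-side timeout makes 10 s impossible.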

Instead of finely tuning buckets for different latency histogram metrics, we'd like to be able to report the maximum latency observed for a given time period (usually the scraping interval).
This lets us put an upper bound on the latency seen in server-side processing, which in turn lets us accurately attribute network latency as seen by clients.
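
As a rough sketch of the idea (not the code in this PR), a maximum-over-interval value can be kept in a single atomic that is raised on every observation and reset when it is read at scrape time. The type name `IntervalMax` and its methods are hypothetical, and resetting on read is an assumption: it only behaves well with a single scraper, and the actual `MaximumOverIntervalGauge` may track intervals differently.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical helper: tracks the largest latency observed since the
/// last read, stored as whole microseconds.
pub struct IntervalMax {
    max_micros: AtomicU64,
}

impl IntervalMax {
    pub const fn new() -> Self {
        IntervalMax { max_micros: AtomicU64::new(0) }
    }

    /// Record one latency observation; keeps the larger of the stored
    /// maximum and the new value.
    pub fn observe(&self, latency: Duration) {
        let micros = latency.as_micros() as u64;
        self.max_micros.fetch_max(micros, Ordering::Relaxed);
    }

    /// Return the maximum seen since the previous call and reset it,
    /// so the next interval starts from zero.
    pub fn take(&self) -> Duration {
        Duration::from_micros(self.max_micros.swap(0, Ordering::Relaxed))
    }
}
```

A request handler would call `observe` with each measured latency, and the metrics collector would call `take` once per scrape and export the result as a gauge value.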

@kushudai kushudai force-pushed the max-over-interval-gauge branch from c3d1811 to 3c5350d Compare October 16, 2023 08:18
Signed-off-by: Kushagra Udai <[email protected]>
@kushudai kushudai force-pushed the max-over-interval-gauge branch from cad7cfa to 96d1676 Compare October 16, 2023 08:21
@kushudai kushudai changed the title Add a maximum over interval gauge Add MaximumOverIntervalGauge Oct 16, 2023
@kushudai kushudai marked this pull request as ready for review October 16, 2023 17:48
@kushudai

Hi @lucab, given that you reviewed the original PR, I was hoping you could take a look at this.
Thank you!

@kushudai kushudai marked this pull request as draft October 16, 2023 20:37
Signed-off-by: Kushagra Udai <[email protected]>
@kushudai kushudai force-pushed the max-over-interval-gauge branch from 1cdf904 to d1fa533 Compare October 17, 2023 02:58
@kushudai kushudai marked this pull request as ready for review October 17, 2023 02:58
@kushudai

Hi @lucab, I apologize for the nag but I was hoping you could take a look at this one :)

@kushudai kushudai closed this Feb 22, 2025
@kushudai kushudai deleted the max-over-interval-gauge branch February 22, 2025 18:48