
@dashpole (Contributor) commented Oct 2, 2025

This improves the concurrent performance of the histogram reservoir's Offer function by roughly 4x (a ~76% reduction in time per operation, per the benchmark below).

This is accomplished by locking each measurement individually, rather than locking around the entire storage. It also defers extracting the trace context from the context.Context until collection time. Both changes make Offer cheaper, and Offer is on the measurement hot path. Exemplars are often overwritten before they are collected, so deferring the extraction until Collect also reduces the overall work.

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric/exemplar
cpu: AMD EPYC 7B12
                           │   main.txt   │              hist.txt              │
                           │    sec/op    │   sec/op     vs base               │
FixedSizeReservoirOffer-24    211.4n ± 3%   177.5n ± 3%  -16.04% (p=0.002 n=6)
HistogramReservoirOffer-24   200.85n ± 2%   47.41n ± 2%  -76.40% (p=0.002 n=6)
geomean                       206.1n        91.73n       -55.48%

                           │   main.txt   │              hist.txt              │
                           │     B/op     │    B/op     vs base                │
FixedSizeReservoirOffer-24   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
HistogramReservoirOffer-24   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                 ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                           │   main.txt   │              hist.txt              │
                           │  allocs/op   │ allocs/op   vs base                │
FixedSizeReservoirOffer-24   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
HistogramReservoirOffer-24   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                 ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

I explored using a []atomic.Pointer[measurement], but it had similar performance while being much more complex (it needed a sync.Pool to eliminate allocations), and its single-threaded performance was much worse. See main...dashpole:optimize_histogram_reservoir_old.

codecov bot commented Oct 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.2%. Comparing base (9dea78c) to head (7d0f036).

@@           Coverage Diff           @@
##            main   #7443     +/-   ##
=======================================
- Coverage   86.2%   86.2%   -0.1%     
=======================================
  Files        295     295             
  Lines      25864   25863      -1     
=======================================
- Hits       22307   22303      -4     
- Misses      3184    3187      +3     
  Partials     373     373             
Files with missing lines Coverage Δ
sdk/metric/exemplar/fixed_size_reservoir.go 97.6% <100.0%> (ø)
sdk/metric/exemplar/histogram_reservoir.go 92.0% <100.0%> (-1.4%) ⬇️
sdk/metric/exemplar/storage.go 100.0% <100.0%> (ø)

... and 2 files with indirect coverage changes

@dashpole force-pushed the optimize_histogram_reservoir branch 2 times, most recently from 7457c73 to 7c1476f on October 4, 2025 03:47
@dashpole force-pushed the optimize_histogram_reservoir branch from 7c1476f to 7b79e43 on October 7, 2025 14:40
dashpole added a commit that referenced this pull request Oct 7, 2025
Forked from this discussion here:
#7443 (comment)

It seems like a good idea for us as a group to align on and document
what we are comfortable with in terms of how ordered measurements are
reflected in collected metric data.

---------

Co-authored-by: Tyler Yahn <[email protected]>
@pellared mentioned this pull request Oct 10, 2025
@bboreham (Contributor) commented:

On further reflection, I had fixed the copying issue before running the benchmark, so it is perhaps reasonable that the less racy code runs slower.

It would be good if the tests and/or a linter detected this issue. I note that NoCopy was removed from atomic.Value here: golang/go#21504.
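
A minimal sketch of one way a linter can catch such copies: the `noCopy` convention, where an embedded type with no-op `Lock`/`Unlock` methods makes `go vet`'s copylocks check report by-value copies of the enclosing struct. The `slot` type is hypothetical, and the assumption that `atomic.Value` alone is not flagged is taken from the linked issue rather than verified here.

```go
package main

import "sync/atomic"

// noCopy may be embedded in structs that must not be copied after first
// use. go vet's copylocks check reports copies of types with Lock and
// Unlock methods; the methods themselves do nothing.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

// slot is a hypothetical reservoir slot wrapping an atomic.Value, which
// (per the linked issue) carries no copy guard of its own.
type slot struct {
	_ noCopy
	v atomic.Value
}

func (s *slot) set(x float64) { s.v.Store(x) }

// get returns the stored value, or ok=false if nothing was stored yet.
func (s *slot) get() (float64, bool) {
	x, ok := s.v.Load().(float64)
	return x, ok
}
```

With the guard in place, a by-value copy such as `t := s` should be reported when running `go vet`.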

@dashpole force-pushed the optimize_histogram_reservoir branch from 433ff16 to e4dfbac on October 15, 2025 01:01
@dashpole (Contributor Author) commented:

I also see slightly worse results, but I agree it is definitely better to be correct. I'll work on a test.

@dashpole force-pushed the optimize_histogram_reservoir branch from e4dfbac to 67df837 on October 15, 2025 15:38
@dashpole (Contributor Author) commented:

I added a ConcurrentSafe test and verified that it fails (quite spectacularly) with the previous atomic.Value implementation.
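
A minimal sketch of what a concurrency-safety test can look like, assuming a hypothetical `reservoir` interface with simplified `Offer` and `Collect` methods (not the SDK's actual signatures). Many goroutines call both methods at once; the harness is meant to run under `go test -race`, where the race detector reports unsynchronized access. `lockedReservoir` is a trivially safe stand-in used only to exercise the harness.

```go
package main

import "sync"

// reservoir is a hypothetical, simplified interface for the code under test.
type reservoir interface {
	Offer(v float64)
	Collect() []float64
}

// lockedReservoir guards a fixed-size ring of values with a single mutex.
type lockedReservoir struct {
	mu   sync.Mutex
	vals []float64
	n    int // number of Offer calls so far
}

func newLockedReservoir(size int) *lockedReservoir {
	return &lockedReservoir{vals: make([]float64, size)}
}

func (r *lockedReservoir) Offer(v float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.vals[r.n%len(r.vals)] = v
	r.n++
}

func (r *lockedReservoir) Collect() []float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	filled := r.n
	if filled > len(r.vals) {
		filled = len(r.vals)
	}
	out := make([]float64, filled)
	copy(out, r.vals[:filled])
	return out
}

// runConcurrentSafe calls Offer and Collect from many goroutines at once so
// that the race detector can observe any unsynchronized shared state.
func runConcurrentSafe(r reservoir, goroutines, iterations int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				r.Offer(float64(i))
			}
		}()
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				_ = r.Collect()
			}
		}()
	}
	wg.Wait()
}
```

Such a test does not assert on exact values; its job is to give the race detector concurrent Offer/Collect traffic to inspect.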

@bboreham (Contributor) left a comment:

lgtm

@dashpole (Contributor Author) commented:

The ConcurrentSafe test found another race condition, around my usage of sync.Pool, which I'm looking into.

@dashpole force-pushed the optimize_histogram_reservoir branch from 597d23c to 81231b8 on October 15, 2025 20:00
@dashpole force-pushed the optimize_histogram_reservoir branch from 81231b8 to 2c82611 on October 15, 2025 20:02
@dashpole (Contributor Author) commented:

The other race had to do with my usage of sync.Pool. After Collect loaded an element, a subsequent store() that replaced that measurement could place the element into the sync.Pool, and another store() could then retrieve it from the pool and modify it while Collect was still using it. I worked out a way to fix this, but it brought performance to around 45ns/op. In the end, I decided to simply lock around each measurement: that has the same parallel performance, is much simpler and more readable, and has better single-threaded performance.
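
To make that interleaving concrete, here is a deterministic single-goroutine replay of it. `freeList` is a hypothetical stand-in for sync.Pool (whose Get is not guaranteed to return a previously Put object), and `slot`/`measurement` are illustrative types, not the PR's code. The pointer that "Collect" loaded ends up showing a value written by a later store.

```go
package main

// measurement is a hypothetical exemplar payload.
type measurement struct {
	value float64
}

// freeList is a deterministic stand-in for sync.Pool, used only so the
// replay below behaves the same way every time.
type freeList struct{ items []*measurement }

func (f *freeList) put(m *measurement) { f.items = append(f.items, m) }

func (f *freeList) get() *measurement {
	if n := len(f.items); n > 0 {
		m := f.items[n-1]
		f.items = f.items[:n-1]
		return m
	}
	return new(measurement)
}

// slot holds the current measurement pointer (atomic in the explored
// design; a plain field suffices for this single-goroutine replay).
type slot struct {
	cur  *measurement
	pool freeList
}

// store fills a recycled measurement, swaps it in, and recycles the old one.
func (s *slot) store(v float64) {
	m := s.pool.get()
	m.value = v
	old := s.cur
	s.cur = m
	if old != nil {
		s.pool.put(old)
	}
}

// replayRace returns the value Collect saw when it loaded the pointer, and
// the value visible through that same pointer after two more stores.
func replayRace() (loaded, observed float64) {
	var s slot
	s.store(1)        // slot holds measurement A{1}
	snapshot := s.cur // Collect loads A
	loaded = snapshot.value
	s.store(2) // B{2} replaces A; A goes back to the pool
	s.store(3) // A is reused from the pool and overwritten with 3
	return loaded, snapshot.value
}
```

In the concurrent version the same reuse happens while Collect is still reading, which is the data race the ConcurrentSafe test exposed.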

@dashpole requested review from MrAlias and bboreham October 15, 2025 20:08
@dashpole force-pushed the optimize_histogram_reservoir branch from 2c82611 to 5e17e43 on October 15, 2025 20:12
@bboreham (Contributor) left a comment:

Much simpler now.


r.mu.Lock()
defer r.mu.Unlock()
if int(r.count) < cap(r.measurements) {
Contributor

Drive-by: I think this (and all similar code) should use len, not cap.
In the current code they are always the same, but it is slightly jarring when reading to wonder which one was intended.
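
A small illustration of the len/cap point, using hypothetical `hasRoom` and `store` helpers: for a slice built with `make([]T, n)` the two bounds coincide, but for `make([]T, 0, n)` a cap-based check would admit an index that then panics, whereas a len-based check matches the indices that are actually addressable.

```go
package main

// hasRoom bounds by len: every index below len(measurements) is
// addressable, which is exactly what the caller is about to rely on.
func hasRoom(count int, measurements []float64) bool {
	return count < len(measurements)
}

// store writes v at index count when the len-based check allows it.
// With a cap-based check, a slice made by make([]float64, 0, n) would pass
// the check and then panic on the index expression below.
func store(measurements []float64, count int, v float64) bool {
	if !hasRoom(count, measurements) {
		return false
	}
	measurements[count] = v
	return true
}
```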

@MrAlias added this to the v1.39.0 milestone Oct 16, 2025

3 participants