Aggregate counter downsampling preserves resets #143381

Merged
gmarouli merged 38 commits into elastic:main from gmarouli:downsampling-counters-with-resets
Mar 13, 2026

Conversation

@gmarouli
Contributor

@gmarouli gmarouli commented Mar 2, 2026

In this PR we aim to improve the accuracy of the aggregate counter with the following changes:

  • The downsampled document records the first value of the counter in the bucket instead of the last. This should improve accuracy because the first value is closer to the start of the bucket than the last value.
  • If we detect a reset, we track extra documents: the last value before the reset and, optionally, the value after the reset. These documents preserve their original timestamps.

Our hypothesis is that with these two changes we get a more accurate counter estimation without a big performance regression (verified in #142280), assuming that reset events are rare and usually affect all counters at the same moment.
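
To make the two changes above concrete, here is a minimal, standalone sketch of the idea. It is not the code in this PR: `CounterResetSketch`, `Sample` and `downsample` are hypothetical names, the input is assumed to be a single time series sorted by ascending timestamp, and a reset is assumed to be any sample whose value is lower than the previous one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the Elasticsearch code base.
final class CounterResetSketch {

    /** One counter sample: timestamp in epoch millis and the counter value. */
    record Sample(long timestamp, double value) {}

    /**
     * Assumes samples belong to one time series and are sorted by ascending timestamp.
     * Emits the first value of every bucket, stamped with the bucket start, plus the
     * samples around each detected reset with their original timestamps.
     */
    static List<Sample> downsample(List<Sample> samples, long intervalMillis) {
        List<Sample> out = new ArrayList<>();
        long currentBucket = 0;
        Sample previous = null;
        boolean previousOpenedBucket = false;
        for (Sample s : samples) {
            long bucketStart = Math.floorDiv(s.timestamp(), intervalMillis) * intervalMillis;
            boolean opensBucket = previous == null || bucketStart != currentBucket;
            // Assumption: a drop in value is treated as a counter reset.
            boolean isReset = previous != null && s.value() < previous.value();
            if (isReset && previousOpenedBucket == false) {
                // Last value before the reset; skipped when the bucket document already holds it.
                out.add(previous);
            }
            if (opensBucket) {
                // First value of the bucket, recorded at the bucket start.
                out.add(new Sample(bucketStart, s.value()));
                currentBucket = bucketStart;
            } else if (isReset) {
                // Value after the reset; only needed when it is not the first value of a bucket.
                out.add(s);
            }
            previousOpenedBucket = opensBucket;
            previous = s;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Sample> input = List.of(
            new Sample(0, 10),
            new Sample(20_000, 20),
            new Sample(40_000, 5),   // reset inside the first one-minute bucket
            new Sample(70_000, 8),
            new Sample(130_000, 3)   // reset that coincides with a new bucket
        );
        downsample(input, 60_000).forEach(System.out::println);
    }
}
```

In this sketch the post-reset document is only written when the reset happens in the middle of a bucket, which is one possible reading of "optionally" above. When the first sample after a reset also opens a bucket, the regular bucket document already carries that value.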

Closes #136178

@gmarouli gmarouli added the >enhancement and :StorageEngine/Downsampling labels Mar 2, 2026
@gmarouli gmarouli marked this pull request as ready for review March 2, 2026 15:52
@gmarouli gmarouli requested review from felixbarny and kkrik-es March 2, 2026 15:52
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

@kkrik-es kkrik-es requested review from martijnvg and removed request for felixbarny March 2, 2026 17:35
private final ExponentialHistogramFieldDownsampler[] exponentialHistogramDownsamplers;
private final TDigestHistogramFieldDownsampler[] tDigestHistogramDownsamplers;
private final NumericMetricFieldDownsampler[] numericDownsamplers;
private final NumericMetricFieldDownsampler.AggregateCounterFieldDownsampler[] aggregateCounterDownsamplers;
Contributor

Not a big fan of undocumented variables. Maybe add a quick note on what this is tracking?

Contributor Author

Let me know if it's more clear, otherwise I will iterate

private long timestamp;
private int docCount;
private CounterResetDataPoints counterResetDataPoints;
private final List<AbstractFieldDownsampler<?>> fieldDownsamplers;
Contributor

Nit: it's a bit confusing now, since dimensions and counters are also fields. Let's add comments, or maybe rename the variables to document what each one covers.

Contributor Author

I added comments here too.

@gmarouli
Contributor Author

Buildkite benchmark this with tsdb please

@gmarouli
Contributor Author

Buildkite benchmark this with tsdb please

@elasticmachine
Collaborator

elasticmachine commented Mar 11, 2026

💚 Build Succeeded

This build ran two tsdb benchmarks to evaluate performance impact of this PR.

History

@gmarouli
Contributor Author

TSDB Benchmark

Aggregate

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [4m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [1.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [1.9m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7090709], indexed downsampled doc [7090709], failed [0], took [4.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [231486], indexed downsampled doc [231486], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [119308], indexed downsampled doc [119308], failed [0], took [1.9m]

Last value

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [3.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [2.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [2m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [3.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [1.9m]

Member

@martijnvg martijnvg left a comment

LGTM

@gmarouli
Contributor Author

TSDB Benchmark

Aggregate

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [4.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [1.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [1.9m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7090709], indexed downsampled doc [7090709], failed [0], took [3.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [231486], indexed downsampled doc [231486], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [119308], indexed downsampled doc [119308], failed [0], took [1.9m]

Last value

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [4m]
Shard [tsdb][0] processed [116633696] docs, created [229256] downsample buckets
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [1.9m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [3.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [2.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [1.9m]

@gmarouli gmarouli merged commit 9c965a7 into elastic:main Mar 13, 2026
33 of 36 checks passed
@gmarouli gmarouli deleted the downsampling-counters-with-resets branch March 13, 2026 07:17

Labels

>enhancement, :StorageEngine/Downsampling, Team:StorageEngine, v9.4.0

Development

Successfully merging this pull request may close these issues.

[Downsampling++] Improve the way we downsample counters

5 participants