Pollute call sites before running benchmarks. #436

jpountz · 2025-08-12T06:58:05Z

Because nightly benchmarks only test a small set of scenarios, the JVM may end up over-optimizing query evaluation. For instance, it only runs with BM25Similarity, sorting tasks only run against a TermQuery, filtered vector search only exercises the approximate path, not the exact path, etc.

This tries to make the benchmark more realistic by running some cheap queries before running benchmarks, in order to pollute call sites so that they are not all magically monomorphic.

This will translate into a drop in performance for some tasks, but hopefully we can recover some of it in the future.
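To make the idea concrete, here is a minimal, hypothetical Java sketch of call-site pollution. It is not the contents of `TypePolluter.java` (which this thread doesn't show); the `Scorer` interface and its three implementations are stand-ins invented for illustration. HotSpot profiles which receiver classes reach each virtual call site: if only one class has been seen, the JIT can inline it behind a cheap type guard, whereas several receiver classes of similar frequency generally push it towards a regular virtual call. A cheap warm-up that sends several implementations through the same call site before measuring is meant to reproduce that second situation.

```java
import java.util.List;

/** Hypothetical illustration of call-site pollution (not luceneutil's TypePolluter). */
public class CallSitePollution {

  /** Stand-in for a scoring abstraction with several implementations. */
  interface Scorer {
    float score(float freq, long norm);
  }

  /** Roughly BM25-shaped; k1 = 1.2, b = 0.75 and an average length of 10 are arbitrary. */
  static final class Bm25Like implements Scorer {
    @Override
    public float score(float freq, long norm) {
      float k = 1.2f * (0.25f + 0.75f * norm / 10f);
      return freq / (freq + k);
    }
  }

  /** Square-root-of-frequency scorer. */
  static final class SqrtTf implements Scorer {
    @Override
    public float score(float freq, long norm) {
      return (float) Math.sqrt(freq);
    }
  }

  /** Constant scorer. */
  static final class Constant implements Scorer {
    @Override
    public float score(float freq, long norm) {
      return 1f;
    }
  }

  /** Written to by the warm-up so the JIT cannot discard that work as dead code. */
  static float sink;

  /** The shared call site: receiver classes seen at scorer.score(...) are profiled here. */
  static float sumScores(Scorer scorer, int n) {
    float sum = 0f;
    for (int i = 1; i <= n; i++) {
      sum += scorer.score(i, i % 10);
    }
    return sum;
  }

  /** Cheap warm-up sending several receiver classes through the same call site. */
  static void pollute() {
    List<Scorer> impls = List.of(new Bm25Like(), new SqrtTf(), new Constant());
    for (int round = 0; round < 10_000; round++) { // iteration counts are arbitrary
      for (Scorer s : impls) {
        sink += sumScores(s, 64);
      }
    }
  }

  public static void main(String[] args) {
    pollute(); // once, before any measurement
    // The measured workload only uses Bm25Like, but the profile of the call
    // site in sumScores has already seen three receiver classes.
    System.out.println(sumScores(new Bm25Like(), 1_000));
  }
}
```

Running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` is one way to compare HotSpot's inlining decisions for `sumScores` with and without the `pollute()` call; the exact output format varies across JDK versions.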

Related PRs:

- apache/lucene#14968 (Brings back Scorer#applyAsRequiredClause), where we suspected that the speedup was due to specialization making a call site monomorphic in nightly benchmarks, when it would not be monomorphic in the real world.
- apache/lucene#15039 (Better vectorize score computations), where we are trying to improve behavior with several different similarity impls, but the benchmarks only show a small improvement since they always run with BM25Similarity.

jpountz · 2025-08-12T06:58:52Z

Here's the result of a run where pollution is disabled on the baseline and enabled on the competitor (`my_modified_version` in the table):

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                      OrHighHigh       79.20      (2.6%)       65.72      (3.8%)  -17.0% ( -22% -  -10%) 0.000
                     OrStopWords       49.64      (2.7%)       42.41      (3.6%)  -14.6% ( -20% -   -8%) 0.000
                     AndHighHigh       70.67      (2.4%)       60.57      (5.6%)  -14.3% ( -21% -   -6%) 0.000
                       OrHighMed      260.29      (2.4%)      223.68      (3.0%)  -14.1% ( -19% -   -8%) 0.000
                      AndHighMed      206.71      (2.2%)      179.94      (5.3%)  -12.9% ( -19% -   -5%) 0.000
                        Or3Terms      236.19      (2.3%)      209.14      (2.6%)  -11.5% ( -15% -   -6%) 0.000
                    AndStopWords       48.47      (2.4%)       42.98      (4.7%)  -11.3% ( -18% -   -4%) 0.000
                            Term      661.48      (4.9%)      587.73      (5.3%)  -11.2% ( -20% -   -1%) 0.000
                       And3Terms      247.78      (1.9%)      224.68      (3.4%)   -9.3% ( -14% -   -4%) 0.000
              Or2Terms2StopWords      207.03      (1.7%)      187.84      (2.1%)   -9.3% ( -12% -   -5%) 0.000
              FilteredAndHighMed      159.86      (1.3%)      146.89      (2.1%)   -8.1% ( -11% -   -4%) 0.000
                AndMedOrHighHigh       88.35      (2.5%)       81.29      (2.6%)   -8.0% ( -12% -   -2%) 0.000
             And2Terms2StopWords      208.43      (1.6%)      192.54      (2.8%)   -7.6% ( -11% -   -3%) 0.000
             FilteredAndHighHigh       81.64      (1.2%)       76.06      (1.9%)   -6.8% (  -9% -   -3%) 0.000
                      OrHighRare      299.01      (6.8%)      279.28      (7.2%)   -6.6% ( -19% -    7%) 0.045
                          OrMany       23.43      (3.1%)       21.93      (1.8%)   -6.4% ( -10% -   -1%) 0.000
            FilteredAndStopWords       67.63      (1.6%)       63.38      (1.7%)   -6.3% (  -9% -   -3%) 0.000
               CombinedOrHighMed       87.87      (1.8%)       82.37      (2.6%)   -6.3% ( -10% -   -1%) 0.000
                    CombinedTerm       39.26      (1.9%)       36.82      (2.9%)   -6.2% ( -10% -   -1%) 0.000
              CombinedOrHighHigh       23.23      (1.8%)       21.81      (2.9%)   -6.1% ( -10% -   -1%) 0.000
     FilteredAnd2Terms2StopWords      220.25      (1.3%)      208.60      (1.6%)   -5.3% (  -8% -   -2%) 0.000
               FilteredAnd3Terms      193.49      (1.5%)      183.53      (1.3%)   -5.1% (  -7% -   -2%) 0.000
                       CountTerm     9397.84      (4.0%)     9023.88      (3.2%)   -4.0% ( -10% -    3%) 0.019
                   TermTitleSort       86.62      (9.3%)       83.52      (3.8%)   -3.6% ( -15% -   10%) 0.286
                 AndHighOrMedMed       50.97      (1.3%)       49.34      (0.7%)   -3.2% (  -5% -   -1%) 0.000
                  FilteredOrMany       16.41      (2.0%)       15.94      (1.4%)   -2.9% (  -6% -    0%) 0.000
                 CountAndHighMed      308.78      (2.6%)      300.40      (1.8%)   -2.7% (  -6% -    1%) 0.011
              CombinedAndHighMed       89.12      (1.7%)       86.71      (0.8%)   -2.7% (  -5% -    0%) 0.000
              FilteredOrHighHigh       67.49      (1.7%)       65.72      (1.7%)   -2.6% (  -5% -    0%) 0.001
                  CountOrHighMed      360.58      (3.4%)      351.94      (1.6%)   -2.4% (  -7% -    2%) 0.055
             FilteredOrStopWords       45.82      (2.0%)       44.75      (1.8%)   -2.3% (  -6% -    1%) 0.011
      FilteredOr2Terms2StopWords      146.98      (0.9%)      143.83      (1.1%)   -2.1% (  -4% -    0%) 0.000
                FilteredOr3Terms      167.05      (1.3%)      163.59      (0.8%)   -2.1% (  -4% -    0%) 0.000
             CombinedAndHighHigh       23.33      (2.0%)       22.85      (0.9%)   -2.0% (  -4% -    0%) 0.005
               FilteredOrHighMed      153.17      (1.1%)      150.24      (0.9%)   -1.9% (  -3% -    0%) 0.000
                  FilteredPhrase       32.03      (2.0%)       31.45      (1.1%)   -1.8% (  -4% -    1%) 0.016
                 CountOrHighHigh      344.82      (2.1%)      339.65      (2.5%)   -1.5% (  -5% -    3%) 0.164
                 FilteredPrefix3      150.97      (1.0%)      148.70      (3.1%)   -1.5% (  -5% -    2%) 0.170
                     CountOrMany       29.38      (2.0%)       28.95      (1.8%)   -1.5% (  -5% -    2%) 0.097
                     CountPhrase        4.23      (1.8%)        4.17      (3.0%)   -1.4% (  -6% -    3%) 0.240
                    FilteredTerm      161.78      (2.2%)      159.63      (1.8%)   -1.3% (  -5% -    2%) 0.154
             CountFilteredOrMany       27.28      (1.9%)       27.03      (2.0%)   -0.9% (  -4% -    3%) 0.313
         CountFilteredOrHighHigh      137.45      (1.0%)      136.35      (1.2%)   -0.8% (  -2% -    1%) 0.124
          CountFilteredOrHighMed      149.19      (0.8%)      148.05      (1.1%)   -0.8% (  -2% -    1%) 0.091
                      TermDTSort      385.74      (4.7%)      383.98      (2.4%)   -0.5% (  -7% -    6%) 0.795
                CountAndHighHigh      359.36      (2.3%)      358.28      (2.3%)   -0.3% (  -4% -    4%) 0.781
             CountFilteredPhrase       25.14      (2.2%)       25.07      (2.6%)   -0.3% (  -4% -    4%) 0.798
                   TermMonthSort     3332.40      (2.5%)     3328.19      (2.3%)   -0.1% (  -4% -    4%) 0.910
                  FilteredIntNRQ      299.10      (1.3%)      299.17      (1.3%)    0.0% (  -2% -    2%) 0.966
               TermDayOfYearSort      279.58      (4.9%)      280.34      (1.2%)    0.3% (  -5% -    6%) 0.872
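The `Pct diff` column appears to be the relative change in QPS of the competitor versus the baseline. A minimal sketch under that assumption, recomputing two rows from the rounded QPS values shown above. The bracketed range and the p-value are derived from the per-iteration variance and are not reproduced here, and because the inputs are rounded, the last digit may occasionally differ from the table.

```java
import java.util.Locale;

/** Recomputes the "Pct diff" column of a row from its two QPS values. */
public class PctDiff {

  /** Assumed definition: relative change of competitor vs. baseline, in percent. */
  static double pctDiff(double baselineQps, double competitorQps) {
    return (competitorQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // OrHighHigh: 79.20 QPS (baseline) vs. 65.72 QPS (pollution enabled)
    System.out.printf(Locale.ROOT, "OrHighHigh: %.1f%%%n", pctDiff(79.20, 65.72));
    // prints "OrHighHigh: -17.0%"
    // TermDayOfYearSort: 279.58 QPS vs. 280.34 QPS
    System.out.printf(Locale.ROOT, "TermDayOfYearSort: %.1f%%%n", pctDiff(279.58, 280.34));
    // prints "TermDayOfYearSort: 0.3%"
  }
}
```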

jpountz · 2025-08-12T06:59:46Z

@ChrisHegarty I think I remember seeing something like that in one of your recent PRs, but I can't find it anymore?

ChrisHegarty · 2025-08-12T07:36:24Z

> @ChrisHegarty I think I remember seeing something like that in one of your recent PRs but I can't find it anymore?

Yeah, I had something similar in the benchmark update of apache/lucene#15037. I still need to make it optional, so that it can be enabled or disabled for comparison.

Generally, I do think that this is a good idea, as it will allow us to find such potential problems so that we can fix 'em and make performance more consistent.

ChrisHegarty

LGTM

mikemccand

Thanks @jpountz -- this is a great idea to make benchy more realistic.

mikemccand · 2025-08-12T11:09:43Z

src/main/perf/TypePolluter.java

@@ -0,0 +1,174 @@
+package perf;


Needs ASL copyright header.

Thank you for noticing; I added one.

mikemccand · 2025-08-12T11:10:26Z

src/main/perf/SearchPerfTest.java

    final TestContext testContext = TestContext.parse(args.getString("-context", ""));

+    if (pollute) {
+      TypePolluter.pollute();


Curious that the one-time pollution is enough! Hotspot doesn't notice that things later got singular and then re-optimize?

Good question. I'm not familiar enough with Hotspot internals to give you an answer. I suspect that it technically could, but that it wouldn't help much in real-world applications, so it doesn't bother. @ChrisHegarty may have more data?

I think that what's in the PR is fine. It is possible that things change over time and that Hotspot could optimise differently in the future when profiles change, but like Adrien, I'm less worried about this in real-world scenarios.

I ran experiments locally that suggest that some of the performance decrease from type pollution (mikemccand/luceneutil#436) can be attributed to calls to `SimScorer#score` no longer being inlinable since they are polymorphic. This change helps `BM25Scorer` remain inlinable using tricks similar to those we apply for `Bits#get` and `ImpactsEnum#nextDoc`/`ImpactsEnum#advance`. Hopefully changes such as apache/lucene#15039 will help improve performance with other similarities as well in the future.
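A common way to keep one hot implementation inlinable at an otherwise polymorphic call site is to test for its concrete final class and call it through that type, since the JIT then knows the exact receiver and doesn't depend on the call-site profile. The sketch below assumes this is the general kind of trick meant here, without claiming it matches the actual patch; `SimScorer`, `BM25Scorer` and `OtherScorer` are simplified hypothetical stand-ins, not Lucene's classes, and it uses pattern matching for `instanceof` (Java 16+).

```java
/** Hypothetical sketch of a type-check fast path (stand-ins, not Lucene's API). */
public class InlinableFastPath {

  /** Stand-in for an abstract scorer with many possible implementations. */
  abstract static class SimScorer {
    abstract float score(float freq, long norm);
  }

  /** Final stand-in for the common implementation we want inlined. */
  static final class BM25Scorer extends SimScorer {
    private final float k1 = 1.2f, b = 0.75f, avgLen = 10f; // arbitrary parameters

    @Override
    float score(float freq, long norm) {
      float k = k1 * ((1 - b) + b * norm / avgLen);
      return freq / (freq + k);
    }
  }

  /** Another implementation that makes the call site polymorphic. */
  static final class OtherScorer extends SimScorer {
    @Override
    float score(float freq, long norm) {
      return freq;
    }
  }

  static float score(SimScorer scorer, float freq, long norm) {
    if (scorer instanceof BM25Scorer bm25) {
      // The static receiver type is the final class BM25Scorer, so this call
      // can be bound and inlined without relying on the type profile.
      return bm25.score(freq, norm);
    }
    // Everything else goes through the (possibly megamorphic) virtual call.
    return scorer.score(freq, norm);
  }

  public static void main(String[] args) {
    SimScorer[] scorers = {new BM25Scorer(), new OtherScorer()};
    float sum = 0f;
    for (SimScorer s : scorers) {
      sum += score(s, 2f, 10);
    }
    System.out.println(sum);
  }
}
```

The design trade-off is one extra type check per call for non-BM25 scorers: the common case gets inlined, while other similarities keep the virtual call.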

jpountz · 2025-08-23T19:12:30Z

I pushed an annotation for this change.

ChrisHegarty approved these changes Aug 12, 2025

mikemccand approved these changes Aug 12, 2025

HUSTERGS mentioned this pull request Aug 12, 2025

Brings back Scorer#applyAsRequiredClause apache/lucene#14968

Draft

jpountz added 3 commits August 13, 2025 16:29

improve

d8c23f0

Add license and tidy a bit.

fe48ae6

Undo unintended changes.

3b79d8d

jpountz merged commit b2228fe into mikemccand:main Aug 13, 2025
1 check passed

jpountz deleted the pollute branch August 13, 2025 14:50

HUSTERGS mentioned this pull request Aug 15, 2025

Wraps all iterator with likelyImpactsEnum under BlockMaxConjunctionBulkScorer apache/lucene#15004

Merged

jpountz mentioned this pull request Aug 17, 2025

Make calls to BM25Scorer#score inlinable. apache/lucene#15082

Open
