HUSTERGS
Contributor

Description

Inspired by #14690, this PR essentially tries to bring back the Scorer#applyAsRequiredClause interface. Unlike #14690, I'm wondering whether we can pass the DocAndScoreAccBuffer all the way down to the postings, so that we can benefit from fewer advance calls when the buffer holds a dense set of doc IDs, e.g. by using SIMD again. So I added a new method, PostingsEnum#nextRequiredFreqBuffer (not stable yet). Currently I have only written the default implementation, and I'm still trying to speed up the process in BlockPostingEnum.
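A minimal sketch of the idea, using hypothetical stand-in types rather than the real Lucene classes: `SketchBuffer` and `SketchRequiredClause` are names invented for illustration, and the linear `advance` is only a placeholder for whatever the postings implementation would actually do. The required clause drops buffered docs it does not match and adds its own score to the ones it keeps.

```java
/** Hypothetical stand-in for a doc/score accumulation buffer; not Lucene's class. */
final class SketchBuffer {
  final int[] docs;      // sorted doc IDs, valid in [0, size)
  final double[] scores; // accumulated scores, parallel to docs
  int size;

  SketchBuffer(int[] docs) {
    this.docs = docs.clone();
    this.scores = new double[docs.length];
    this.size = docs.length;
  }
}

/** Hypothetical required clause backed by sorted postings and one score per posting. */
final class SketchRequiredClause {
  private final int[] postings;       // sorted, no duplicates
  private final float[] clauseScores; // parallel to postings
  private int pos = 0;

  SketchRequiredClause(int[] postings, float[] clauseScores) {
    this.postings = postings;
    this.clauseScores = clauseScores;
  }

  /** Moves to the first posting >= target and returns it, or Integer.MAX_VALUE if exhausted. */
  private int advance(int target) {
    while (pos < postings.length && postings[pos] < target) {
      pos++;
    }
    return pos < postings.length ? postings[pos] : Integer.MAX_VALUE;
  }

  /** Keeps only buffered docs that this clause matches, adding this clause's score to each. */
  void applyAsRequiredClause(SketchBuffer buffer) {
    int kept = 0;
    for (int i = 0; i < buffer.size; i++) {
      int doc = buffer.docs[i];
      if (advance(doc) == doc) {
        buffer.docs[kept] = doc;
        buffer.scores[kept] = buffer.scores[i] + clauseScores[pos];
        kept++;
      }
    }
    buffer.size = kept;
  }
}
```

Passing the whole buffer down means a real implementation would be free to replace the per-doc `advance` loop with block-wise work when the buffered docs are dense.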

This is still under development. I know we should be cautious about adding new public interfaces (especially two at once!), but I want to share the current progress. Below are the luceneutil benchmark results on wikimediumall with searchConcurrency=0, taskCountPerCat=5, taskRepeatCount=50, after 20 iterations (against the latest code):

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                            Term      472.96      (6.1%)      461.21      (6.8%)   -2.5% ( -14% -   11%) 0.224
                          Term1M      472.39      (5.9%)      460.90      (7.0%)   -2.4% ( -14% -   11%) 0.232
                         Term100      472.69      (5.9%)      461.52      (6.8%)   -2.4% ( -14% -   10%) 0.239
                       TermB1M1P      472.28      (5.9%)      461.42      (6.9%)   -2.3% ( -14% -   11%) 0.257
                         Term10K      472.60      (6.0%)      461.82      (6.8%)   -2.3% ( -14% -   11%) 0.260
                         TermB1M      472.22      (6.0%)      461.48      (6.9%)   -2.3% ( -14% -   11%) 0.266
                DismaxOrHighHigh       35.68      (3.5%)       35.03      (3.9%)   -1.8% (  -8% -    5%) 0.118
                          OrMany        4.73      (5.3%)        4.66      (5.7%)   -1.4% ( -11% -   10%) 0.429
                IntervalsOrdered        2.47      (1.9%)        2.44      (3.1%)   -1.3% (  -6% -    3%) 0.120
                      OrHighRare       96.38      (6.9%)       95.20      (6.5%)   -1.2% ( -13% -   13%) 0.564
                 DismaxOrHighMed       50.96      (4.1%)       50.35      (4.2%)   -1.2% (  -9% -    7%) 0.358
                    CombinedTerm       11.17      (3.7%)       11.04      (4.8%)   -1.1% (  -9% -    7%) 0.402
                      DismaxTerm      516.28      (5.0%)      510.53      (4.1%)   -1.1% (  -9% -    8%) 0.440
                      OrHighHigh       21.39      (2.5%)       21.22      (2.8%)   -0.8% (  -5% -    4%) 0.319
                     AndHighHigh       22.46      (2.4%)       22.27      (2.5%)   -0.8% (  -5% -    4%) 0.292
                       OrHighMed       68.77      (4.0%)       68.47      (4.3%)   -0.4% (  -8% -    8%) 0.748
             CountFilteredPhrase        9.21      (2.8%)        9.17      (3.4%)   -0.4% (  -6% -    6%) 0.675
                        SpanNear        2.51      (4.1%)        2.51      (4.5%)   -0.3% (  -8% -    8%) 0.817
              Or2Terms2StopWords       62.44      (6.0%)       62.25      (6.2%)   -0.3% ( -11% -   12%) 0.876
                    SloppyPhrase        1.14      (3.2%)        1.13      (4.1%)   -0.3% (  -7% -    7%) 0.796
             FilteredOrStopWords        8.16      (2.3%)        8.14      (2.8%)   -0.3% (  -5% -    4%) 0.729
                     CountPhrase        2.71      (2.0%)        2.71      (2.3%)   -0.2% (  -4% -    4%) 0.731
             CountFilteredIntNRQ       16.35      (1.4%)       16.34      (1.3%)   -0.1% (  -2% -    2%) 0.862
                AndMedOrHighHigh       16.75      (2.1%)       16.74      (3.3%)   -0.1% (  -5% -    5%) 0.935
                  FilteredIntNRQ       42.25      (3.1%)       42.23      (3.1%)   -0.0% (  -6% -    6%) 0.967
                 AndHighOrMedMed       14.11      (3.2%)       14.11      (3.1%)   -0.0% (  -6% -    6%) 1.000
              FilteredOrHighHigh       13.03      (2.6%)       13.03      (2.9%)    0.0% (  -5% -    5%) 0.996
                          IntNRQ       42.59      (3.2%)       42.60      (3.2%)    0.0% (  -6% -    6%) 0.979
          CountFilteredOrHighMed       17.95      (0.8%)       17.97      (0.9%)    0.1% (  -1% -    1%) 0.806
         CountFilteredOrHighHigh       15.86      (0.9%)       15.88      (1.0%)    0.1% (  -1% -    2%) 0.668
                         Respell       37.00      (4.4%)       37.07      (3.9%)    0.2% (  -7% -    8%) 0.897
                      AndHighMed       54.33      (3.3%)       54.43      (3.6%)    0.2% (  -6% -    7%) 0.861
                          Phrase        7.62      (2.6%)        7.64      (3.3%)    0.2% (  -5% -    6%) 0.834
                          Fuzzy2       36.78      (4.3%)       36.89      (4.8%)    0.3% (  -8% -    9%) 0.836
                  FilteredPhrase        9.86      (2.6%)        9.89      (3.1%)    0.3% (  -5% -    6%) 0.737
                          Fuzzy1       40.61      (4.3%)       40.74      (4.9%)    0.3% (  -8% -    9%) 0.822
               FilteredOrHighMed       39.08      (3.7%)       39.21      (3.8%)    0.3% (  -6% -    8%) 0.780
                    FilteredTerm       65.09      (3.6%)       65.34      (3.9%)    0.4% (  -6% -    8%) 0.756
                 CountAndHighMed       75.20      (2.5%)       75.50      (3.1%)    0.4% (  -5% -    6%) 0.649
                FilteredOr3Terms       44.00      (3.7%)       44.21      (3.9%)    0.5% (  -6% -    8%) 0.683
             CombinedAndHighHigh        5.73      (2.1%)        5.76      (2.0%)    0.5% (  -3% -    4%) 0.430
             CountFilteredOrMany        4.46      (2.6%)        4.48      (2.9%)    0.6% (  -4% -    6%) 0.511
                   TermTitleSort       51.76      (4.4%)       52.07      (5.6%)    0.6% (  -9% -   11%) 0.707
               FilteredAnd3Terms      101.78      (3.5%)      102.42      (3.3%)    0.6% (  -5% -    7%) 0.552
                CountAndHighHigh       48.46      (2.3%)       48.77      (2.5%)    0.6% (  -4% -    5%) 0.406
                  CountOrHighMed       77.72      (2.3%)       78.26      (2.5%)    0.7% (  -4% -    5%) 0.360
             And2Terms2StopWords       60.68      (6.5%)       61.13      (6.6%)    0.7% ( -11% -   14%) 0.720
              CombinedOrHighHigh        5.62      (3.5%)        5.66      (4.1%)    0.8% (  -6% -    8%) 0.527
                       And3Terms       72.70      (4.0%)       73.25      (4.2%)    0.8% (  -7% -    9%) 0.559
      FilteredOr2Terms2StopWords       50.50      (4.8%)       50.89      (5.0%)    0.8% (  -8% -   11%) 0.624
              CombinedAndHighMed       21.62      (4.9%)       21.78      (4.5%)    0.8% (  -8% -   10%) 0.608
                 CountOrHighHigh       49.83      (2.6%)       50.25      (2.6%)    0.8% (  -4% -    6%) 0.307
                  FilteredOrMany        4.03      (2.7%)        4.06      (2.8%)    0.9% (  -4% -    6%) 0.324
               CombinedOrHighMed       21.21      (4.8%)       21.41      (5.1%)    0.9% (  -8% -   11%) 0.552
                     CountOrMany        5.02      (3.1%)        5.07      (2.9%)    0.9% (  -4% -    7%) 0.328
                      TermDTSort      145.50      (5.2%)      147.04      (4.9%)    1.1% (  -8% -   11%) 0.509
                        Or3Terms       65.31      (3.8%)       66.02      (4.4%)    1.1% (  -6% -    9%) 0.404
               TermDayOfYearSort      266.30      (3.6%)      269.35      (3.8%)    1.1% (  -6% -    8%) 0.331
                        Wildcard       47.14      (3.5%)       47.75      (4.4%)    1.3% (  -6% -    9%) 0.298
                          IntSet      298.82      (4.0%)      303.31      (5.5%)    1.5% (  -7% -   11%) 0.325
                 FilteredPrefix3       69.72      (3.6%)       70.79      (3.3%)    1.5% (  -5% -    8%) 0.162
                   TermMonthSort     2187.76      (4.5%)     2231.91      (4.8%)    2.0% (  -7% -   11%) 0.173
     FilteredAnd2Terms2StopWords       60.66      (4.9%)       61.89      (5.0%)    2.0% (  -7% -   12%) 0.192
                         Prefix3       74.19      (4.0%)       75.82      (3.6%)    2.2% (  -5% -   10%) 0.069
                       CountTerm     6268.63      (6.8%)     6406.68      (7.1%)    2.2% ( -11% -   17%) 0.319
                     OrStopWords        8.96      (2.2%)        9.21      (3.7%)    2.9% (  -3% -    8%) 0.003
                    AndStopWords        8.78      (2.9%)        9.14      (2.8%)    4.1% (  -1% -   10%) 0.000
              FilteredAndHighMed       31.61      (2.4%)       32.97      (2.2%)    4.3% (   0% -    9%) 0.000
            FilteredAndStopWords        8.34      (2.3%)        8.86      (2.1%)    6.3% (   1% -   10%) 0.000
             FilteredAndHighHigh       10.32      (2.5%)       11.00      (1.9%)    6.7% (   2% -   11%) 0.000

I think it's promising to look into this approach more. If I understand correctly, this speedup should only come from the reduced number of virtual function calls?

@jpountz
Contributor

jpountz commented Jul 19, 2025

I suspect that the fact that you first compute the intersection and then compute scores also helps as it's more cache-friendly (less data to have in the various caches at once) and potentially enables vectorizing the computation of scores.
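To illustrate the two-pass shape this refers to, here is a minimal sketch, assuming plain arrays and a made-up scoring formula (`freq / (freq + norm)`). The class and method names are hypothetical and are not Lucene APIs. The first pass only decides which docs survive and gathers their frequencies; the second pass is a tight loop over dense arrays, which is the kind of loop a JIT compiler may be able to auto-vectorize.

```java
/** Hypothetical sketch: intersect first, then score the survivors in one dense loop. */
final class IntersectThenScoreSketch {

  /**
   * Pass 1: keeps the candidates that also appear in postings (both sorted),
   * copying the matching doc IDs and term frequencies. Returns the number kept.
   */
  static int intersect(
      int[] candidates, int[] postings, int[] postingFreqs, int[] outDocs, int[] outFreqs) {
    int kept = 0;
    int p = 0;
    for (int doc : candidates) {
      while (p < postings.length && postings[p] < doc) {
        p++;
      }
      if (p < postings.length && postings[p] == doc) {
        outDocs[kept] = doc;
        outFreqs[kept] = postingFreqs[p];
        kept++;
      }
    }
    return kept;
  }

  /** Pass 2: branch-free scoring over the first size entries; the formula is illustrative only. */
  static void score(int[] freqs, float[] norms, int size, float[] outScores) {
    for (int i = 0; i < size; i++) {
      outScores[i] = freqs[i] / (freqs[i] + norms[i]);
    }
  }
}
```

Only the surviving docs ever reach the scoring loop, so the frequency and norm data touched during scoring stays small and contiguous.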

@jpountz
Contributor

jpountz commented Jul 19, 2025

I'd suggest to focus this first PR on the Scorer#applyAsRequiredClause API and later see if there's more room for speedups by adding new APIs to PostingsEnum in a follow-up PR?

@HUSTERGS
Contributor Author

I'd suggest to focus this first PR on the Scorer#applyAsRequiredClause API and later see if there's more room for speedups by adding new APIs to PostingsEnum in a follow-up PR?

Yeah, I think it's a good idea. I've been experimenting with some details of the current version of the code these days.
I've moved the PostingEnum-related code directly into applyAsRequiredClause and removed the dependency on the newly introduced NormAndFreqBuffer. The luceneutil benchmark no longer seems to yield a good performance gain (at least not as good as before), especially for the OrStopWords query. Here is the result:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                       OrHighMed       69.09      (3.6%)       67.53     (10.6%)   -2.3% ( -15% -   12%) 0.365
              Or2Terms2StopWords       63.02      (5.4%)       61.66      (8.7%)   -2.2% ( -15% -   12%) 0.344
                    CombinedTerm       11.07      (5.1%)       10.85      (4.6%)   -2.1% ( -11% -    8%) 0.180
                     AndHighHigh       22.49      (2.8%)       22.03      (9.5%)   -2.1% ( -13% -   10%) 0.354
                      OrHighHigh       21.41      (2.3%)       20.98      (9.9%)   -2.0% ( -13% -   10%) 0.384
                       And3Terms       73.13      (3.5%)       71.87      (8.7%)   -1.7% ( -13% -   10%) 0.409
                         TermB1M      473.09      (4.4%)      465.04      (8.0%)   -1.7% ( -13% -   11%) 0.404
                       TermB1M1P      473.02      (4.4%)      465.15      (8.1%)   -1.7% ( -13% -   11%) 0.419
                      AndHighMed       54.53      (3.2%)       53.64     (10.0%)   -1.6% ( -14% -   11%) 0.485
                        Or3Terms       65.73      (3.6%)       64.70      (9.4%)   -1.6% ( -13% -   11%) 0.484
                         Term10K      472.65      (4.4%)      465.27      (8.1%)   -1.6% ( -13% -   11%) 0.450
                         Term100      472.96      (4.2%)      465.67      (8.1%)   -1.5% ( -13% -   11%) 0.452
                          Term1M      472.95      (4.4%)      465.91      (8.1%)   -1.5% ( -13% -   11%) 0.469
                            Term      472.92      (4.5%)      465.98      (8.2%)   -1.5% ( -13% -   11%) 0.481
             And2Terms2StopWords       61.13      (5.8%)       60.43      (8.9%)   -1.2% ( -14% -   14%) 0.627
                   TermMonthSort     2219.47      (3.2%)     2201.87      (3.7%)   -0.8% (  -7% -    6%) 0.467
                 DismaxOrHighMed       50.75      (3.5%)       50.42      (7.9%)   -0.7% ( -11% -   11%) 0.733
                AndMedOrHighHigh       16.90      (2.9%)       16.80      (4.8%)   -0.6% (  -8% -    7%) 0.621
                          IntSet      298.20      (4.4%)      296.49      (4.0%)   -0.6% (  -8% -    8%) 0.666
                      DismaxTerm      513.00      (4.7%)      510.11      (6.5%)   -0.6% ( -11% -   11%) 0.753
                DismaxOrHighHigh       35.47      (3.2%)       35.31      (6.3%)   -0.4% (  -9% -    9%) 0.776
                FilteredOr3Terms       44.20      (3.2%)       44.11      (3.1%)   -0.2% (  -6% -    6%) 0.831
      FilteredOr2Terms2StopWords       50.82      (4.1%)       50.72      (4.3%)   -0.2% (  -8% -    8%) 0.886
                          Fuzzy1       40.73      (3.7%)       40.65      (4.9%)   -0.2% (  -8% -    8%) 0.891
                          OrMany        4.69      (3.7%)        4.69      (6.1%)   -0.1% (  -9% -   10%) 0.950
              CombinedAndHighMed       21.52      (4.6%)       21.51      (4.3%)   -0.1% (  -8% -    9%) 0.967
                  CountOrHighMed       78.12      (1.7%)       78.08      (2.6%)   -0.0% (  -4% -    4%) 0.944
                  FilteredOrMany        4.06      (2.4%)        4.06      (2.2%)   -0.0% (  -4% -    4%) 0.964
     FilteredAnd2Terms2StopWords       61.06      (4.3%)       61.04      (6.2%)   -0.0% ( -10% -   10%) 0.986
               FilteredOrHighMed       39.18      (3.3%)       39.17      (3.2%)   -0.0% (  -6% -    6%) 0.986
                       CountTerm     6298.40      (4.3%)     6297.77      (4.5%)   -0.0% (  -8% -    9%) 0.994
                          Fuzzy2       36.89      (3.5%)       36.90      (4.7%)    0.0% (  -7% -    8%) 0.980
                 CountAndHighMed       75.47      (1.7%)       75.50      (2.1%)    0.0% (  -3% -    3%) 0.941
                          IntNRQ       42.55      (2.2%)       42.57      (3.0%)    0.1% (  -4% -    5%) 0.946
               FilteredAnd3Terms      101.94      (2.3%)      102.05      (2.9%)    0.1% (  -5% -    5%) 0.900
          CountFilteredOrHighMed       17.95      (0.7%)       17.98      (0.6%)    0.2% (  -1% -    1%) 0.460
              FilteredOrHighHigh       13.02      (2.5%)       13.05      (2.3%)    0.2% (  -4% -    5%) 0.813
                  FilteredIntNRQ       42.16      (2.3%)       42.24      (3.0%)    0.2% (  -4% -    5%) 0.820
             CountFilteredIntNRQ       16.31      (0.8%)       16.35      (1.2%)    0.2% (  -1% -    2%) 0.468
         CountFilteredOrHighHigh       15.86      (0.8%)       15.90      (0.8%)    0.3% (  -1% -    1%) 0.331
                 CountOrHighHigh       50.16      (2.4%)       50.29      (2.5%)    0.3% (  -4% -    5%) 0.724
             CountFilteredPhrase        9.18      (2.5%)        9.21      (3.4%)    0.3% (  -5% -    6%) 0.771
                        Wildcard       47.34      (3.3%)       47.48      (3.7%)    0.3% (  -6% -    7%) 0.790
                 AndHighOrMedMed       14.04      (2.2%)       14.08      (2.6%)    0.3% (  -4% -    5%) 0.688
                IntervalsOrdered        2.43      (3.4%)        2.44      (3.3%)    0.3% (  -6% -    7%) 0.760
                     CountOrMany        5.04      (2.9%)        5.06      (2.8%)    0.4% (  -5% -    6%) 0.696
             CountFilteredOrMany        4.46      (2.5%)        4.48      (2.6%)    0.4% (  -4% -    5%) 0.635
                   TermTitleSort       51.93      (4.8%)       52.13      (5.1%)    0.4% (  -9% -   10%) 0.809
                CountAndHighHigh       48.66      (2.2%)       48.85      (2.2%)    0.4% (  -3% -    4%) 0.560
             CombinedAndHighHigh        5.67      (2.8%)        5.69      (2.3%)    0.4% (  -4% -    5%) 0.597
                         Prefix3       75.57      (3.8%)       75.90      (3.3%)    0.4% (  -6% -    7%) 0.699
                 FilteredPrefix3       70.64      (3.3%)       70.98      (3.1%)    0.5% (  -5% -    7%) 0.637
                         Respell       36.72      (3.5%)       36.93      (3.6%)    0.6% (  -6% -    7%) 0.603
                        SpanNear        2.45      (5.5%)        2.46      (5.4%)    0.6% (  -9% -   12%) 0.733
             FilteredOrStopWords        8.13      (2.2%)        8.18      (2.0%)    0.7% (  -3% -    5%) 0.329
                    FilteredTerm       64.92      (3.0%)       65.36      (3.6%)    0.7% (  -5% -    7%) 0.522
                      TermDTSort      144.97      (3.3%)      146.07      (4.8%)    0.8% (  -7% -    9%) 0.561
                  FilteredPhrase        9.83      (2.2%)        9.91      (2.6%)    0.8% (  -3% -    5%) 0.297
                    SloppyPhrase        1.12      (5.3%)        1.13      (4.9%)    0.8% (  -8% -   11%) 0.616
                          Phrase        7.57      (4.3%)        7.64      (4.3%)    0.9% (  -7% -    9%) 0.490
               TermDayOfYearSort      264.98      (2.6%)      267.70      (2.9%)    1.0% (  -4% -    6%) 0.241
                      OrHighRare       94.68      (6.8%)       95.91      (5.4%)    1.3% ( -10% -   14%) 0.501
               CombinedOrHighMed       21.05      (5.6%)       21.35      (4.6%)    1.4% (  -8% -   12%) 0.396
                    AndStopWords        8.87      (3.6%)        9.01      (7.4%)    1.6% (  -9% -   12%) 0.399
                     CountPhrase        2.65      (4.9%)        2.69      (3.2%)    1.9% (  -5% -   10%) 0.154
              CombinedOrHighHigh        5.54      (5.1%)        5.66      (3.0%)    2.2% (  -5% -   10%) 0.092
                     OrStopWords        8.99      (3.2%)        9.20      (8.8%)    2.3% (  -9% -   14%) 0.263
              FilteredAndHighMed       31.76      (2.4%)       32.52      (4.0%)    2.4% (  -3% -    8%) 0.020
            FilteredAndStopWords        8.41      (3.1%)        8.75      (2.0%)    4.0% (  -1% -    9%) 0.000
             FilteredAndHighHigh       10.41      (3.1%)       10.87      (1.8%)    4.4% (   0% -    9%) 0.000

If I still use the NormAndFreqBuffer (instead of raw freqs and normValues arrays inside TermScorer), the performance seems to be better? That is a little strange to me. Here is the result under an identical setup (only the related queries are shown below):

               CombinedOrHighMed       21.60      (4.0%)       21.98      (3.8%)    1.8% (  -5% -    9%) 0.151
                     OrStopWords        9.05      (1.4%)        9.23      (3.1%)    2.0% (  -2% -    6%) 0.009
              CombinedOrHighHigh        5.68      (2.7%)        5.81      (2.2%)    2.2% (  -2% -    7%) 0.005
              FilteredAndHighMed       31.77      (2.2%)       32.77      (1.5%)    3.1% (   0% -    7%) 0.000
            FilteredAndStopWords        8.40      (2.4%)        8.72      (1.9%)    3.8% (   0% -    8%) 0.000
             FilteredAndHighHigh       10.37      (2.4%)       10.84      (1.3%)    4.5% (   0% -    8%) 0.000

I'm not sure what causes the difference :(
I will push a new commit using raw arrays, though.

@HUSTERGS
Contributor Author

Also, I'm wondering whether we can change the protocol of nextPostings, so that a buffer with a non-zero size means we are filtering, and otherwise we are just polling postings into the buffer. Then we wouldn't need another new API? Just thinking out loud.

Contributor

@jpountz jpountz left a comment

I ran it on my machine and got a tiny speedup, possibly because vectorization of score computations gives more gains on your machine?

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                          OrMany       23.60      (1.8%)       23.45      (3.0%)   -0.7% (  -5% -    4%) 0.408
                       CountTerm     9274.34      (1.7%)     9215.96      (2.9%)   -0.6% (  -5% -    4%) 0.401
                   TermMonthSort     3352.95      (2.1%)     3333.71      (1.6%)   -0.6% (  -4% -    3%) 0.328
                     CountOrMany       28.93      (1.7%)       28.82      (3.5%)   -0.4% (  -5% -    4%) 0.658
               FilteredOrHighMed      153.09      (1.2%)      152.62      (1.1%)   -0.3% (  -2% -    1%) 0.390
                 CountOrHighHigh      339.09      (2.5%)      338.47      (4.2%)   -0.2% (  -6% -    6%) 0.867
                FilteredOr3Terms      166.53      (1.2%)      166.40      (1.2%)   -0.1% (  -2% -    2%) 0.832
              FilteredOrHighHigh       67.14      (2.2%)       67.12      (1.4%)   -0.0% (  -3% -    3%) 0.946
                     CountPhrase        4.23      (2.2%)        4.22      (2.2%)   -0.0% (  -4% -    4%) 0.975
                      TermDTSort      387.92      (2.7%)      388.12      (2.6%)    0.1% (  -5% -    5%) 0.951
      FilteredOr2Terms2StopWords      146.92      (1.1%)      146.99      (1.1%)    0.1% (  -2% -    2%) 0.880
                  FilteredPhrase       32.18      (1.5%)       32.20      (1.5%)    0.1% (  -2% -    3%) 0.897
                  FilteredOrMany       16.55      (1.0%)       16.57      (2.1%)    0.1% (  -2% -    3%) 0.826
          CountFilteredOrHighMed      148.61      (0.7%)      148.81      (0.8%)    0.1% (  -1% -    1%) 0.593
             CountFilteredPhrase       25.45      (1.7%)       25.49      (1.5%)    0.2% (  -2% -    3%) 0.756
             FilteredOrStopWords       45.64      (2.3%)       45.73      (1.4%)    0.2% (  -3% -    4%) 0.749
             CountFilteredOrMany       27.07      (1.6%)       27.12      (1.2%)    0.2% (  -2% -    2%) 0.644
         CountFilteredOrHighHigh      136.63      (0.9%)      136.91      (1.1%)    0.2% (  -1% -    2%) 0.508
                  FilteredIntNRQ      294.74      (0.7%)      295.84      (0.8%)    0.4% (  -1% -    1%) 0.130
                 AndHighOrMedMed       51.29      (1.7%)       51.49      (1.2%)    0.4% (  -2% -    3%) 0.405
                    CombinedTerm       39.36      (1.2%)       39.52      (0.9%)    0.4% (  -1% -    2%) 0.255
                  CountOrHighMed      357.07      (2.2%)      358.88      (2.9%)    0.5% (  -4% -    5%) 0.535
              Or2Terms2StopWords      205.70      (1.3%)      206.84      (1.5%)    0.6% (  -2% -    3%) 0.211
                   TermTitleSort       85.42      (3.8%)       85.89      (3.9%)    0.6% (  -6% -    8%) 0.651
                 FilteredPrefix3      149.80      (2.0%)      150.67      (2.4%)    0.6% (  -3% -    5%) 0.406
                CountAndHighHigh      356.56      (2.2%)      358.88      (2.2%)    0.7% (  -3% -    5%) 0.347
             CombinedAndHighHigh       23.45      (1.0%)       23.61      (1.0%)    0.7% (  -1% -    2%) 0.027
                 CountAndHighMed      306.71      (1.7%)      308.91      (2.0%)    0.7% (  -2% -    4%) 0.219
                    FilteredTerm      161.63      (2.5%)      162.80      (1.8%)    0.7% (  -3% -    5%) 0.295
             And2Terms2StopWords      204.26      (1.6%)      205.81      (1.7%)    0.8% (  -2% -    4%) 0.147
              CombinedAndHighMed       89.38      (1.0%)       90.09      (0.8%)    0.8% (  -1% -    2%) 0.006
                       OrHighMed      257.60      (2.2%)      259.70      (1.8%)    0.8% (  -3% -    4%) 0.203
                      OrHighHigh       78.29      (2.6%)       78.96      (2.5%)    0.9% (  -4% -    6%) 0.292
                      AndHighMed      202.14      (1.8%)      204.00      (1.9%)    0.9% (  -2% -    4%) 0.116
               TermDayOfYearSort      285.19      (1.2%)      287.96      (1.4%)    1.0% (  -1% -    3%) 0.020
               CombinedOrHighMed       87.78      (2.8%)       88.76      (0.9%)    1.1% (  -2% -    4%) 0.089
                        Or3Terms      231.64      (1.4%)      234.56      (1.9%)    1.3% (  -2% -    4%) 0.018
                       And3Terms      240.09      (1.5%)      243.34      (1.9%)    1.4% (  -2% -    4%) 0.013
                AndMedOrHighHigh       88.00      (2.1%)       89.20      (2.3%)    1.4% (  -2% -    5%) 0.050
     FilteredAnd2Terms2StopWords      215.32      (1.0%)      218.60      (1.2%)    1.5% (   0% -    3%) 0.000
            FilteredAndStopWords       65.47      (1.6%)       66.51      (1.4%)    1.6% (  -1% -    4%) 0.001
              FilteredAndHighMed      156.30      (1.2%)      158.91      (1.2%)    1.7% (   0% -    4%) 0.000
              CombinedOrHighHigh       23.10      (3.7%)       23.49      (1.2%)    1.7% (  -3% -    6%) 0.052
                     OrStopWords       49.00      (2.4%)       49.83      (2.2%)    1.7% (  -2% -    6%) 0.019
               FilteredAnd3Terms      189.73      (0.8%)      193.06      (1.1%)    1.8% (   0% -    3%) 0.000
                     AndHighHigh       69.14      (2.4%)       70.36      (2.3%)    1.8% (  -2% -    6%) 0.017
             FilteredAndHighHigh       79.13      (1.7%)       80.57      (1.3%)    1.8% (  -1% -    4%) 0.000
                            Term      655.28      (6.0%)      668.10      (4.7%)    2.0% (  -8% -   13%) 0.249
                    AndStopWords       47.11      (2.1%)       48.58      (2.4%)    3.1% (  -1% -    7%) 0.000
                      OrHighRare      292.97     (10.8%)      303.18      (6.0%)    3.5% ( -12% -   22%) 0.206

* this {@link Scorer} to the scores.
*/
public void applyAsRequiredClause(DocAndScoreAccBuffer buffer) throws IOException {
DocIdSetIterator iterator = iterator();

I wonder if we should throw an exception if this scorer exposes a twoPhaseIterator() (and add javadocs about it), since running the conjunction this way would be less efficient than doing it in a "doc first" fashion.
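A minimal sketch of what such a guard could look like. `GuardedScorerSketch`, `twoPhaseView` and `matches` are hypothetical names used only for illustration, and the choice of `UnsupportedOperationException` is an assumption rather than something decided in this PR.

```java
/** Hypothetical sketch of a scorer that refuses buffer-first evaluation when it is two-phase. */
abstract class GuardedScorerSketch {

  /** Hypothetical: non-null when matches need a second, costlier verification step. */
  Object twoPhaseView() {
    return null;
  }

  /** Stand-in for positioning an iterator on doc and checking that it matches. */
  abstract boolean matches(int doc);

  /** Filters docs[0, size) in place and returns the new size; rejects two-phase scorers up front. */
  final int applyAsRequiredClause(int[] docs, int size) {
    if (twoPhaseView() != null) {
      // Assumption: fail fast instead of silently running the less efficient path.
      throw new UnsupportedOperationException(
          "two-phase scorers should be evaluated in a doc-first fashion");
    }
    int kept = 0;
    for (int i = 0; i < size; i++) {
      if (matches(docs[i])) {
        docs[kept++] = docs[i];
      }
    }
    return kept;
  }
}
```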

@jpountz
Contributor

jpountz commented Jul 21, 2025

I'm wondering whether we can change the protocol of nextPostings, so that if the buffer has a non-zero size, it means we are doing filtering

This doesn't sound clean to me. We should be careful with adding new APIs, but I'd rather have two simple APIs than a complex one.

@HUSTERGS
Contributor Author

I ran it on my machine and got a tiny speedup, possibly because vectorization of score computations gives more gains on your machine?

Did you run against the latest version, or the older version that contains a new API on PostingEnum? I'm a little curious about the performance of the previous version (if it is not the one you ran).

This doesn't sound clean to me. We should be careful with adding new APIs, but I'd rather have two simple APIs than a complex one.

Makes sense to me too. I was also hesitant about that. Thanks for your reply!

@jpountz
Contributor

jpountz commented Jul 23, 2025

Did you run against the latest version, or the older version that contains a new API on PostingEnum?

I ran against the latest version. I'll try to run the older version soon.

@jpountz
Contributor

jpountz commented Jul 24, 2025

Here's a run against the previous version:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                      OrHighRare      301.51      (3.3%)      291.53      (7.1%)   -3.3% ( -13% -    7%) 0.060
                            Term      671.84      (2.7%)      650.05      (8.0%)   -3.2% ( -13% -    7%) 0.086
                      AndHighMed      204.51      (1.6%)      197.92      (9.2%)   -3.2% ( -13% -    7%) 0.123
                     AndHighHigh       70.01      (1.8%)       68.09     (10.2%)   -2.7% ( -14% -    9%) 0.235
                       OrHighMed      259.63      (1.6%)      252.61      (7.9%)   -2.7% ( -12% -    6%) 0.134
                      OrHighHigh       78.51      (2.2%)       76.42      (9.8%)   -2.7% ( -14% -    9%) 0.237
             And2Terms2StopWords      206.68      (1.1%)      202.07      (5.4%)   -2.2% (  -8% -    4%) 0.069
              Or2Terms2StopWords      207.17      (1.2%)      202.97      (5.6%)   -2.0% (  -8% -    4%) 0.114
                       And3Terms      242.44      (1.5%)      237.63      (6.6%)   -2.0% (  -9% -    6%) 0.190
                     OrStopWords       49.14      (1.9%)       48.20      (8.9%)   -1.9% ( -12% -    9%) 0.347
                        Or3Terms      232.67      (1.5%)      229.61      (6.8%)   -1.3% (  -9% -    7%) 0.400
                          OrMany       23.67      (3.4%)       23.37      (4.0%)   -1.3% (  -8% -    6%) 0.287
             CountFilteredPhrase       25.46      (2.5%)       25.24      (2.9%)   -0.9% (  -6% -    4%) 0.308
                    AndStopWords       47.57      (1.9%)       47.19      (8.5%)   -0.8% ( -11% -    9%) 0.683
                    FilteredTerm      163.90      (1.9%)      162.67      (2.5%)   -0.7% (  -5% -    3%) 0.283
               TermDayOfYearSort      283.79      (2.5%)      282.55      (4.0%)   -0.4% (  -6% -    6%) 0.678
                  CountOrHighMed      357.89      (1.5%)      356.52      (1.6%)   -0.4% (  -3% -    2%) 0.437
                  FilteredOrMany       16.70      (2.3%)       16.64      (2.1%)   -0.3% (  -4% -    4%) 0.621
                AndMedOrHighHigh       88.31      (2.0%)       88.07      (4.7%)   -0.3% (  -6% -    6%) 0.811
                 CountAndHighMed      308.41      (1.1%)      307.65      (1.1%)   -0.2% (  -2% -    1%) 0.479
                 AndHighOrMedMed       51.27      (1.6%)       51.18      (1.2%)   -0.2% (  -2% -    2%) 0.682
      FilteredOr2Terms2StopWords      147.47      (1.0%)      147.30      (1.3%)   -0.1% (  -2% -    2%) 0.750
                FilteredOr3Terms      166.92      (1.2%)      166.80      (1.5%)   -0.1% (  -2% -    2%) 0.859
             FilteredOrStopWords       45.68      (1.6%)       45.65      (2.4%)   -0.1% (  -4% -    4%) 0.933
              FilteredOrHighHigh       67.23      (1.6%)       67.20      (2.1%)   -0.0% (  -3% -    3%) 0.944
                  FilteredPhrase       32.19      (1.9%)       32.19      (2.0%)   -0.0% (  -3% -    3%) 0.992
               FilteredOrHighMed      153.16      (1.2%)      153.25      (1.4%)    0.1% (  -2% -    2%) 0.884
                    CombinedTerm       39.44      (1.0%)       39.50      (1.0%)    0.2% (  -1% -    2%) 0.618
         CountFilteredOrHighHigh      136.54      (1.0%)      136.75      (0.7%)    0.2% (  -1% -    1%) 0.560
              CombinedAndHighMed       89.87      (0.8%)       90.03      (0.8%)    0.2% (  -1% -    1%) 0.509
                 CountOrHighHigh      337.37      (2.3%)      337.96      (1.8%)    0.2% (  -3% -    4%) 0.788
          CountFilteredOrHighMed      148.56      (0.9%)      148.83      (0.7%)    0.2% (  -1% -    1%) 0.460
                     CountOrMany       28.86      (1.8%)       28.92      (1.3%)    0.2% (  -2% -    3%) 0.711
                  FilteredIntNRQ      296.49      (1.2%)      297.05      (1.1%)    0.2% (  -2% -    2%) 0.594
                     CountPhrase        4.23      (2.7%)        4.24      (2.0%)    0.2% (  -4% -    5%) 0.799
                CountAndHighHigh      353.73      (2.0%)      354.52      (1.8%)    0.2% (  -3% -    4%) 0.712
                   TermTitleSort       85.58      (6.8%)       85.78      (6.7%)    0.2% ( -12% -   14%) 0.909
             CountFilteredOrMany       27.01      (1.7%)       27.08      (1.2%)    0.3% (  -2% -    3%) 0.546
               CombinedOrHighMed       88.02      (2.2%)       88.27      (2.7%)    0.3% (  -4% -    5%) 0.717
             CombinedAndHighHigh       23.57      (0.9%)       23.66      (1.0%)    0.4% (  -1% -    2%) 0.193
              FilteredAndHighMed      156.67      (1.4%)      157.50      (3.4%)    0.5% (  -4% -    5%) 0.516
                 FilteredPrefix3      148.36      (2.8%)      149.27      (2.9%)    0.6% (  -4% -    6%) 0.498
               FilteredAnd3Terms      190.43      (1.7%)      191.61      (2.2%)    0.6% (  -3% -    4%) 0.325
                      TermDTSort      383.60      (3.6%)      386.12      (3.3%)    0.7% (  -6% -    7%) 0.547
              CombinedOrHighHigh       23.14      (3.2%)       23.30      (3.9%)    0.7% (  -6% -    8%) 0.533
                   TermMonthSort     3335.51      (1.7%)     3364.42      (2.3%)    0.9% (  -3% -    4%) 0.177
     FilteredAnd2Terms2StopWords      215.80      (1.2%)      218.13      (1.4%)    1.1% (  -1% -    3%) 0.010
                       CountTerm     9268.03      (2.2%)     9382.33      (2.5%)    1.2% (  -3% -    6%) 0.099
            FilteredAndStopWords       64.87      (2.7%)       67.16      (1.9%)    3.5% (  -1% -    8%) 0.000
             FilteredAndHighHigh       78.71      (2.5%)       81.74      (1.6%)    3.9% (   0% -    8%) 0.000

@HUSTERGS
Contributor Author

Thank you for running the benchmark!
It seems the results are not that stable, and they are less promising in a different environment :(
Maybe I should dig more into how to optimize conjunctions if we pass the doc buffer all the way down to the postings. The new applyAsRequiredClause API alone fails to bring a big speedup.

@jpountz
Contributor

jpountz commented Jul 24, 2025

FWIW I suspect that the speedup to tasks such as FilteredAndStopWords is not an actual speedup, but a side-effect of the fact that we only ever wrap RandomQuery within a ConstantScoreQuery in these benchmarks. So specializing ConstantScoreQuery#applyAsRequiredClause makes some calls monomorphic, while they may be polymorphic in production systems where ConstantScoreQuery is used to wrap TermQuery and many other queries.
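To make the monomorphic-versus-polymorphic point concrete, here is a minimal plain-Java sketch. Every type in it (`Clause`, `Wrapper`, `RandomLike`, `TermLike`, `RangeLike`) is a hypothetical stand-in, not a Lucene class. It only shows the shape of the situation: the delegating call inside `Wrapper#apply` sees a single receiver type when the wrapper only ever wraps one implementation, and several receiver types otherwise. It does not measure JIT behavior.

```java
import java.util.List;

public class CallSiteProfileSketch {
    // Hypothetical stand-in for a scorer-like interface; not a Lucene type.
    interface Clause {
        int apply(int doc);
    }

    // Hypothetical wrapper, playing the role ConstantScoreQuery plays in the comment above.
    static final class Wrapper implements Clause {
        private final Clause inner;

        Wrapper(Clause inner) {
            this.inner = inner;
        }

        @Override
        public int apply(int doc) {
            // This is the call site whose receiver types the JIT profiles.
            return inner.apply(doc);
        }
    }

    static final class RandomLike implements Clause {
        @Override
        public int apply(int doc) {
            return doc + 1;
        }
    }

    static final class TermLike implements Clause {
        @Override
        public int apply(int doc) {
            return doc + 2;
        }
    }

    static final class RangeLike implements Clause {
        @Override
        public int apply(int doc) {
            return doc + 3;
        }
    }

    // Runs every clause over doc IDs [0, docs) and sums the results.
    static long run(List<? extends Clause> clauses, int docs) {
        long sum = 0;
        for (Clause clause : clauses) {
            for (int doc = 0; doc < docs; doc++) {
                sum += clause.apply(doc);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Benchmark-like usage: the wrapper only ever wraps one implementation.
        long single = run(List.of(new Wrapper(new RandomLike())), 1_000);
        // Production-like usage: the same wrapper call site sees three implementations.
        long mixed = run(List.of(
            new Wrapper(new RandomLike()),
            new Wrapper(new TermLike()),
            new Wrapper(new RangeLike())), 1_000);
        System.out.println(single + " " + mixed);
    }
}
```

A specialization that looks like a win in the first usage pattern may not carry over to the second, which is the concern raised above.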

@HUSTERGS
Contributor Author

If I understand correctly, removing ConstantScoreQuery#applyAsRequiredClause can help verify this?

@jpountz
Contributor

jpountz commented Jul 25, 2025

Good idea!

@HUSTERGS
Contributor Author

I ran the benchmark under an identical setup on the previous version (which adds a new API on PostingsEnum), but with the related code under ConstantScoreScorer#applyAsRequiredClause removed. Here is the result:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                DismaxOrHighHigh       35.29      (9.8%)       34.76     (10.2%)   -1.5% ( -19% -   20%) 0.634
                          Phrase        7.49      (6.2%)        7.39      (5.7%)   -1.3% ( -12% -   11%) 0.501
                    CombinedTerm       11.35      (5.3%)       11.21      (6.8%)   -1.3% ( -12% -   11%) 0.517
                       OrHighMed       68.18     (17.4%)       67.53     (17.7%)   -0.9% ( -30% -   41%) 0.864
                      OrHighHigh       21.91     (18.0%)       21.71     (18.0%)   -0.9% ( -31% -   42%) 0.871
                     CountPhrase        2.64      (5.6%)        2.61      (5.2%)   -0.9% ( -11% -   10%) 0.601
             FilteredOrStopWords        7.93      (4.0%)        7.86      (6.3%)   -0.9% ( -10% -    9%) 0.597
              Or2Terms2StopWords       60.82     (13.1%)       60.30     (13.5%)   -0.9% ( -24% -   29%) 0.837
             CountFilteredOrMany        4.34      (5.4%)        4.32      (5.1%)   -0.7% ( -10% -   10%) 0.683
                 AndHighOrMedMed       14.01      (5.0%)       13.92      (4.9%)   -0.7% ( -10% -    9%) 0.678
               CombinedOrHighMed       20.78      (7.3%)       20.65      (8.0%)   -0.6% ( -14% -   15%) 0.796
                     AndHighHigh       22.70     (18.0%)       22.56     (18.4%)   -0.6% ( -31% -   43%) 0.916
                 DismaxOrHighMed       47.41     (12.2%)       47.13     (11.8%)   -0.6% ( -21% -   26%) 0.878
                     CountOrMany        4.89      (6.9%)        4.87      (5.7%)   -0.5% ( -12% -   12%) 0.785
             And2Terms2StopWords       58.89     (12.7%)       58.57     (13.3%)   -0.5% ( -23% -   29%) 0.895
             CountFilteredPhrase        8.96      (6.2%)        8.91      (5.2%)   -0.5% ( -11% -   11%) 0.768
                          OrMany        4.49      (9.3%)        4.47      (8.5%)   -0.5% ( -16% -   18%) 0.858
                CountAndHighHigh       47.65      (4.9%)       47.41      (4.1%)   -0.5% (  -9% -    8%) 0.725
                          Fuzzy1       41.70      (7.3%)       41.51      (6.6%)   -0.4% ( -13% -   14%) 0.839
              CombinedOrHighHigh        5.55      (5.4%)        5.52      (7.0%)   -0.4% ( -12% -   12%) 0.837
             CombinedAndHighHigh        5.60      (5.8%)        5.58      (5.3%)   -0.4% ( -10% -   11%) 0.823
               FilteredAnd3Terms       98.63      (6.1%)       98.27      (5.5%)   -0.4% ( -11% -   11%) 0.843
                          IntSet      291.18      (6.7%)      290.32      (6.0%)   -0.3% ( -12% -   13%) 0.882
              CombinedAndHighMed       20.99      (7.6%)       20.93      (6.6%)   -0.3% ( -13% -   15%) 0.910
                  CountOrHighMed       76.55      (4.4%)       76.38      (3.3%)   -0.2% (  -7% -    7%) 0.860
                      DismaxTerm      488.06      (9.5%)      487.11     (10.9%)   -0.2% ( -18% -   22%) 0.952
                        Or3Terms       64.30     (15.0%)       64.18     (15.6%)   -0.2% ( -26% -   35%) 0.969
                       And3Terms       70.22     (14.3%)       70.10     (15.1%)   -0.2% ( -25% -   34%) 0.971
          CountFilteredOrHighMed       17.78      (2.2%)       17.75      (2.2%)   -0.2% (  -4% -    4%) 0.819
      FilteredOr2Terms2StopWords       49.06      (6.7%)       48.99      (5.9%)   -0.1% ( -11% -   13%) 0.946
                  FilteredIntNRQ       41.04      (5.8%)       40.99      (5.4%)   -0.1% ( -10% -   11%) 0.944
                FilteredOr3Terms       42.88      (5.4%)       42.83      (5.3%)   -0.1% ( -10% -   11%) 0.946
                  FilteredPhrase        9.63      (5.8%)        9.62      (4.8%)   -0.1% ( -10% -   11%) 0.946
                    FilteredTerm       63.25      (5.5%)       63.21      (5.8%)   -0.1% ( -10% -   11%) 0.972
               FilteredOrHighMed       38.02      (5.9%)       38.00      (5.5%)   -0.0% ( -10% -   12%) 0.983
                          Fuzzy2       37.27      (6.6%)       37.25      (6.1%)   -0.0% ( -12% -   13%) 0.987
                 CountOrHighHigh       48.89      (6.1%)       48.89      (3.6%)   -0.0% (  -9% -   10%) 0.998
         CountFilteredOrHighHigh       15.65      (2.9%)       15.66      (2.2%)    0.0% (  -5% -    5%) 0.966
                  FilteredOrMany        3.94      (4.6%)        3.94      (5.1%)    0.1% (  -9% -   10%) 0.974
                      AndHighMed       53.22     (17.1%)       53.27     (17.4%)    0.1% ( -29% -   41%) 0.987
                      TermDTSort      132.49      (5.8%)      132.67      (6.0%)    0.1% ( -11% -   12%) 0.941
                          IntNRQ       41.28      (6.4%)       41.34      (5.9%)    0.1% ( -11% -   13%) 0.943
                IntervalsOrdered        2.37      (5.1%)        2.37      (4.8%)    0.2% (  -9% -   10%) 0.921
             CountFilteredIntNRQ       16.07      (3.0%)       16.09      (2.8%)    0.2% (  -5% -    6%) 0.860
                    SloppyPhrase        1.08      (6.9%)        1.08      (6.4%)    0.2% ( -12% -   14%) 0.923
                          Term1M      454.48     (12.1%)      455.58     (14.4%)    0.2% ( -23% -   30%) 0.954
              FilteredOrHighHigh       12.66      (5.2%)       12.69      (4.6%)    0.2% (  -9% -   10%) 0.876
                         TermB1M      455.08     (11.9%)      456.22     (13.9%)    0.3% ( -22% -   29%) 0.951
                 CountAndHighMed       73.55      (5.1%)       73.75      (3.7%)    0.3% (  -8% -    9%) 0.849
                       TermB1M1P      453.43     (12.2%)      455.20     (14.1%)    0.4% ( -23% -   30%) 0.925
                         Term10K      453.54     (12.3%)      455.78     (13.9%)    0.5% ( -22% -   30%) 0.905
                        SpanNear        2.45      (6.7%)        2.46      (7.1%)    0.5% ( -12% -   15%) 0.816
               TermDayOfYearSort      254.18      (7.0%)      255.72      (6.1%)    0.6% ( -11% -   14%) 0.772
                AndMedOrHighHigh       16.35      (7.9%)       16.46      (8.3%)    0.7% ( -14% -   18%) 0.790
                            Term      452.67     (12.4%)      455.81     (14.0%)    0.7% ( -22% -   30%) 0.868
                         Respell       35.60      (5.5%)       35.85      (4.8%)    0.7% (  -9% -   11%) 0.667
                      OrHighRare       91.77      (8.3%)       92.47      (8.6%)    0.8% ( -14% -   19%) 0.778
                     OrStopWords        9.15     (15.6%)        9.22     (15.7%)    0.8% ( -26% -   38%) 0.864
     FilteredAnd2Terms2StopWords       58.95      (8.8%)       59.47      (9.3%)    0.9% ( -15% -   20%) 0.759
                   TermTitleSort       48.51      (8.0%)       48.95      (8.3%)    0.9% ( -14% -   18%) 0.727
                         Term100      450.57     (13.0%)      456.09     (14.2%)    1.2% ( -22% -   32%) 0.776
                 FilteredPrefix3       67.50      (6.2%)       68.46      (5.2%)    1.4% (  -9% -   13%) 0.428
                        Wildcard       45.25      (6.3%)       45.94      (4.9%)    1.5% (  -9% -   13%) 0.390
                         Prefix3       72.16      (5.9%)       73.29      (5.3%)    1.6% (  -9% -   13%) 0.374
                   TermMonthSort     2010.81      (9.9%)     2053.91     (10.6%)    2.1% ( -16% -   25%) 0.508
                       CountTerm     5564.36     (13.2%)     5695.14     (13.3%)    2.4% ( -21% -   33%) 0.576
              FilteredAndHighMed       30.63      (6.8%)       31.78      (6.8%)    3.8% (  -9% -   18%) 0.080
                    AndStopWords        8.59     (13.0%)        8.95     (13.5%)    4.2% ( -19% -   35%) 0.322
            FilteredAndStopWords        8.16      (5.2%)        8.66      (3.5%)    6.1% (  -2% -   15%) 0.000
             FilteredAndHighHigh       10.07      (6.0%)       10.78      (3.2%)    7.0% (  -2% -   17%) 0.000

It produced a similar speedup, but some queries like OrStopWords and AndStopWords no longer show a performance gain.

Maybe the speedup is not worth the complexity of the newly added APIs? I'm a little bit confused.

@jpountz
Contributor

jpountz commented Aug 8, 2025

I'm confused too, but the current version of the change is quite simple and produces a speedup with a low p-value, so I'm keen on getting it in.

@HUSTERGS
Contributor Author

HUSTERGS commented Aug 8, 2025

I suspect there is some connection between #15004 and this PR (the affected tasks overlap somewhat). Maybe we should wait for #15004 to be merged into the main branch and then compare the performance difference of this PR?

@jpountz
Contributor

jpountz commented Aug 8, 2025

Good point!

@jpountz
Contributor

jpountz commented Aug 8, 2025

Another idea: in order to not add new APIs, an alternative would be to implement specialized bulk scorers for the case when all scorers are term scorers, on the same field (a common case, and arguably the case we're most interested in optimizing) and work directly on ImpactsEnum, norms, and SimScorer. This should allow us to do interesting things without introducing new APIs, such as reading norms only once per doc ID or vectorizing score computations of required/non-essential clauses.
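As a rough illustration of the "read norms only once per doc ID" idea, here is a plain-Java sketch over arrays. Nothing in it is Lucene API: the class, `termScore` (a simplified saturation formula, not BM25 as Lucene implements it), and the `normReads` counter are all hypothetical, and norms are modeled as plain floats. It assumes a conjunction where all clauses are terms on the same field, so each matching doc has a single norm shared by every clause.

```java
public class SameFieldBulkScoreSketch {
    // Counts simulated norm lookups so the two strategies can be compared.
    static int normReads;

    static float readNorm(float[] norms, int doc) {
        normReads++;
        return norms[doc];
    }

    // Simplified, hypothetical term score: saturates with freq, shrinks with norm.
    static float termScore(float weight, float freq, float norm) {
        return weight * freq / (freq + norm);
    }

    // Baseline: each clause looks up the norm of the doc on its own.
    // freqs[c][i] is the frequency of clause c in docs[i].
    static void scorePerClause(int[] docs, float[][] freqs, float[] weights,
                               float[] norms, float[] out) {
        for (int i = 0; i < docs.length; i++) {
            float sum = 0f;
            for (int c = 0; c < weights.length; c++) {
                sum += termScore(weights[c], freqs[c][i], readNorm(norms, docs[i]));
            }
            out[i] = sum;
        }
    }

    // Specialized: one norm lookup per doc, shared by all clauses on the field.
    static void scoreSharedNorm(int[] docs, float[][] freqs, float[] weights,
                                float[] norms, float[] out) {
        for (int i = 0; i < docs.length; i++) {
            float norm = readNorm(norms, docs[i]);
            float sum = 0f;
            for (int c = 0; c < weights.length; c++) {
                sum += termScore(weights[c], freqs[c][i], norm);
            }
            out[i] = sum;
        }
    }

    public static void main(String[] args) {
        int[] docs = {0, 2, 3};                          // docs matching all clauses
        float[][] freqs = {{1f, 2f, 3f}, {4f, 1f, 2f}};  // two term clauses
        float[] weights = {1.5f, 0.5f};
        float[] norms = {10f, 12f, 8f, 20f};             // indexed by doc ID
        float[] a = new float[docs.length];
        float[] b = new float[docs.length];

        normReads = 0;
        scorePerClause(docs, freqs, weights, norms, a);
        int perClause = normReads;

        normReads = 0;
        scoreSharedNorm(docs, freqs, weights, norms, b);
        int shared = normReads;

        System.out.println("norm reads: " + perClause + " vs " + shared);
    }
}
```

Both methods perform the same arithmetic in the same order, so the scores match exactly. Only the number of norm lookups changes, from clauses × docs to docs.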

jpountz added a commit to jpountz/luceneutil that referenced this pull request Aug 12, 2025
Because nightly benchmarks only test a small set of scenarios, the JVM may end
up over-optimizing query evaluation. For instance, it only runs with
BM25Similarity, sorting tasks only run against a TermQuery, filtered vector
search only exercises the approximate path, not the exact path, etc.

This tries to make the benchmark more realistic by running some cheap queries
before running benchmarks, whose goal is to pollute call sites so that they are
not all magically monomorphic.

This will translate into a drop in performance for some tasks, but hopefully we
can recover some of it in the future.

Related PR:
 - apache/lucene#14968 where we suspected the speedup
   to be due to specialization making a call site monomorphic in nightly
   benchmarks that would not be monomorphic in the real world,
 - apache/lucene#15039 where we are trying to improve
   behavior with several different similarity impls but the benchmarks only
   show a small improvement since they always run with BM25Similarity.
@HUSTERGS
Contributor Author

> Another idea: in order to not add new APIs, an alternative would be to implement specialized bulk scorers for the case when all scorers are term scorers, on the same field (a common case, and arguably the case we're most interested in optimizing) and work directly on ImpactsEnum, norms, and SimScorer. This should allow us to do interesting things without introducing new APIs, such as reading norms only once per doc ID or vectorizing score computations of required/non-essential clauses.

I'm waiting for #15039 to merge, and I'm looking forward to digging a bit more into this.

> I suspect there is some connection between #15004 and this PR (the affected tasks overlap somewhat). Maybe we should wait for #15004 to be merged into the main branch and then compare the performance difference of this PR?

Now that #15004 is merged, I reran the benchmark. The result is below:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                    CombinedTerm       11.25      (3.8%)       11.03      (4.6%)   -1.9% (  -9% -    6%) 0.150
                      OrHighHigh       21.77      (2.1%)       21.50      (2.0%)   -1.3% (  -5% -    2%) 0.050
              Or2Terms2StopWords       59.40      (5.5%)       58.70      (5.1%)   -1.2% ( -11% -   10%) 0.485
                         TermB1M      445.35      (2.8%)      440.99      (2.9%)   -1.0% (  -6% -    4%) 0.284
                     AndHighHigh       22.79      (2.7%)       22.57      (2.2%)   -1.0% (  -5% -    4%) 0.214
                         Term100      445.94      (2.9%)      441.68      (2.8%)   -1.0% (  -6% -    4%) 0.291
                            Term      445.68      (2.9%)      441.44      (3.0%)   -1.0% (  -6% -    5%) 0.303
                         Term10K      445.19      (2.9%)      441.01      (2.8%)   -0.9% (  -6% -    4%) 0.298
                       TermB1M1P      445.76      (2.9%)      441.60      (2.9%)   -0.9% (  -6% -    5%) 0.311
                       And3Terms       72.47      (3.7%)       71.80      (3.5%)   -0.9% (  -7% -    6%) 0.420
                          Term1M      445.53      (2.8%)      441.52      (2.9%)   -0.9% (  -6% -    4%) 0.320
                 FilteredPrefix3       71.42      (3.0%)       70.80      (3.6%)   -0.9% (  -7% -    5%) 0.410
                      OrHighRare       95.32      (5.4%)       94.53      (5.0%)   -0.8% ( -10% -   10%) 0.615
                       OrHighMed       66.68      (4.1%)       66.21      (3.6%)   -0.7% (  -7% -    7%) 0.561
                      AndHighMed       54.25      (3.0%)       53.88      (2.8%)   -0.7% (  -6% -    5%) 0.454
                      DismaxTerm      480.63      (3.5%)      477.44      (3.7%)   -0.7% (  -7% -    6%) 0.556
                    FilteredTerm       63.21      (2.6%)       62.84      (2.4%)   -0.6% (  -5% -    4%) 0.460
                 CountAndHighMed       75.83      (1.8%)       75.42      (2.4%)   -0.5% (  -4% -    3%) 0.426
                        Or3Terms       64.39      (3.7%)       64.07      (3.4%)   -0.5% (  -7% -    6%) 0.648
                DismaxOrHighHigh       35.26      (2.4%)       35.09      (2.6%)   -0.5% (  -5% -    4%) 0.528
                  CountOrHighMed       78.98      (1.7%)       78.64      (1.8%)   -0.4% (  -3% -    3%) 0.447
                         Prefix3       76.18      (3.1%)       75.90      (4.0%)   -0.4% (  -7% -    6%) 0.742
                  FilteredPhrase        9.72      (2.7%)        9.68      (2.2%)   -0.3% (  -5% -    4%) 0.675
                 DismaxOrHighMed       49.38      (3.3%)       49.23      (3.1%)   -0.3% (  -6% -    6%) 0.774
             And2Terms2StopWords       57.85      (5.9%)       57.68      (5.6%)   -0.3% ( -11% -   11%) 0.874
                        Wildcard       47.49      (3.0%)       47.36      (3.4%)   -0.3% (  -6% -    6%) 0.795
                          Phrase        7.53      (3.0%)        7.51      (2.4%)   -0.2% (  -5% -    5%) 0.801
                 AndHighOrMedMed       14.10      (3.4%)       14.08      (3.3%)   -0.2% (  -6% -    6%) 0.864
               FilteredAnd3Terms      104.14      (2.9%)      103.97      (2.3%)   -0.2% (  -5% -    5%) 0.840
                          IntSet      287.42      (4.0%)      286.99      (3.8%)   -0.2% (  -7% -    7%) 0.903
             FilteredOrStopWords        8.14      (2.1%)        8.13      (2.4%)   -0.1% (  -4% -    4%) 0.844
              FilteredOrHighHigh       12.87      (2.7%)       12.86      (2.5%)   -0.1% (  -5% -    5%) 0.875
                          Fuzzy1       39.10      (3.8%)       39.05      (3.2%)   -0.1% (  -6% -    7%) 0.911
               FilteredOrHighMed       38.08      (3.7%)       38.04      (3.2%)   -0.1% (  -6% -    7%) 0.931
                  FilteredIntNRQ       42.38      (2.5%)       42.35      (2.3%)   -0.1% (  -4% -    4%) 0.919
                FilteredOr3Terms       42.96      (3.7%)       42.95      (3.1%)   -0.0% (  -6% -    6%) 0.983
                IntervalsOrdered        2.43      (3.9%)        2.42      (3.3%)   -0.0% (  -6% -    7%) 0.985
      FilteredOr2Terms2StopWords       48.16      (4.5%)       48.17      (4.0%)    0.0% (  -8% -    8%) 0.985
                  FilteredOrMany        3.98      (3.4%)        3.98      (2.7%)    0.1% (  -5% -    6%) 0.949
                          Fuzzy2       35.37      (3.5%)       35.42      (3.1%)    0.1% (  -6% -    7%) 0.905
             CountFilteredIntNRQ       16.31      (1.1%)       16.33      (0.9%)    0.1% (  -1% -    2%) 0.673
                     CountPhrase        2.67      (3.8%)        2.67      (3.4%)    0.2% (  -6% -    7%) 0.870
             CountFilteredPhrase        8.89      (3.3%)        8.91      (3.0%)    0.2% (  -5% -    6%) 0.839
          CountFilteredOrHighMed       17.86      (0.6%)       17.89      (0.5%)    0.2% (   0% -    1%) 0.234
         CountFilteredOrHighHigh       15.78      (0.8%)       15.81      (0.7%)    0.2% (  -1% -    1%) 0.334
                          IntNRQ       42.71      (2.5%)       42.80      (2.2%)    0.2% (  -4% -    5%) 0.766
     FilteredAnd2Terms2StopWords       59.46      (4.6%)       59.66      (4.3%)    0.3% (  -8% -    9%) 0.810
                 CountOrHighHigh       50.23      (2.1%)       50.41      (2.0%)    0.4% (  -3% -    4%) 0.558
               CombinedOrHighMed       20.51      (4.4%)       20.59      (5.0%)    0.4% (  -8% -   10%) 0.799
                     CountOrMany        4.93      (3.3%)        4.95      (3.2%)    0.5% (  -5% -    7%) 0.653
                          OrMany        4.55      (5.4%)        4.57      (4.8%)    0.5% (  -9% -   11%) 0.770
              CombinedAndHighMed       20.75      (4.2%)       20.86      (4.2%)    0.5% (  -7% -    9%) 0.690
                CountAndHighHigh       48.78      (1.9%)       49.08      (1.8%)    0.6% (  -2% -    4%) 0.295
                         Respell       35.79      (4.3%)       36.05      (2.5%)    0.7% (  -5% -    7%) 0.519
              CombinedOrHighHigh        5.65      (3.3%)        5.69      (3.7%)    0.8% (  -6% -    8%) 0.492
             CountFilteredOrMany        4.35      (2.6%)        4.39      (2.6%)    0.8% (  -4% -    6%) 0.332
                       CountTerm     5812.39      (2.7%)     5862.14      (2.9%)    0.9% (  -4% -    6%) 0.335
                    SloppyPhrase        1.14      (4.5%)        1.15      (4.8%)    0.9% (  -8% -   10%) 0.538
             CombinedAndHighHigh        5.71      (1.7%)        5.76      (1.8%)    1.0% (  -2% -    4%) 0.075
                AndMedOrHighHigh       16.62      (3.2%)       16.78      (3.2%)    1.0% (  -5% -    7%) 0.316
                        SpanNear        2.48      (5.2%)        2.51      (5.3%)    1.0% (  -8% -   12%) 0.538
                    AndStopWords        9.11      (3.0%)        9.31      (1.9%)    2.2% (  -2% -    7%) 0.006
              FilteredAndHighMed       31.76      (2.6%)       32.53      (1.6%)    2.4% (  -1% -    6%) 0.000
                     OrStopWords        9.17      (1.9%)        9.39      (3.1%)    2.5% (  -2% -    7%) 0.002
            FilteredAndStopWords        8.57      (2.8%)        8.80      (1.3%)    2.7% (  -1% -    6%) 0.000
             FilteredAndHighHigh       10.61      (2.6%)       10.92      (1.0%)    2.9% (   0% -    6%) 0.000

I'm planning to do another round of benchmarks after mikemccand/luceneutil#436 is merged. Maybe the speedup is not real?

jpountz added a commit to mikemccand/luceneutil that referenced this pull request Aug 13, 2025
Because nightly benchmarks only test a small set of scenarios, the JVM may end
up over-optimizing query evaluation. For instance, it only runs with
BM25Similarity, sorting tasks only run against a TermQuery, filtered vector
search only exercises the approximate path, not the exact path, etc.

This tries to make the benchmark more realistic by running some cheap queries
before running benchmarks, whose goal is to pollute call sites so that they are
not all magically monomorphic.

This will translate into a drop in performance for some tasks, but hopefully we
can recover some of it in the future.

Related PR:
 - apache/lucene#14968 where we suspected the speedup
   to be due to specialization making a call site monomorphic in nightly
   benchmarks that would not be monomorphic in the real world,
 - apache/lucene#15039 where we are trying to improve
   behavior with several different similarity impls but the benchmarks only
   show a small improvement since they always run with BM25Similarity.
@jpountz
Contributor

jpountz commented Aug 18, 2025

I'm curious if this change helps more with type pollution, especially if we start using BulkSimScorer to compute scores. In some local testing that I did, the slowdown from type pollution was much less pronounced if I removed usage of BooleanSimilarity and ClassicSimilarity.

@HUSTERGS
Contributor Author

I'll run another benchmark tonight.

@HUSTERGS
Contributor Author

Here is the benchmark result:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                        Or3Terms       62.01      (7.9%)       59.63      (7.8%)   -3.8% ( -18% -   12%) 0.124
             And2Terms2StopWords       53.62      (7.6%)       51.80      (7.9%)   -3.4% ( -17% -   13%) 0.164
              Or2Terms2StopWords       55.25      (7.8%)       53.42      (7.4%)   -3.3% ( -17% -   12%) 0.167
                       OrHighMed       62.68      (8.7%)       60.91      (9.1%)   -2.8% ( -18% -   16%) 0.314
               CombinedOrHighMed       19.63      (5.0%)       19.11      (4.1%)   -2.6% ( -11% -    6%) 0.067
                    CombinedTerm       10.94      (3.9%)       10.68      (3.3%)   -2.3% (  -9% -    5%) 0.041
                 DismaxOrHighMed       46.53      (5.8%)       45.48      (6.6%)   -2.3% ( -13% -   10%) 0.251
      FilteredOr2Terms2StopWords       46.40      (5.6%)       45.36      (4.9%)   -2.3% ( -12% -    8%) 0.174
                      OrHighHigh       21.22      (8.6%)       20.78      (7.4%)   -2.0% ( -16% -   15%) 0.418
              CombinedAndHighMed       19.98      (5.1%)       19.59      (4.6%)   -2.0% ( -11% -    8%) 0.201
                       And3Terms       67.80      (6.8%)       66.48      (7.9%)   -2.0% ( -15% -   13%) 0.403
                      AndHighMed       50.89      (7.5%)       49.93      (8.5%)   -1.9% ( -16% -   15%) 0.455
               FilteredOrHighMed       36.49      (4.7%)       35.80      (4.3%)   -1.9% ( -10% -    7%) 0.183
                          OrMany        4.26      (5.2%)        4.18      (5.7%)   -1.8% ( -12% -    9%) 0.286
             CountFilteredPhrase        8.69      (3.9%)        8.54      (3.5%)   -1.7% (  -8% -    5%) 0.137
                FilteredOr3Terms       41.43      (4.6%)       40.71      (4.3%)   -1.7% ( -10% -    7%) 0.218
                DismaxOrHighHigh       33.88      (5.6%)       33.32      (4.9%)   -1.7% ( -11% -    9%) 0.317
                      TermDTSort      138.70      (3.6%)      136.45      (2.9%)   -1.6% (  -7% -    5%) 0.114
                      OrHighRare       91.17      (8.3%)       89.75      (6.4%)   -1.6% ( -14% -   14%) 0.505
     FilteredAnd2Terms2StopWords       56.26      (5.4%)       55.44      (5.5%)   -1.5% ( -11% -    9%) 0.394
                          IntNRQ       42.87      (2.8%)       42.27      (2.5%)   -1.4% (  -6% -    3%) 0.094
             CountFilteredOrMany        4.31      (2.8%)        4.26      (2.7%)   -1.4% (  -6% -    4%) 0.117
              FilteredOrHighHigh       12.43      (3.5%)       12.26      (3.2%)   -1.3% (  -7% -    5%) 0.202
                          Fuzzy1       38.01      (4.1%)       37.51      (3.9%)   -1.3% (  -8% -    6%) 0.294
               FilteredAnd3Terms      100.43      (4.3%)       99.15      (4.3%)   -1.3% (  -9% -    7%) 0.348
                     CountOrMany        4.85      (3.1%)        4.79      (2.9%)   -1.3% (  -7% -    4%) 0.191
                          Phrase        7.43      (2.3%)        7.34      (2.3%)   -1.2% (  -5% -    3%) 0.098
             FilteredOrStopWords        7.89      (2.9%)        7.79      (2.5%)   -1.2% (  -6% -    4%) 0.149
                          Fuzzy2       34.43      (4.1%)       34.01      (3.6%)   -1.2% (  -8% -    6%) 0.319
                  FilteredPhrase        9.46      (3.2%)        9.34      (3.0%)   -1.2% (  -7% -    5%) 0.215
                  FilteredIntNRQ       42.46      (3.0%)       41.95      (2.5%)   -1.2% (  -6% -    4%) 0.166
                  FilteredOrMany        3.89      (4.4%)        3.85      (4.2%)   -1.1% (  -9% -    7%) 0.421
                          IntSet      288.72      (3.3%)      285.58      (4.1%)   -1.1% (  -8% -    6%) 0.354
                     AndHighHigh       21.70      (7.9%)       21.48      (7.6%)   -1.0% ( -15% -   15%) 0.682
                  CountOrHighMed       72.18      (2.9%)       71.49      (2.7%)   -1.0% (  -6% -    4%) 0.281
                CountAndHighHigh       48.37      (2.3%)       47.96      (1.8%)   -0.9% (  -4% -    3%) 0.193
                     OrStopWords        9.09      (7.2%)        9.01      (5.9%)   -0.8% ( -13% -   13%) 0.685
                 CountOrHighHigh       49.44      (2.3%)       49.05      (2.0%)   -0.8% (  -4% -    3%) 0.250
                   TermMonthSort     2049.26      (2.3%)     2033.92      (1.6%)   -0.7% (  -4% -    3%) 0.236
                 CountAndHighMed       71.19      (3.4%)       70.66      (3.4%)   -0.7% (  -7% -    6%) 0.487
                IntervalsOrdered        2.42      (4.0%)        2.41      (3.9%)   -0.7% (  -8% -    7%) 0.579
                       CountTerm     5445.21      (3.0%)     5412.75      (2.6%)   -0.6% (  -6% -    5%) 0.504
               TermDayOfYearSort      252.90      (2.1%)      251.51      (1.7%)   -0.6% (  -4% -    3%) 0.363
                     CountPhrase        2.60      (4.4%)        2.59      (4.2%)   -0.5% (  -8% -    8%) 0.685
                    FilteredTerm       60.57      (3.1%)       60.25      (3.2%)   -0.5% (  -6% -    5%) 0.597
              CombinedOrHighHigh        5.53      (3.0%)        5.51      (3.0%)   -0.5% (  -6% -    5%) 0.601
                        SpanNear        2.46      (4.7%)        2.45      (4.7%)   -0.5% (  -9% -    9%) 0.750
                 AndHighOrMedMed       13.68      (3.0%)       13.61      (2.7%)   -0.5% (  -5% -    5%) 0.599
             CountFilteredIntNRQ       16.38      (1.1%)       16.31      (1.0%)   -0.4% (  -2% -    1%) 0.200
                    SloppyPhrase        1.11      (3.7%)        1.10      (4.1%)   -0.4% (  -7% -    7%) 0.753
         CountFilteredOrHighHigh       15.80      (0.8%)       15.74      (0.8%)   -0.4% (  -1% -    1%) 0.138
                         Respell       34.87      (3.8%)       34.76      (3.5%)   -0.3% (  -7% -    7%) 0.782
          CountFilteredOrHighMed       17.88      (0.7%)       17.83      (0.7%)   -0.2% (  -1% -    1%) 0.281
                        Wildcard       46.82      (3.1%)       46.85      (3.0%)    0.0% (  -5% -    6%) 0.960
                         Prefix3       73.54      (4.0%)       73.65      (3.9%)    0.1% (  -7% -    8%) 0.905
             CombinedAndHighHigh        5.64      (1.5%)        5.66      (1.3%)    0.4% (  -2% -    3%) 0.399
                 FilteredPrefix3       68.41      (3.9%)       68.70      (4.0%)    0.4% (  -7% -    8%) 0.737
                      DismaxTerm      467.61      (5.2%)      472.49      (2.6%)    1.0% (  -6% -    9%) 0.421
                   TermTitleSort       50.61      (3.5%)       51.50      (4.0%)    1.8% (  -5% -    9%) 0.140
                       TermB1M1P      427.71      (7.1%)      435.69      (4.1%)    1.9% (  -8% -   14%) 0.313
                          Term1M      428.15      (7.3%)      436.17      (4.1%)    1.9% (  -8% -   14%) 0.315
                         Term10K      428.30      (7.3%)      436.53      (4.2%)    1.9% (  -8% -   14%) 0.305
                         Term100      428.01      (7.2%)      436.45      (4.1%)    2.0% (  -8% -   14%) 0.290
                         TermB1M      427.84      (7.1%)      436.30      (4.1%)    2.0% (  -8% -   14%) 0.281
                            Term      428.31      (7.2%)      436.83      (4.1%)    2.0% (  -8% -   14%) 0.282
              FilteredAndHighMed       30.59      (3.5%)       31.20      (3.7%)    2.0% (  -5% -    9%) 0.077
                AndMedOrHighHigh       15.02      (3.6%)       15.34      (3.9%)    2.1% (  -5% -    9%) 0.076
                    AndStopWords        8.32      (5.4%)        8.54      (5.1%)    2.7% (  -7% -   14%) 0.101
            FilteredAndStopWords        8.33      (3.0%)        8.58      (1.5%)    3.0% (  -1% -    7%) 0.000
             FilteredAndHighHigh       10.36      (3.1%)       10.70      (1.4%)    3.3% (  -1% -    8%) 0.000

@jpountz
Contributor

jpountz commented Aug 19, 2025

It's a bit disappointing, but I haven't seen a big speedup with the introduction of BulkSimScorer either. Maybe that's because we'll only see benefits once we also start using BulkSimScorer to compute impact scores.

@HUSTERGS
Contributor Author

> It's a bit disappointing, but I haven't seen a big speedup with the introduction of BulkSimScorer either. Maybe that's because we'll only see benefits once we also start using BulkSimScorer to compute impact scores.

If I understand correctly, BulkSimScorer needs a pair of raw int[] and float[] arrays to work? And impact scores seem to be computed from an array of Impact? Is #14931 somehow related to making this work?

@jpountz
Contributor

jpountz commented Aug 21, 2025

Possibly, I'm not sure about what the API should look like. I started playing with replacing List<Impact> with FreqAndNormBuffer (akin to DocAndFloatFeatureBuffer) but it is quite an invasive change.
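For readers following along, the mismatch being discussed (a list of per-impact objects versus the parallel arrays a bulk scorer wants) can be sketched in plain Java. This is only an illustration under loud assumptions: `ImpactPair` and `Buffer` are hypothetical stand-ins, not Lucene's `Impact` or the `FreqAndNormBuffer` mentioned above, the array element types are a guess, and the scoring formula is a placeholder that treats the norm as a plain length value.

```java
import java.util.List;

public class ParallelImpactArraysSketch {
    // Hypothetical (freq, norm) pair; not Lucene's Impact class.
    static final class ImpactPair {
        final int freq;
        final long norm;

        ImpactPair(int freq, long norm) {
            this.freq = freq;
            this.norm = norm;
        }
    }

    // Hypothetical buffer of parallel arrays, as a bulk scorer might consume.
    static final class Buffer {
        float[] freqs = new float[0];
        long[] norms = new long[0];
        int size;
    }

    // Copies a list of impact pairs into parallel arrays.
    static Buffer toBuffer(List<ImpactPair> impacts) {
        Buffer buf = new Buffer();
        buf.freqs = new float[impacts.size()];
        buf.norms = new long[impacts.size()];
        for (ImpactPair impact : impacts) {
            buf.freqs[buf.size] = impact.freq;
            buf.norms[buf.size] = impact.norm;
            buf.size++;
        }
        return buf;
    }

    // Placeholder bulk computation: the maximum score over all buffered impacts.
    static float maxScore(Buffer buf, float weight) {
        float max = 0f;
        for (int i = 0; i < buf.size; i++) {
            float score = weight * buf.freqs[i] / (buf.freqs[i] + buf.norms[i]);
            max = Math.max(max, score);
        }
        return max;
    }

    public static void main(String[] args) {
        List<ImpactPair> impacts = List.of(new ImpactPair(3, 10L), new ImpactPair(5, 20L));
        Buffer buf = toBuffer(impacts);
        System.out.println(maxScore(buf, 2f));
    }
}
```

The copy in `toBuffer` is the cost that a native array-based impacts representation would avoid, which is presumably why replacing the list type outright is the more invasive but more attractive option.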
