-
Notifications
You must be signed in to change notification settings - Fork 132
Pollute call sites before running benchmarks. #436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Because nightly benchmarks only test a small set of scenarios, the JVM may end up over-optimizing query evaluation. For instance, it only runs with BM25Similarity, sorting tasks only run against a TermQuery, filtered vector search only exercises the approximate path, not the exact path, etc. This tries to make the benchmark more realistic by running some cheap queries before running bencharks, whose goal is to pollute call sites so that they are not all magically monomorphic. This will translate in a drop in performance for some tasks, but hopefully we can recover some of it in the future. Related PR: - apache/lucene#14968 where we suspected the speedup to be due to specialization making a call site monomorphic in nightly benchmarks that would not be monomorphic in the real world, - apache/lucene#15039 where we are trying to improve behavior with several different similarity impls but the benchmarks only show a small improvement since they always run with BM25Similarity.
Here's the result of a run where pollution is disabled on the baseline and enabled on the competitor:
|
@ChrisHegarty I think I remember seeing something like that in one of your recent PRs but I can't find it anymore? |
Yeah, I had something similar in the benchmark update of this PR apache/lucene#15037. I still need to make it optional, so it can be enabled or not for comparison. Generally, I do think that this is a good idea, as it will allow us to find such potential problems so that we can fix 'em and make performance more consistent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jpountz -- this is a great idea to make benchy more real-world realistic.
@@ -0,0 +1,174 @@ | |||
package perf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs ASL copyright header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for noticing, I added one.
final TestContext testContext = TestContext.parse(args.getString("-context", "")); | ||
|
||
if (pollute) { | ||
TypePolluter.pollute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious that the one-time pollution is enough! Hotspot doesn't noticed that things later got singular and then re-optimize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I'm not intimate enough with Hotspot to give you an answer. I suspect that it technically could, but that it wouldn't help that much in real-world applications, so it doesn't bother. @ChrisHegarty may have more data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that what's in the PR is fine. It is possible that things change over time and that Hostpot could potentially optimise differently in the future when profiles change, but like Adrien, I'm less worried about this in real world scenarios.
I ran experiments locally that suggest that some of the performance decrease from type pollution (mikemccand/luceneutil#436) can be attributed to calls to `SimScorer#score` no longer being inlinable since they are polymorphic. This change helps `BM25Scorer` remain inlinable using similar tricks that we are applying for `Bits#get` and `ImpactsEnum#nextDoc`/`ImpactsEnum#advance`. Hopefully changes such as apache#15039 will help improve performance with other similarities as well in the future.
I pushed an annotation for this change. |
Because nightly benchmarks only test a small set of scenarios, the JVM may end up over-optimizing query evaluation. For instance, it only runs with BM25Similarity, sorting tasks only run against a TermQuery, filtered vector search only exercises the approximate path, not the exact path, etc.
This tries to make the benchmark more realistic by running some cheap queries before running bencharks, whose goal is to pollute call sites so that they are not all magically monomorphic.
This will translate in a drop in performance for some tasks, but hopefully we can recover some of it in the future.
Related PR:
Scorer#applyAsRequiredClause
apache/lucene#14968 where we suspected the speedup to be due to specialization making a call site monomorphic in nightly benchmarks that would not be monomorphic in the real world,