-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) #15120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…(CodecReader[]) Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
so this moves the accounting from deep in |
@msokolov merges registered in I added a new unit test for the new method |
Lucene doesn't really separate unit tests from integration tests, because it doesn't ship a service. But you can write unit tests at different levels of abstraction. I think what you've done here (create an index, add some docs, trigger merges, check the output) is about what we'd expect. But if you want to also test the size estimation of merges kicked off by "internal" policies, maybe the best way is to use RandomIndexWriter, which will sometimes randomly kick off some merges of its own? TBH this isn't an area of the code I know super well, and I commented mostly to understand what was going on! |
Signed-off-by: Craig Perkins <[email protected]>
writer.close(); | ||
dir.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A try-with-resources block on both, perhaps? This is important for clean up if a test fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to try-with-resources here.
Do you think we should create a mechanism to enforce this pattern repo wide? I see a number of tests cases (included ones in this suite) that do not follow the try-with-resources pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project has 20+ years of history, this isn't surprising. I think we should fix these when we touch the code involved - one huge patch is probably not worth the effort here. New code should stick to new conventions (and utilities) available in the language.
try { | ||
addEstimatedBytesToMerge(merge); | ||
} catch (IOException _) { | ||
// ignore and append to pending merges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would throw an UncheckedIOException here. If something is wrong, it'll be wrong. Let the caller know early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to throw UncheckedIOException
@@ -4777,6 +4782,21 @@ private void abortOneMerge(MergePolicy.OneMerge merge) throws IOException { | |||
closeMergeReaders(merge, true, false); | |||
} | |||
|
|||
/** Compute {@code estimatedMergeBytes} and {@code totalMergeBytes} for a merge. */ | |||
void addEstimatedBytesToMerge(MergePolicy.OneMerge merge) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this method be static too? I wonder if it uses any IW's state at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to make it static, but I was not able to because it calls numDeletedDocs()
which is non-static. When I tried to make numDeletedDocs()
static I ran into other issues. I've been trying to contribute more to Lucene lately, but still very much coming up to speed on the repo so opted to start small. Do you know if numDeletedDocs()
and any other methods it calls can be changed to static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let it stay as it is now then.
Signed-off-by: Craig Perkins <[email protected]>
…(CodecReader[]) (backport of #15120).
Description
This PR abstracts the logic to add estimatedByteSizes on merges to a method called
addEstimatedBytesToMerge
so that it can be re-used in multiple places.This PR adds a simple test to ensure that estimated bytes are added to the merge after
addEstimatedBytesToMerge(MergePolicy.OneMerge merge)
is calledThe test can be run with
./gradlew :lucene:core:test --tests TestIndexWriterMerging.testAddEstimatedBytesToMerge -i
Resolves #14993