Skip to content

Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) #15120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 27, 2025

Conversation

cwperks
Copy link
Contributor

@cwperks cwperks commented Aug 25, 2025

Description

This PR abstracts the logic to add estimatedByteSizes on merges to a method called addEstimatedBytesToMerge so that it can be re-used in multiple places.

This PR adds a simple test to ensure that estimated bytes are added to the merge after addEstimatedBytesToMerge(MergePolicy.OneMerge merge) is called

The test can be run with ./gradlew :lucene:core:test --tests TestIndexWriterMerging.testAddEstimatedBytesToMerge -i

Resolves #14993

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

Signed-off-by: Craig Perkins <[email protected]>
@github-actions github-actions bot added this to the 11.0.0 milestone Aug 25, 2025
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
@msokolov
Copy link
Contributor

so this moves the accounting from deep in registerMerge to the top? Did the test fail before this change?

@cwperks
Copy link
Contributor Author

cwperks commented Aug 25, 2025

so this moves the accounting from deep in registerMerge to the top? Did the test fail before this change?

@msokolov merges registered in public void registerMerge(MergePolicy.OneMerge merge) did not get the estimated bytes added like those registered in private synchronized boolean registerMerge(MergePolicy.OneMerge merge) do. This leads to information missing in log statements like the one's pointed out in the issue description.

I added a new unit test for the new method testAddEstimatedBytesToMerge, but I think it may also be good to ensure merges get this information regardless of how they're registered in an integration test. Do you know the proper place to add a test like that?

@msokolov
Copy link
Contributor

Lucene doesn't really separate unit tests from integration tests, because it doesn't ship a service. But you can write unit tests at different levels of abstraction. I think what you've done here (create an index, add some docs, trigger merges, check the output) is about what we'd expect. But if you want to also test the size estimation of merges kicked off by "internal" policies, maybe the best way is to use RandomIndexWriter, which will sometimes randomly kick off some merges of its own? TBH this isn't an area of the code I know super well, and I commented mostly to understand what was going on!

Signed-off-by: Craig Perkins <[email protected]>
Comment on lines 490 to 491
writer.close();
dir.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A try-with-resources block on both, perhaps? This is important for clean up if a test fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to try-with-resources here.

Do you think we should create a mechanism to enforce this pattern repo wide? I see a number of tests cases (included ones in this suite) that do not follow the try-with-resources pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project has 20+ years of history, this isn't surprising. I think we should fix these when we touch the code involved - one huge patch is probably not worth the effort here. New code should stick to new conventions (and utilities) available in the language.

try {
addEstimatedBytesToMerge(merge);
} catch (IOException _) {
// ignore and append to pending merges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would throw an UncheckedIOException here. If something is wrong, it'll be wrong. Let the caller know early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to throw UncheckedIOException

@@ -4777,6 +4782,21 @@ private void abortOneMerge(MergePolicy.OneMerge merge) throws IOException {
closeMergeReaders(merge, true, false);
}

/** Compute {@code estimatedMergeBytes} and {@code totalMergeBytes} for a merge. */
void addEstimatedBytesToMerge(MergePolicy.OneMerge merge) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this method be static too? I wonder if it uses any IW's state at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make it static, but I was not able to because it calls numDeletedDocs() which is non-static. When I tried to make numDeletedDocs() static I ran into other issues. I've been trying to contribute more to Lucene lately, but still very much coming up to speed on the repo so opted to start small. Do you know if numDeletedDocs() and any other methods it calls can be changed to static?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let it stay as it is now then.

Signed-off-by: Craig Perkins <[email protected]>
@dweiss dweiss modified the milestones: 11.0.0, 10.3.0 Aug 26, 2025
@dweiss dweiss self-assigned this Aug 26, 2025
@dweiss dweiss merged commit bfdbac3 into apache:main Aug 27, 2025
8 checks passed
dweiss added a commit that referenced this pull request Aug 27, 2025
dweiss added a commit that referenced this pull request Aug 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merges kicked off by IndexWriter.addIndexes(CodecReader[]) fail to set estimated size
3 participants