-
Notifications
You must be signed in to change notification settings - Fork 978
Perf: optimize actual_buffer_size to use only data buffer capacity for coalesce #7967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
But the test failed, i am not sure if this is a reasonable optimization. 🤔 |
I iwill review this shortly. Thank you @zhuqi-lucas |
Thank you @alamb , fixed the tests also. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuqi-lucas -- I double checked and indeed since we are compacting the buffers, it makes sense to use the buffer size rather than the total memory size (that also includes the views and null buffers)
🤖 |
Co-authored-by: Andrew Lamb <[email protected]>
Thank you @alamb for review and double checking! |
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
🤖 |
This is matching my local result, 2x faster for high select! filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8 2.62 3.7±0.02ms ? ?/sec 1.00 1408.1±4.77µs ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001 1.00 90.0±0.27ms ? ?/sec 1.01 90.8±0.23ms ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01 1.00 11.2±0.03ms ? ?/sec 1.01 11.3±0.04ms ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1 1.00 5.6±0.34ms ? ?/sec 1.06 6.0±0.41ms ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8 1.63 6.2±0.02ms ? ?/sec 1.00 3.8±0.01ms ? ?/sec
|
🤖: Benchmark completed Details
|
Thanks again @zhuqi-lucas |
Which issue does this PR close?
This is a very interesting idea that we only calculate the data buffer size when we choose to gc, because we almost only care about the gc for data buffers, not for other field views/nulls.
GC is only for databuffers, so the *2 calculation should also compare the databuffer size?
Rationale for this change
optimize actual_buffer_size to use only data buffer capacity
What changes are included in this PR?
optimize actual_buffer_size to use only data buffer capacity
Are these changes tested?
The performance improvement for some high select benchmark with low null ratio is very good about 2X fast:
If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
Are there any user-facing changes?
If there are user-facing changes then we may require documentation to be updated before approving the PR.
If there are any breaking changes to public APIs, please call them out.