-
Notifications
You must be signed in to change notification settings - Fork 1k
perf: add repeat_slice_n_times to MutableBuffer
#8658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this will be used in: - apache#8653
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rluvaton
My only concern with this approach is that it might not be necessary (aka did you compare it to the simpler strategy)?
| } | ||
| } | ||
|
|
||
| /// Adding to this mutable buffer `slice_to_repeat` repeated `repeat_count` times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering how much the unsafe log copying here makes a difference, vs ensuring reserve is called correctly.
Did you measure with code that was like:
reserve(slice.len() * repeat_count);
for _ in 0..repeat_count {
buf.extend_from_slice(slice_to_repeat)
}|
|
||
| unsafe { | ||
| // Get to the start of the data before we started copying anything | ||
| let src = self.data.as_ptr().add(length_before) as *const u8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rustc can probably figure it out, but src is the same for all loop iterations so could be pulled out of the loop I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I thought about it but decided not to do it so the code for src and dst is close
|
Yes, I committed the benchmark I tested with
these are the results: Result |
👍 |
# Which issue does this PR close? N/A # Rationale for this change doing `OffsetBuffer::from_lengths(std::iter::repeat_n(size, value.len()));` does not utilize SIMD (I explain further if you want) See [GodBolt Link](https://godbolt.org/z/PTsfvfjqx) Extracted from: - #8653 After this and the pr below is merged will improve the datafusion scalar to array to use this and make it really really fast: - #8658 # What changes are included in this PR? added new function # Are these changes tested? yes # Are there any user-facing changes? yes
|
@alamb can you please merge |
|
Done |
Which issue does this PR close?
N/A
Rationale for this change
I want to repeat the same value multiple times in a very fast way
which will be used in:
After this and the pr below is merged will improve the datafusion scalar to array to use this and make it really really fast:
What changes are included in this PR?
Created a function in
MutableBufferto repeat a slice a number of times in a logarithmic way to reduce memcopy callsAre these changes tested?
Yes
Are there any user-facing changes?
Yes, and added docs
Extracted from:
Benchmark results on local machine
these are the results:
Result