Skip to content

Commit 233dad3

Browse files
authored
Optimize partition_validity function used in sort kernels (apache#7937)
# Which issue does this PR close? Optimize `partition_validity` function used in sort kernels - Preallocate vectors based on known null counts - Avoid dynamic dispatch by calling `NullBuffer::is_valid` instead of `Array::is_valid` - Avoid capacity checks inside loop by writing to `spare_capacity_mut` instead of using `push` - Closes apache#7936. # Rationale for this change Microbenchmark results for `sort_kernels` compared to `main`, only looking at benchmarks matching "nulls to indices": ``` sort i32 nulls to indices 2^10 time: [4.9325 µs 4.9370 µs 4.9422 µs] change: [−20.303% −20.133% −19.974%] (p = 0.00 < 0.05) Performance has improved. sort i32 nulls to indices 2^12 time: [20.096 µs 20.209 µs 20.327 µs] change: [−26.819% −26.275% −25.697%] (p = 0.00 < 0.05) Performance has improved. sort f32 nulls to indices 2^12 time: [26.329 µs 26.366 µs 26.406 µs] change: [−29.487% −29.331% −29.146%] (p = 0.00 < 0.05) Performance has improved. sort string[0-10] nulls to indices 2^12 time: [70.667 µs 70.762 µs 70.886 µs] change: [−20.057% −19.935% −19.819%] (p = 0.00 < 0.05) Performance has improved. sort string[0-100] nulls to indices 2^12 time: [101.98 µs 102.44 µs 102.99 µs] change: [−0.3501% +0.0835% +0.4918%] (p = 0.71 > 0.05) No change in performance detected. sort string[0-400] nulls to indices 2^12 time: [84.952 µs 85.024 µs 85.102 µs] change: [−5.3969% −4.9827% −4.6421%] (p = 0.00 < 0.05) Performance has improved. sort string[10] nulls to indices 2^12 time: [72.486 µs 72.664 µs 72.893 µs] change: [−14.937% −14.781% −14.599%] (p = 0.00 < 0.05) Performance has improved. sort string[100] nulls to indices 2^12 time: [71.354 µs 71.606 µs 71.902 µs] change: [−17.207% −16.795% −16.373%] (p = 0.00 < 0.05) Performance has improved. sort string[1000] nulls to indices 2^12 time: [73.088 µs 73.195 µs 73.311 µs] change: [−16.705% −16.599% −16.483%] (p = 0.00 < 0.05) Performance has improved. sort string_view[10] nulls to indices 2^12 time: [32.592 µs 32.654 µs 32.731 µs] change: [−15.722% −15.512% −15.310%] (p = 0.00 < 0.05) Performance has improved. sort string_view[0-400] nulls to indices 2^12 time: [32.981 µs 33.074 µs 33.189 µs] change: [−25.570% −25.132% −24.700%] (p = 0.00 < 0.05) Performance has improved. sort string_view_inlined[0-12] nulls to indices 2^12 time: [28.467 µs 28.496 µs 28.529 µs] change: [−22.978% −22.786% −22.574%] (p = 0.00 < 0.05) Performance has improved. sort string[10] dict nulls to indices 2^12 time: [94.463 µs 94.503 µs 94.542 µs] change: [−11.386% −11.165% −10.961%] (p = 0.00 < 0.05) Performance has improved. ``` # Are these changes tested? Covered by existing tests # Are there any user-facing changes? No, the method is internal to the sort kernels.
1 parent d0fa24e commit 233dad3

File tree

1 file changed

+34
-6
lines changed

1 file changed

+34
-6
lines changed

arrow-ord/src/sort.rs

Lines changed: 34 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -180,13 +180,41 @@ where
180180

181181
// partition indices into valid and null indices
182182
fn partition_validity(array: &dyn Array) -> (Vec<u32>, Vec<u32>) {
183-
match array.null_count() {
184-
// faster path
185-
0 => ((0..(array.len() as u32)).collect(), vec![]),
186-
_ => {
187-
let indices = 0..(array.len() as u32);
188-
indices.partition(|index| array.is_valid(*index as usize))
183+
let len = array.len();
184+
let null_count = array.null_count();
185+
match array.nulls() {
186+
Some(nulls) if null_count > 0 => {
187+
let mut valid_indices = Vec::with_capacity(len - null_count);
188+
let mut null_indices = Vec::with_capacity(null_count);
189+
190+
let valid_slice = valid_indices.spare_capacity_mut();
191+
let null_slice = null_indices.spare_capacity_mut();
192+
let mut valid_idx = 0;
193+
let mut null_idx = 0;
194+
195+
nulls.into_iter().enumerate().for_each(|(i, v)| {
196+
if v {
197+
valid_slice[valid_idx].write(i as u32);
198+
valid_idx += 1;
199+
} else {
200+
null_slice[null_idx].write(i as u32);
201+
null_idx += 1;
202+
}
203+
});
204+
205+
assert_eq!(null_idx, null_count);
206+
assert_eq!(valid_idx, len - null_count);
207+
// Safety: The new lengths match the initial capacity as asserted above,
208+
// the bounds checks while writing also ensure they less than or equal to the capacity.
209+
unsafe {
210+
valid_indices.set_len(valid_idx);
211+
null_indices.set_len(null_idx);
212+
}
213+
214+
(valid_indices, null_indices)
189215
}
216+
// faster path
217+
_ => ((0..(len as u32)).collect(), vec![]),
190218
}
191219
}
192220

0 commit comments

Comments
 (0)