Perf: optimize actual_buffer_size to use only data buffer capacity for coalesce #7967

zhuqi-lucas · 2025-07-20T15:23:40Z

Which issue does this PR close?

The idea is to consider only the data buffer size when deciding whether to GC, because GC is essentially only about the data buffers, not the other fields (views and nulls).

GC only applies to the data buffers, so shouldn't the *2 comparison also use only the data buffer size?
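A minimal sketch of the decision being discussed, under a simplified model that is an assumption rather than the arrow-rs code: views are raw u128 values whose low 32 bits hold the string length (as in the Arrow view layout), strings of 12 bytes or fewer are inlined in the view and need no buffer space, and GC (copying the referenced strings into a fresh buffer) is triggered when the data buffers exceed twice the ideal size. The names ideal_buffer_size, data_buffer_capacity and should_gc are hypothetical.

```rust
/// Strings up to this many bytes live inside the 16-byte view itself.
const MAX_INLINE_LEN: usize = 12;

/// Bytes the non-inline strings would occupy if copied into a compact buffer.
fn ideal_buffer_size(views: &[u128]) -> usize {
    views
        .iter()
        .map(|v| {
            let len = (*v as u32) as usize; // low 32 bits hold the length
            if len > MAX_INLINE_LEN { len } else { 0 }
        })
        .sum()
}

/// Capacity of the data buffers only. Views and the null bitmap are left out
/// because copying strings into a new buffer does not shrink them.
fn data_buffer_capacity(data_buffers: &[Vec<u8>]) -> usize {
    data_buffers.iter().map(|b| b.capacity()).sum()
}

/// Hypothetical GC heuristic: copy only when the data buffers are more than
/// twice as large as the strings actually referenced.
fn should_gc(views: &[u128], data_buffers: &[Vec<u8>]) -> bool {
    data_buffer_capacity(data_buffers) > ideal_buffer_size(views) * 2
}

fn main() {
    // Two 20-byte strings (out of line) and one 5-byte string (inline): ideal = 40 bytes.
    let views = [20u128, 20, 5];
    let dense = [vec![0u8; 64]]; // 64 <= 80, keep the existing buffer
    let sparse = [vec![0u8; 1024]]; // 1024 > 80, worth compacting
    println!("dense: {}, sparse: {}", should_gc(&views, &dense), should_gc(&views, &sparse));
}
```

The 2x factor trades copy cost against wasted memory: below it, reusing the existing buffers is assumed to be cheaper than copying every string.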

Rationale for this change

Optimize actual_buffer_size to use only the data buffer capacity.

What changes are included in this PR?

actual_buffer_size in arrow-select/src/coalesce/byte_view.rs now counts only the data buffer capacity, excluding the views and null buffers.

Are these changes tested?

The improvement on some high-selectivity benchmarks with a low null ratio is large, about 2x faster:

cargo bench --bench coalesce_kernels "single_utf8view"
   Compiling arrow-select v55.2.0 (/Users/zhuqi/arrow-rs/arrow-select)
   Compiling arrow-cast v55.2.0 (/Users/zhuqi/arrow-rs/arrow-cast)
   Compiling arrow-string v55.2.0 (/Users/zhuqi/arrow-rs/arrow-string)
   Compiling arrow-ord v55.2.0 (/Users/zhuqi/arrow-rs/arrow-ord)
   Compiling arrow-csv v55.2.0 (/Users/zhuqi/arrow-rs/arrow-csv)
   Compiling arrow-json v55.2.0 (/Users/zhuqi/arrow-rs/arrow-json)
   Compiling arrow v55.2.0 (/Users/zhuqi/arrow-rs/arrow)
    Finished `bench` profile [optimized] target(s) in 13.26s
     Running benches/coalesce_kernels.rs (target/release/deps/coalesce_kernels-bb9750abedb10ad6)
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001
                        time:   [30.946 ms 31.071 ms 31.193 ms]
                        change: [−1.7086% −1.1581% −0.6036%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild

filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01
                        time:   [3.8178 ms 3.8311 ms 3.8444 ms]
                        change: [−4.0521% −3.5467% −3.0345%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Benchmarking filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.9s, enable flat sampling, or reduce sample count to 40.
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1
                        time:   [1.9337 ms 1.9406 ms 1.9478 ms]
                        change: [+0.3699% +0.9557% +1.5666%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high severe

filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8
                        time:   [797.60 µs 805.31 µs 813.85 µs]
                        change: [−59.177% −58.412% −57.639%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001
                        time:   [43.742 ms 43.924 ms 44.108 ms]
                        change: [−1.2146% −0.5778% +0.0828%] (p = 0.08 > 0.05)
                        No change in performance detected.

filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01
                        time:   [5.5736 ms 5.5987 ms 5.6247 ms]
                        change: [−0.2381% +0.4740% +1.1711%] (p = 0.18 > 0.05)
                        No change in performance detected.

filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1
                        time:   [2.2963 ms 2.3035 ms 2.3109 ms]
                        change: [−0.9314% −0.5125% −0.0931%] (p = 0.02 < 0.05)
                        Change within noise threshold.

Benchmarking filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8
                        time:   [1.5482 ms 1.5697 ms 1.5903 ms]
                        change: [−45.794% −44.386% −43.000%] (p = 0.00 < 0.05)
                        Performance has improved.
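Criterion reports relative time changes, not speedups. As a quick check of the "about 2x" claim, here is a hedged sketch assuming criterion's change is the relative difference (new − old) / old of the time estimate, so speedup = old / new = 1 / (1 + change). The helper speedup_from_change is hypothetical; the inputs are the median changes reported above for selectivity 0.8.

```rust
/// Convert a criterion-style relative change (in percent) into a speedup
/// factor old_time / new_time, assuming change = (new - old) / old.
fn speedup_from_change(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

fn main() {
    // Median changes copied from the local benchmark output above.
    for (name, change) in [
        ("nulls: 0, selectivity: 0.8", -58.412),
        ("nulls: 0.1, selectivity: 0.8", -44.386),
    ] {
        println!("{name}: {:.2}x faster", speedup_from_change(change));
    }
}
```

Under that assumption, −58.412% corresponds to roughly 2.4x and −44.386% to roughly 1.8x.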

If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

zhuqi-lucas · 2025-07-20T15:54:47Z

But the tests failed, so I am not sure whether this is a reasonable optimization. 🤔

alamb · 2025-07-21T23:03:57Z

I will review this shortly. Thank you @zhuqi-lucas

zhuqi-lucas · 2025-07-22T05:37:20Z

Thank you @alamb, I fixed the tests as well.

alamb

Thank you @zhuqi-lucas -- I double checked and indeed since we are compacting the buffers, it makes sense to use the buffer size rather than the total memory size (that also includes the views and null buffers)
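To illustrate why including the views and null buffers changes the decision, here is a hedged sketch using a hypothetical ViewArrayParts struct (not an arrow-rs type). It assumes 8192 rows of 16-byte strings filtered at 80% selectivity, with the data buffer still sized for all rows, so the data buffer is only 1.25x the ideal size. Counting the 16-byte views of the kept rows pushes the total past the 2x threshold, while the data-only measure stays below it.

```rust
/// Hypothetical stand-in for the parts of a StringViewArray (not an arrow-rs type).
struct ViewArrayParts {
    views: Vec<u128>,             // one 16-byte view per row
    data_buffers: Vec<Vec<u8>>,   // out-of-line string bytes
    null_bitmap: Option<Vec<u8>>, // validity bits, if any
}

impl ViewArrayParts {
    /// Total footprint: views + null bitmap + data buffers.
    fn total_memory_size(&self) -> usize {
        let views = self.views.capacity() * std::mem::size_of::<u128>();
        let nulls = self.null_bitmap.as_ref().map_or(0, |b| b.capacity());
        views + nulls + self.data_buffer_capacity()
    }

    /// Only the data buffers, which are the part GC can compact.
    fn data_buffer_capacity(&self) -> usize {
        self.data_buffers.iter().map(|b| b.capacity()).sum()
    }

    /// Bytes the non-inline strings would need after compaction.
    fn ideal_buffer_size(&self) -> usize {
        self.views
            .iter()
            .map(|v| (*v as u32) as usize) // low 32 bits hold the length
            .filter(|&len| len > 12) // strings of <= 12 bytes are inlined
            .sum()
    }
}

fn main() {
    let rows = 8192;
    let kept = rows * 8 / 10; // 80% selectivity
    let parts = ViewArrayParts {
        views: vec![16u128; kept], // every kept string is 16 bytes long
        data_buffers: vec![vec![0u8; rows * 16]], // buffer still sized for all rows
        null_bitmap: None,
    };
    let threshold = parts.ideal_buffer_size() * 2;
    let gc_total = parts.total_memory_size() > threshold;
    let gc_data_only = parts.data_buffer_capacity() > threshold;
    println!("total-memory measure GCs: {gc_total}, data-only measure GCs: {gc_data_only}");
}
```

In this model the total-memory check would copy every string even though the buffer is nearly dense, which is consistent with the high-selectivity cases being the ones that sped up.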

arrow-select/src/coalesce/byte_view.rs

alamb · 2025-07-23T10:54:25Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize_actual_buffer_size (444ad37) to 82821e5 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=optimize_actual_buffer_size
Results will be posted here when complete


zhuqi-lucas · 2025-07-23T11:01:01Z

> Thank you @zhuqi-lucas -- I double checked and indeed since we are compacting the buffers, it makes sense to use the buffer size rather than the total memory size (that also includes the views and null buffers)

Thank you @alamb for review and double checking!

alamb · 2025-07-23T11:03:37Z

🤖: Benchmark completed


group                                                          main                                   optimize_actual_buffer_size
-----                                                          ----                                   ---------------------------
concat 1024 arrays boolean 4                                   1.01     28.0±0.13µs        ? ?/sec    1.00     27.8±0.04µs        ? ?/sec
concat 1024 arrays i32 4                                       1.01     14.2±0.09µs        ? ?/sec    1.00     14.1±0.06µs        ? ?/sec
concat 1024 arrays str 4                                       1.01     56.9±0.44µs        ? ?/sec    1.00     56.1±0.56µs        ? ?/sec
concat boolean 1024                                            1.30    451.8±0.45ns        ? ?/sec    1.00    348.2±0.36ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00     51.0±0.08µs        ? ?/sec    1.00     50.9±0.07µs        ? ?/sec
concat boolean nulls 1024                                      1.16    786.7±4.89ns        ? ?/sec    1.00    680.5±1.17ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00    109.6±0.19µs        ? ?/sec    1.00    109.7±0.17µs        ? ?/sec
concat fixed size lists                                        1.00   709.2±32.96µs        ? ?/sec    1.00   707.8±32.05µs        ? ?/sec
concat i32 1024                                                1.00    437.1±1.21ns        ? ?/sec    1.00    437.0±0.89ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.00    218.1±9.45µs        ? ?/sec    1.01    220.9±5.36µs        ? ?/sec
concat i32 nulls 1024                                          1.13    760.3±2.11ns        ? ?/sec    1.00    675.6±4.53ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.01    283.1±7.66µs        ? ?/sec    1.00    279.3±9.60µs        ? ?/sec
concat str 1024                                                1.00     13.5±1.09µs        ? ?/sec    1.02     13.8±1.24µs        ? ?/sec
concat str 8192 over 100 arrays                                1.00    107.8±1.04ms        ? ?/sec    1.00    107.8±0.84ms        ? ?/sec
concat str nulls 1024                                          1.00      6.6±0.52µs        ? ?/sec    1.02      6.7±0.84µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.01     53.9±0.46ms        ? ?/sec    1.00     53.4±0.37ms        ? ?/sec
concat str_dict 1024                                           1.07      3.0±0.01µs        ? ?/sec    1.00      2.8±0.02µs        ? ?/sec
concat str_dict_sparse 1024                                    1.01      7.0±0.02µs        ? ?/sec    1.00      7.0±0.03µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      6.7±0.15µs        ? ?/sec    1.00      6.7±0.12µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.01     77.8±0.34µs        ? ?/sec    1.00     77.3±0.38µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     84.1±0.43µs        ? ?/sec    1.00     84.2±0.39µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.01     77.9±0.39µs        ? ?/sec    1.00     77.5±0.38µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     84.2±0.40µs        ? ?/sec    1.00     83.9±0.39µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.00     40.9±4.43µs        ? ?/sec    1.13     46.2±3.80µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     46.9±3.28µs        ? ?/sec    1.20     56.0±5.19µs        ? ?/sec

alamb · 2025-07-23T11:03:40Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize_actual_buffer_size (2df2acf) to 82821e5 diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=optimize_actual_buffer_size
Results will be posted here when complete

alamb · 2025-07-23T11:23:01Z

🤖: Benchmark completed


group                                                                                main                                   optimize_actual_buffer_size
-----                                                                                ----                                   ---------------------------
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.10    288.0±2.78ms        ? ?/sec    1.00    261.5±1.46ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.01      8.7±0.07ms        ? ?/sec    1.00      8.6±0.08ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      4.1±0.05ms        ? ?/sec    1.01      4.2±0.13ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.00      3.5±0.02ms        ? ?/sec    1.01      3.6±0.03ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.29    318.3±2.55ms        ? ?/sec    1.00    246.7±1.72ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.01     10.1±0.08ms        ? ?/sec    1.00     10.0±0.07ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.04      4.7±0.09ms        ? ?/sec    1.00      4.6±0.08ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.01      4.6±0.02ms        ? ?/sec    1.00      4.5±0.02ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.02     58.8±0.36ms        ? ?/sec    1.00     57.5±0.49ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.00     11.9±0.11ms        ? ?/sec    1.00     11.9±0.16ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.00      9.8±0.16ms        ? ?/sec    1.01     10.0±0.35ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.27     10.9±0.24ms        ? ?/sec    1.00      8.5±0.23ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.04     81.7±0.32ms        ? ?/sec    1.00     78.8±0.33ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     13.9±0.08ms        ? ?/sec    1.00     13.9±0.14ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.00     10.3±0.17ms        ? ?/sec    1.00     10.4±0.27ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.01     10.1±0.18ms        ? ?/sec    1.00     10.0±0.20ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.06     49.0±0.24ms        ? ?/sec    1.00     46.4±0.18ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.00      5.9±0.03ms        ? ?/sec    1.00      5.9±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.00      4.6±0.22ms        ? ?/sec    1.00      4.5±0.09ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.02      3.2±0.05ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.03     62.8±0.31ms        ? ?/sec    1.00     61.2±0.20ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.00      8.7±0.03ms        ? ?/sec    1.00      8.7±0.09ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.00      5.6±0.21ms        ? ?/sec    1.00      5.6±0.17ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.8±0.01ms        ? ?/sec    1.01      3.8±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.05     42.2±0.11ms        ? ?/sec    1.00     40.3±0.09ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.6±0.02ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.09      2.5±0.20ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.48      2.4±0.01ms        ? ?/sec    1.00   1590.2±8.61µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.05     56.6±0.45ms        ? ?/sec    1.00     54.1±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.00      7.5±0.03ms        ? ?/sec    1.00      7.6±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.00      3.8±0.15ms        ? ?/sec    1.01      3.8±0.18ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.20      4.5±0.01ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.06     97.8±0.23ms        ? ?/sec    1.00     92.2±0.14ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.01      9.3±0.02ms        ? ?/sec    1.00      9.2±0.04ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.00      3.9±0.40ms        ? ?/sec    1.01      3.9±0.15ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.00      3.3±0.01ms        ? ?/sec    1.01      3.3±0.01ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.04    139.0±0.32ms        ? ?/sec    1.00    133.5±1.27ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.02     16.8±0.05ms        ? ?/sec    1.00     16.5±0.05ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.03      7.6±0.47ms        ? ?/sec    1.00      7.4±0.16ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.00      8.8±0.02ms        ? ?/sec    1.01      8.9±0.03ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.00     65.4±0.13ms        ? ?/sec    1.00     65.3±0.14ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.00      7.4±0.03ms        ? ?/sec    1.00      7.5±0.02ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.00      3.9±0.40ms        ? ?/sec    1.09      4.3±0.40ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            2.62      3.7±0.02ms        ? ?/sec    1.00   1408.1±4.77µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.00     90.0±0.27ms        ? ?/sec    1.01     90.8±0.23ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.00     11.2±0.03ms        ? ?/sec    1.01     11.3±0.04ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.00      5.6±0.34ms        ? ?/sec    1.06      6.0±0.41ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.63      6.2±0.02ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec

alamb · 2025-07-23T11:23:05Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing optimize_actual_buffer_size (2df2acf) to 82821e5 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=optimize_actual_buffer_size
Results will be posted here when complete

zhuqi-lucas · 2025-07-23T11:27:10Z

This matches my local results: about 2x faster for high selectivity!

filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            2.62      3.7±0.02ms        ? ?/sec    1.00   1408.1±4.77µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.00     90.0±0.27ms        ? ?/sec    1.01     90.8±0.23ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.00     11.2±0.03ms        ? ?/sec    1.01     11.3±0.04ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.00      5.6±0.34ms        ? ?/sec    1.06      6.0±0.41ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.63      6.2±0.02ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec


alamb · 2025-07-23T11:46:13Z

🤖: Benchmark completed


group                                                                         main                                   optimize_actual_buffer_size
-----                                                                         ----                                   ---------------------------
filter context decimal128 (kept 1/2)                                          1.07     43.2±6.35µs        ? ?/sec    1.00     40.5±0.25µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.02     50.8±1.81µs        ? ?/sec    1.00     50.0±1.33µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    242.9±0.40ns        ? ?/sec    1.06    258.2±0.48ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     98.3±3.10µs        ? ?/sec    1.08    106.4±0.19µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.01     13.6±0.48µs        ? ?/sec    1.00     13.5±0.40µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00    481.3±0.58ns        ? ?/sec    1.05    506.5±0.48ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     70.7±0.10µs        ? ?/sec    1.12     79.5±0.15µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     70.7±0.13µs        ? ?/sec    1.12     79.4±0.10µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     70.7±0.11µs        ? ?/sec    1.12     79.4±0.16µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     70.8±0.11µs        ? ?/sec    1.12     79.6±0.17µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     70.8±0.16µs        ? ?/sec    1.12     79.5±0.18µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     70.7±0.12µs        ? ?/sec    1.12     79.4±0.15µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     70.7±0.11µs        ? ?/sec    1.12     79.4±0.16µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     70.8±0.45µs        ? ?/sec    1.12     79.4±0.12µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     70.7±0.13µs        ? ?/sec    1.12     79.4±0.11µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     22.7±0.06µs        ? ?/sec    1.01     22.9±0.09µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.03      6.3±0.38µs        ? ?/sec    1.00      6.1±0.31µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.00    235.8±0.21ns        ? ?/sec    1.07    253.2±2.41ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     94.0±0.22µs        ? ?/sec    1.09    102.6±0.31µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00     13.4±0.42µs        ? ?/sec    1.00     13.4±0.64µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.00    474.0±0.77ns        ? ?/sec    1.04    493.0±0.78ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00    119.4±7.64µs        ? ?/sec    1.02    121.5±2.96µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.02     60.4±1.58µs        ? ?/sec    1.00     59.3±1.12µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00    677.4±0.98ns        ? ?/sec    1.02    692.6±1.38ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    116.0±7.55µs        ? ?/sec    1.03    119.4±1.53µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00     58.1±1.06µs        ? ?/sec    1.01     58.5±1.21µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    498.3±0.60ns        ? ?/sec    1.02    508.3±0.67ns        ? ?/sec
filter context string (kept 1/2)                                              1.00    576.8±9.46µs        ? ?/sec    1.05   607.3±15.07µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00     23.6±0.05µs        ? ?/sec    1.00     23.6±0.16µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.02      7.6±0.42µs        ? ?/sec    1.00      7.5±0.40µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00    825.2±4.78ns        ? ?/sec    1.02    844.9±1.21ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     94.7±0.32µs        ? ?/sec    1.09    103.3±0.24µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.02     14.4±0.36µs        ? ?/sec    1.00     14.2±0.46µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00   1082.9±1.22ns        ? ?/sec    1.02   1105.3±6.76ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.00   643.5±11.98µs        ? ?/sec    1.03   660.2±17.53µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.15   1113.6±8.28ns        ? ?/sec    1.00    972.0±3.99ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     22.4±0.04µs        ? ?/sec    1.00     22.4±0.06µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.00  1834.8±10.36ns        ? ?/sec    1.00   1842.3±8.69ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.00    231.6±0.31ns        ? ?/sec    1.08    249.5±0.76ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     93.4±0.13µs        ? ?/sec    1.09    102.2±0.19µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      8.6±0.02µs        ? ?/sec    1.00      8.6±0.03µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.00    567.0±0.90ns        ? ?/sec    1.04    587.4±1.29ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.07     53.4±3.93µs        ? ?/sec    1.00     50.1±2.47µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.02     54.1±1.96µs        ? ?/sec    1.00     53.0±1.14µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00      2.9±0.01µs        ? ?/sec    1.00      2.9±0.01µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    121.5±0.32µs        ? ?/sec    1.00    122.0±0.48µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.01    145.6±0.49µs        ? ?/sec    1.00    144.8±0.42µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.00     69.6±1.01µs        ? ?/sec    1.02     70.8±1.91µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00      3.4±0.01µs        ? ?/sec    1.00      3.4±0.01µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    151.0±0.51µs        ? ?/sec    1.00    150.9±0.65µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00     11.3±0.67µs        ? ?/sec    1.00     11.3±0.62µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00      3.4±0.01µs        ? ?/sec    1.00      3.4±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.02   181.2±10.43µs        ? ?/sec    1.00   177.5±14.44µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.00    205.6±6.26µs        ? ?/sec    1.00    204.6±9.58µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00      3.4±0.01µs        ? ?/sec    1.00      3.4±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     45.5±0.05µs        ? ?/sec    1.00     45.3±0.11µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      8.6±0.34µs        ? ?/sec    1.02      8.8±0.43µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.03      3.1±0.01µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.00     53.8±0.05µs        ? ?/sec    1.00     53.9±0.07µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00      2.6±0.01µs        ? ?/sec    1.06      2.7±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.22      2.8±0.01µs        ? ?/sec    1.00      2.3±0.00µs        ? ?/sec
filter run array (kept 1/2)                                                   1.09    424.5±0.71µs        ? ?/sec    1.00    388.6±0.61µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.08    448.9±1.58µs        ? ?/sec    1.00    414.0±1.26µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.11    334.4±1.14µs        ? ?/sec    1.00    301.0±0.81µs        ? ?/sec
filter single record batch                                                    1.00     45.6±0.09µs        ? ?/sec    1.01     46.1±0.13µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00     45.2±0.07µs        ? ?/sec    1.02     46.0±0.05µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      3.7±0.02µs        ? ?/sec    1.02      3.8±0.02µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      2.9±0.01µs        ? ?/sec    1.02      3.0±0.02µs        ? ?/sec

alamb · 2025-07-23T17:06:51Z

Thanks again @zhuqi-lucas

Perf: optimize actual_buffer_size to use only data buffer capacity

bdd2896

github-actions bot added the arrow Changes to the arrow crate label Jul 20, 2025

fmt

dac68f3

zhuqi-lucas changed the title ~~Perf: optimize actual_buffer_size to use only data buffer capacity~~ Perf: optimize actual_buffer_size to use only data buffer capacity for coalesce Jul 20, 2025

zhuqi-lucas mentioned this pull request Jul 20, 2025

[EPIC] A collection of improvement for the performance for sort and compare and gc, etc #7802 (Open, 12 tasks)

fix test

d1cf410

fix

444ad37

alamb approved these changes Jul 23, 2025


arrow-select/src/coalesce/byte_view.rs (outdated, resolved)

Update arrow-select/src/coalesce/byte_view.rs

2df2acf

Co-authored-by: Andrew Lamb <[email protected]>

alamb merged commit 3e089d2 into apache:main Jul 23, 2025
26 checks passed
