@alamb alamb commented Nov 13, 2025

WIP: I am verifying via benchmarks that this PR is faster; if so, I will polish it up and mark it ready for review.

Which issue does this PR close?

Rationale for this change

This is part of a broader goal to consolidate the bitwise operations in DataFusion so that we can focus additional optimization energy on them (aka bithacks for the win)

set_bits was introduced by @kazuyukitanimura in

What changes are included in this PR?

  • Deprecate set_bits with a note to use apply_bitwise_binary_op instead.
  • Update any internal uses of set_bits to use apply_bitwise_binary_op instead.
  • Update set_bits to use apply_bitwise_binary_op internally
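The deprecate-and-delegate pattern in the list above can be sketched as follows. This is a minimal, self-contained illustration, not the code in this PR: `binary_op_sketch` and `set_bits_sketch` are hypothetical stand-ins for `apply_bitwise_binary_op` and `set_bits`, their signatures are assumptions, and the loop works one bit at a time for clarity where an optimized kernel would presumably process whole words. It also assumes copy semantics for the legacy function and the LSB-first bit order used by Arrow bitmaps.

```rust
/// Read bit `i` (LSB-first within each byte).
fn get_bit(data: &[u8], i: usize) -> bool {
    (data[i / 8] >> (i % 8)) & 1 == 1
}

/// Write `value` to bit `i` (LSB-first within each byte).
fn put_bit(data: &mut [u8], i: usize, value: bool) {
    if value {
        data[i / 8] |= 1 << (i % 8);
    } else {
        data[i / 8] &= !(1 << (i % 8));
    }
}

/// Hypothetical general primitive: for each of `len` bits, replace the bit in
/// `left` with `op(left_bit, right_bit)`. Offsets are in bits.
/// Bit-at-a-time for readability only; this is not an optimized kernel.
fn binary_op_sketch<F>(
    left: &mut [u8],
    left_offset: usize,
    right: &[u8],
    right_offset: usize,
    len: usize,
    mut op: F,
) where
    F: FnMut(bool, bool) -> bool,
{
    for i in 0..len {
        let l = get_bit(left, left_offset + i);
        let r = get_bit(right, right_offset + i);
        put_bit(left, left_offset + i, op(l, r));
    }
}

/// Hypothetical legacy entry point, kept as a deprecated thin wrapper so that
/// existing callers keep compiling while new code uses the general primitive.
/// Assumes copy semantics: the destination bits become equal to the source bits.
#[deprecated(note = "use binary_op_sketch instead")]
fn set_bits_sketch(
    write_data: &mut [u8],
    data: &[u8],
    offset_write: usize,
    offset_read: usize,
    len: usize,
) {
    // Copying is the binary op that ignores the left bit and returns the right bit.
    binary_op_sketch(write_data, offset_write, data, offset_read, len, |_, r| r);
}
```

With this shape, only the general primitive needs to be optimized, and the deprecated wrapper inherits any speedup (or slowdown), which is what the benchmarks are meant to check.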

Are these changes tested?

Covered by existing tests

I will also run performance benchmarks.

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

alamb commented Nov 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/set_bits (4a951a3) to f8d9572 diff
BENCH_NAME=bit_mask
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench bit_mask
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_set_bits
Results will be posted here when complete

github-actions bot added the arrow label (Changes to the arrow crate) Nov 13, 2025

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                                                              alamb_set_bits                         main
-----                                                              --------------                         ----
bit_mask/set_bits/offset_write_0_offset_read_0_len_17_datum_0      1.31     25.3±0.30ns        ? ?/sec    1.00     19.3±0.05ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_0_len_17_datum_173    1.31     25.2±0.09ns        ? ?/sec    1.00     19.3±0.04ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_0_len_1_datum_0       2.86     20.7±0.08ns        ? ?/sec    1.00      7.2±0.05ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_0_len_1_datum_173     2.86     20.6±0.09ns        ? ?/sec    1.00      7.2±0.02ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_0_len_65_datum_0      2.15     23.9±0.09ns        ? ?/sec    1.00     11.1±0.03ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_0_len_65_datum_173    2.15     23.9±0.06ns        ? ?/sec    1.00     11.1±0.02ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_5_len_17_datum_0      1.31     25.3±0.67ns        ? ?/sec    1.00     19.3±0.01ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_5_len_17_datum_173    1.31     25.2±0.05ns        ? ?/sec    1.00     19.3±0.02ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_5_len_1_datum_0       2.86     20.6±0.07ns        ? ?/sec    1.00      7.2±0.09ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_5_len_1_datum_173     2.86     20.6±0.08ns        ? ?/sec    1.00      7.2±0.01ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_5_len_65_datum_0      1.17     25.1±0.05ns        ? ?/sec    1.00     21.5±0.03ns        ? ?/sec
bit_mask/set_bits/offset_write_0_offset_read_5_len_65_datum_173    1.17     25.1±0.06ns        ? ?/sec    1.00     21.5±0.02ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_0_len_17_datum_0      1.58     30.3±0.06ns        ? ?/sec    1.00     19.3±0.02ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_0_len_17_datum_173    1.58     30.4±0.08ns        ? ?/sec    1.00     19.3±0.02ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_0_len_1_datum_0       2.14     15.5±0.03ns        ? ?/sec    1.00      7.2±0.04ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_0_len_1_datum_173     2.15     15.5±0.02ns        ? ?/sec    1.00      7.2±0.01ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_0_len_65_datum_0      1.70     35.5±0.05ns        ? ?/sec    1.00     20.9±0.03ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_0_len_65_datum_173    1.70     35.5±0.07ns        ? ?/sec    1.00     20.9±0.06ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_5_len_17_datum_0      1.51     29.1±0.07ns        ? ?/sec    1.00     19.3±0.08ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_5_len_17_datum_173    1.51     29.1±0.11ns        ? ?/sec    1.00     19.3±0.04ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_5_len_1_datum_0       2.14     15.5±0.05ns        ? ?/sec    1.00      7.2±0.06ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_5_len_1_datum_173     2.14     15.5±0.02ns        ? ?/sec    1.00      7.2±0.07ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_5_len_65_datum_0      1.70     35.6±0.09ns        ? ?/sec    1.00     21.0±0.07ns        ? ?/sec
bit_mask/set_bits/offset_write_5_offset_read_5_len_65_datum_173    1.70     35.6±0.09ns        ? ?/sec    1.00     21.0±0.03ns        ? ?/sec

alamb commented Nov 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/set_bits (9ec9c34) to f8d9572 diff
BENCH_NAME=boolean_append_packed
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench boolean_append_packed
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_set_bits
Results will be posted here when complete

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                    alamb_set_bits                         main
-----                    --------------                         ----
boolean_append_packed    1.00      5.3±0.06µs        ? ?/sec    1.21      6.4±0.02µs        ? ?/sec

alamb commented Nov 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/set_bits (9ec9c34) to f8d9572 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_set_bits
Results will be posted here when complete

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                                                          alamb_set_bits                         main
-----                                                          --------------                         ----
concat 1024 arrays boolean 4                                   1.00     21.9±0.07µs        ? ?/sec    1.03     22.5±0.10µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     14.5±0.03µs        ? ?/sec    1.00     14.6±0.05µs        ? ?/sec
concat 1024 arrays str 4                                       1.01     37.1±0.28µs        ? ?/sec    1.00     36.8±0.30µs        ? ?/sec
concat boolean 1024                                            1.00    308.7±0.37ns        ? ?/sec    1.11    343.0±0.33ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.1±0.04µs        ? ?/sec    1.01      5.1±0.02µs        ? ?/sec
concat boolean nulls 1024                                      1.01    563.1±0.59ns        ? ?/sec    1.00    556.5±0.66ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.2±0.07µs        ? ?/sec    1.00     18.2±0.05µs        ? ?/sec
concat fixed size lists                                        1.01   737.0±17.00µs        ? ?/sec    1.00   726.6±20.78µs        ? ?/sec
concat i32 1024                                                1.01    390.4±1.99ns        ? ?/sec    1.00    386.0±0.60ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.00    203.3±9.36µs        ? ?/sec    1.03    208.9±4.05µs        ? ?/sec
concat i32 nulls 1024                                          1.00    600.8±4.50ns        ? ?/sec    1.02    615.2±1.52ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00   234.7±10.03µs        ? ?/sec    1.04   243.9±13.07µs        ? ?/sec
concat str 1024                                                1.00     13.0±0.94µs        ? ?/sec    1.04     13.5±1.02µs        ? ?/sec
concat str 8192 over 100 arrays                                1.00    103.9±1.03ms        ? ?/sec    1.01    104.7±1.04ms        ? ?/sec
concat str nulls 1024                                          1.07      5.9±0.70µs        ? ?/sec    1.00      5.5±0.79µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     52.4±0.59ms        ? ?/sec    1.01     53.0±0.50ms        ? ?/sec
concat str_dict 1024                                           1.00      2.8±0.01µs        ? ?/sec    1.08      3.0±0.01µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      7.0±0.03µs        ? ?/sec    1.00      7.0±0.07µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      6.7±0.02µs        ? ?/sec    1.00      6.7±0.03µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     77.7±0.43µs        ? ?/sec    1.00     77.8±1.36µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.6±0.54µs        ? ?/sec    1.00     79.6±0.57µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     77.0±0.25µs        ? ?/sec    1.15     88.6±0.89µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     78.9±0.35µs        ? ?/sec    1.15     90.4±0.32µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.01     47.1±3.03µs        ? ?/sec    1.00     46.5±2.92µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     47.9±2.90µs        ? ?/sec    1.04     49.8±2.91µs        ? ?/sec

alamb commented Nov 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/set_bits (9ec9c34) to f8d9572 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_set_bits
Results will be posted here when complete

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                                                                         alamb_set_bits                         main
-----                                                                         --------------                         ----
filter context decimal128 (kept 1/2)                                          1.10     45.5±6.08µs        ? ?/sec    1.00     41.4±3.66µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     50.7±1.31µs        ? ?/sec    1.00     50.6±1.45µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.06    251.6±1.69ns        ? ?/sec    1.00    237.9±0.41ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     96.5±0.47µs        ? ?/sec    1.02     98.1±0.23µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.03     10.1±0.51µs        ? ?/sec    1.00      9.9±0.37µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00    458.5±0.80ns        ? ?/sec    1.19    545.2±0.73ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     79.4±0.20µs        ? ?/sec    1.00     79.6±0.10µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     79.4±0.14µs        ? ?/sec    1.00     79.5±0.14µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     79.4±0.08µs        ? ?/sec    1.00     79.6±0.14µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     79.5±0.17µs        ? ?/sec    1.00     79.6±0.59µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     79.4±0.13µs        ? ?/sec    1.00     79.5±0.15µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     79.4±0.16µs        ? ?/sec    1.00     79.5±0.11µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     79.4±0.20µs        ? ?/sec    1.00     79.6±0.15µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     79.5±0.16µs        ? ?/sec    1.00     79.6±0.19µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     79.4±0.16µs        ? ?/sec    1.00     79.6±0.12µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     16.8±0.14µs        ? ?/sec    1.07     17.9±0.04µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      6.3±0.33µs        ? ?/sec    1.03      6.5±0.47µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.06    255.9±0.43ns        ? ?/sec    1.00    241.8±0.48ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     96.9±0.25µs        ? ?/sec    1.01     98.2±0.27µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00     10.0±0.38µs        ? ?/sec    1.02     10.2±0.53µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.26    565.2±3.10ns        ? ?/sec    1.00    447.0±5.77ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00    121.3±4.51µs        ? ?/sec    1.02    123.7±6.98µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00     53.7±1.31µs        ? ?/sec    1.00     53.6±1.35µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00    645.0±0.85ns        ? ?/sec    1.00    642.3±1.02ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    122.1±5.29µs        ? ?/sec    1.02    125.0±7.62µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.02     54.2±1.54µs        ? ?/sec    1.00     53.3±0.89µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    456.4±1.48ns        ? ?/sec    1.01    461.9±1.06ns        ? ?/sec
filter context string (kept 1/2)                                              1.03   614.3±11.76µs        ? ?/sec    1.00   597.8±10.95µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00     17.3±0.23µs        ? ?/sec    1.10     19.0±0.09µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.01      7.1±0.25µs        ? ?/sec    1.00      7.0±0.32µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.04    858.9±1.08ns        ? ?/sec    1.00    825.1±1.15ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     97.7±0.45µs        ? ?/sec    1.01     98.8±0.35µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.07     11.3±0.32µs        ? ?/sec    1.00     10.6±0.33µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.02   1087.6±3.13ns        ? ?/sec    1.00   1061.5±6.92ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.12   709.6±31.14µs        ? ?/sec    1.00   633.4±14.55µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00   1098.7±1.63ns        ? ?/sec    1.00   1102.0±1.91ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     22.5±0.09µs        ? ?/sec    1.00     22.5±0.06µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.00   1894.3±9.98ns        ? ?/sec    1.10      2.1±0.01µs        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.06    258.7±1.44ns        ? ?/sec    1.00    244.5±0.38ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00    102.2±0.24µs        ? ?/sec    1.00    102.4±0.16µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.03      5.3±0.02µs        ? ?/sec    1.00      5.1±0.02µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.03    560.4±0.84ns        ? ?/sec    1.00    542.5±0.61ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     50.3±3.31µs        ? ?/sec    1.07     53.9±4.34µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.00     53.3±1.85µs        ? ?/sec    1.00     53.4±2.03µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.01      3.0±0.01µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    155.8±1.86µs        ? ?/sec    1.00    155.2±0.19µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.16    144.9±1.97µs        ? ?/sec    1.00    124.5±0.34µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.02     70.9±1.94µs        ? ?/sec    1.00     69.3±1.74µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.02      3.5±0.01µs        ? ?/sec    1.00      3.4±0.01µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.25    150.4±0.44µs        ? ?/sec    1.00    120.6±0.22µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00     10.9±0.44µs        ? ?/sec    1.05     11.5±0.58µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.03      3.4±0.00µs        ? ?/sec    1.00      3.3±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.00    162.0±5.82µs        ? ?/sec    1.01   164.0±12.05µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.06   221.3±10.65µs        ? ?/sec    1.00    209.2±5.16µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.02      3.4±0.01µs        ? ?/sec    1.00      3.3±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     45.8±0.28µs        ? ?/sec    1.00     45.6±0.08µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      8.7±0.39µs        ? ?/sec    1.02      8.9±0.48µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00      3.0±0.01µs        ? ?/sec    1.01      3.0±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.00     53.8±0.22µs        ? ?/sec    1.00     53.6±0.11µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00      2.8±0.01µs        ? ?/sec    1.08      3.0±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00      3.1±0.01µs        ? ?/sec    1.01      3.2±0.01µs        ? ?/sec
filter run array (kept 1/2)                                                   1.00    423.0±0.91µs        ? ?/sec    1.00    424.3±0.78µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.01    451.4±9.74µs        ? ?/sec    1.00    448.4±3.40µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    335.1±0.88µs        ? ?/sec    1.00    334.9±0.84µs        ? ?/sec
filter single record batch                                                    1.00     45.6±0.09µs        ? ?/sec    1.01     46.1±0.07µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00     45.6±0.08µs        ? ?/sec    1.00     45.7±0.07µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      4.0±0.02µs        ? ?/sec    1.00      4.0±0.01µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      3.1±0.03µs        ? ?/sec    1.00      3.1±0.03µs        ? ?/sec

alamb commented Nov 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/set_bits (9ec9c34) to f8d9572 diff
BENCH_NAME=boolean_append_packed
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench boolean_append_packed
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_set_bits
Results will be posted here when complete

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                    alamb_set_bits                         main
-----                    --------------                         ----
boolean_append_packed    1.00      5.5±0.02µs        ? ?/sec    1.19      6.5±0.02µs        ? ?/sec

alamb commented Nov 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/set_bits (9ec9c34) to f8d9572 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_set_bits
Results will be posted here when complete

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                                                          alamb_set_bits                         main
-----                                                          --------------                         ----
concat 1024 arrays boolean 4                                   1.00     21.9±0.07µs        ? ?/sec    1.03     22.4±0.14µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     14.5±0.14µs        ? ?/sec    1.01     14.7±0.02µs        ? ?/sec
concat 1024 arrays str 4                                       1.00     37.3±0.43µs        ? ?/sec    1.00     37.3±0.37µs        ? ?/sec
concat boolean 1024                                            1.01    338.2±0.31ns        ? ?/sec    1.00    334.1±0.56ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.1±0.02µs        ? ?/sec    1.00      5.1±0.02µs        ? ?/sec
concat boolean nulls 1024                                      1.00    535.9±0.89ns        ? ?/sec    1.07    572.0±0.80ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.1±0.07µs        ? ?/sec    1.00     18.2±0.05µs        ? ?/sec
concat fixed size lists                                        1.00   763.6±28.98µs        ? ?/sec    1.00   763.8±17.62µs        ? ?/sec
concat i32 1024                                                1.01    389.0±0.69ns        ? ?/sec    1.00    385.6±0.72ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.00    205.0±4.54µs        ? ?/sec    1.05    214.5±6.29µs        ? ?/sec
concat i32 nulls 1024                                          1.00    598.6±5.41ns        ? ?/sec    1.03    617.3±4.62ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.01   241.4±10.55µs        ? ?/sec    1.00    239.2±7.36µs        ? ?/sec
concat str 1024                                                1.00     13.7±1.28µs        ? ?/sec    1.00     13.6±0.93µs        ? ?/sec
concat str 8192 over 100 arrays                                1.01    105.5±0.92ms        ? ?/sec    1.00    104.9±0.90ms        ? ?/sec
concat str nulls 1024                                          1.03      6.1±0.69µs        ? ?/sec    1.00      6.0±0.65µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     51.8±0.94ms        ? ?/sec    1.02     53.1±0.49ms        ? ?/sec
concat str_dict 1024                                           1.00      2.8±0.01µs        ? ?/sec    1.05      2.9±0.01µs        ? ?/sec
concat str_dict_sparse 1024                                    1.02      7.0±0.02µs        ? ?/sec    1.00      6.9±0.03µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.03      7.1±0.03µs        ? ?/sec    1.00      6.8±0.19µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     77.4±0.38µs        ? ?/sec    1.01     77.8±0.51µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.4±0.51µs        ? ?/sec    1.01     79.9±0.36µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     77.2±0.70µs        ? ?/sec    1.18     90.8±0.38µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     79.3±0.36µs        ? ?/sec    1.17     92.9±0.41µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.00     46.7±2.62µs        ? ?/sec    1.04     48.4±2.86µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     47.3±3.23µs        ? ?/sec    1.08     51.2±3.03µs        ? ?/sec

alamb commented Nov 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/set_bits (9ec9c34) to f8d9572 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_set_bits
Results will be posted here when complete

alamb commented Nov 13, 2025

🤖: Benchmark completed

Details

group                                                                         alamb_set_bits                         main
-----                                                                         --------------                         ----
filter context decimal128 (kept 1/2)                                          1.02     42.3±5.16µs        ? ?/sec    1.00     41.4±1.88µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     49.9±0.69µs        ? ?/sec    1.02     50.9±1.28µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.03    251.5±0.38ns        ? ?/sec    1.00    244.0±0.34ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     96.6±0.24µs        ? ?/sec    1.02     98.2±1.01µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.01     10.0±0.44µs        ? ?/sec    1.00      9.9±0.38µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00    451.3±2.41ns        ? ?/sec    1.22    548.8±1.06ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     79.4±0.17µs        ? ?/sec    1.00     79.5±0.13µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     79.4±0.15µs        ? ?/sec    1.00     79.5±0.13µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     79.4±0.13µs        ? ?/sec    1.00     79.5±0.15µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     79.4±0.12µs        ? ?/sec    1.00     79.5±0.12µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     79.4±0.11µs        ? ?/sec    1.00     79.5±0.11µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     79.4±0.20µs        ? ?/sec    1.00     79.5±0.14µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     79.5±0.20µs        ? ?/sec    1.00     79.6±0.08µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     79.4±0.11µs        ? ?/sec    1.00     79.5±0.12µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     79.4±0.10µs        ? ?/sec    1.00     79.6±0.29µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     17.2±0.07µs        ? ?/sec    1.05     18.0±0.11µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      6.3±0.24µs        ? ?/sec    1.05      6.6±0.44µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.00    250.1±0.58ns        ? ?/sec    1.00    248.9±0.51ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     96.9±0.24µs        ? ?/sec    1.02     98.3±0.27µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.02     10.0±0.45µs        ? ?/sec    1.00      9.8±0.49µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.01    553.1±0.83ns        ? ?/sec    1.00    550.3±5.13ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.04    124.0±5.31µs        ? ?/sec    1.00    119.7±1.83µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00     53.4±1.12µs        ? ?/sec    1.00     53.2±1.04µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00    641.8±1.42ns        ? ?/sec    1.04    665.8±1.29ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    122.7±5.02µs        ? ?/sec    1.02    124.8±6.08µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.05     55.2±1.52µs        ? ?/sec    1.00     52.8±1.05µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    455.2±1.04ns        ? ?/sec    1.02    464.2±0.84ns        ? ?/sec
filter context string (kept 1/2)                                              1.00    613.9±5.96µs        ? ?/sec    1.00   612.1±13.71µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00     17.5±0.05µs        ? ?/sec    1.06     18.6±0.13µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.17      8.2±0.41µs        ? ?/sec    1.00      7.0±0.30µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.04   861.7±12.23ns        ? ?/sec    1.00    828.5±4.51ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     97.7±0.42µs        ? ?/sec    1.02     99.3±0.23µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.04     11.3±0.28µs        ? ?/sec    1.00     10.8±0.33µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.03   1098.7±6.20ns        ? ?/sec    1.00   1063.0±3.40ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.01   673.8±13.39µs        ? ?/sec    1.00   666.3±18.99µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00    935.2±1.47ns        ? ?/sec    1.19   1112.5±3.32ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     22.4±0.06µs        ? ?/sec    1.00     22.5±0.07µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.03  1895.7±10.75ns        ? ?/sec    1.00   1834.3±8.81ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.01    253.2±0.44ns        ? ?/sec    1.00    250.1±0.54ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00    102.3±0.18µs        ? ?/sec    1.00    102.4±0.25µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.02      5.2±0.02µs        ? ?/sec    1.00      5.1±0.03µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.02    555.4±0.99ns        ? ?/sec    1.00    546.3±0.94ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     49.9±1.98µs        ? ?/sec    1.07     53.2±4.66µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.00     52.5±0.45µs        ? ?/sec    1.02     53.4±1.49µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00      3.0±0.01µs        ? ?/sec    1.05      3.1±0.01µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    155.0±0.40µs        ? ?/sec    1.00    155.2±0.24µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.15    144.4±0.30µs        ? ?/sec    1.00    125.5±0.36µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.00     70.3±1.07µs        ? ?/sec    1.02     71.7±1.67µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.03      3.5±0.02µs        ? ?/sec    1.00      3.4±0.01µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.25    150.3±0.39µs        ? ?/sec    1.00    120.6±0.34µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.02     11.6±0.64µs        ? ?/sec    1.00     11.4±0.59µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.04      3.4±0.06µs        ? ?/sec    1.00      3.3±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.01    161.9±6.06µs        ? ?/sec    1.00    161.0±5.58µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.00    202.1±3.61µs        ? ?/sec    1.06    214.4±3.76µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.03      3.4±0.01µs        ? ?/sec    1.00      3.3±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     45.8±0.13µs        ? ?/sec    1.00     45.7±0.12µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      8.7±0.37µs        ? ?/sec    1.02      8.8±0.31µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00      3.0±0.01µs        ? ?/sec    1.01      3.0±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.00     53.8±0.14µs        ? ?/sec    1.00     53.7±0.35µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00      2.8±0.01µs        ? ?/sec    1.07      3.0±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00      3.1±0.01µs        ? ?/sec    1.01      3.1±0.01µs        ? ?/sec
filter run array (kept 1/2)                                                   1.00    423.5±0.74µs        ? ?/sec    1.00    424.1±0.69µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.00    447.7±1.15µs        ? ?/sec    1.00    448.1±1.94µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    335.5±3.38µs        ? ?/sec    1.00    335.2±1.16µs        ? ?/sec
filter single record batch                                                    1.00     45.5±0.11µs        ? ?/sec    1.01     46.1±0.45µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00     45.5±0.07µs        ? ?/sec    1.00     45.7±0.04µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      3.9±0.02µs        ? ?/sec    1.03      4.1±0.02µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      3.1±0.03µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec

@kazuyukitanimura
Contributor

Thank you. Looking forward to the perf tests. +1 for the rationale.

