perf: add optimized zip implementation for scalars #8653

rluvaton · 2025-10-19T13:51:42Z

Waiting for the PRs below to be merged first:

bench: create zip kernel benchmarks #8654 - zip benchmarks

This PR includes the following PRs (until they are merged) to make the review easier, so please review them first:

perf: add repeat_slice_n_times to MutableBuffer #8658 - extracted from this
perf: add optimized function to create offset with same length #8656 - extracted from this
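
For readers unfamiliar with why these two helpers matter for byte-like scalars: a column made of one repeated string needs the value bytes repeated `n` times and an offsets buffer that grows by a constant step. The sketch below is a std-only illustration, not the code from #8658 or #8656. The function names are borrowed from the PR titles, but the signatures, the `Vec`-based buffers, and the doubling strategy are assumptions.

```rust
/// Hypothetical stand-in for the helper named in #8658: repeat `slice` `n` times.
/// Copies by doubling the already-filled prefix, so the number of copy calls
/// grows with log(n) rather than n.
fn repeat_slice_n_times(slice: &[u8], n: usize) -> Vec<u8> {
    let total = slice.len() * n;
    let mut out = Vec::with_capacity(total);
    if total == 0 {
        return out;
    }
    out.extend_from_slice(slice);
    // Double the filled prefix while a full doubling still fits.
    while out.len() * 2 <= total {
        out.extend_from_within(..);
    }
    // Fill the tail from the start of the prefix (always shorter than the prefix).
    let remaining = total - out.len();
    out.extend_from_within(..remaining);
    out
}

/// Hypothetical stand-in for the helper named in #8656: offsets for `n`
/// values that all have the same byte length. Overflow is not handled here.
fn offsets_with_same_length(value_len: i32, n: usize) -> Vec<i32> {
    (0..=n).map(|i| i as i32 * value_len).collect()
}
```

With these, a column of four copies of `"abc"` would be `repeat_slice_n_times(b"abc", 4)` for the values and `offsets_with_same_length(3, 4)` for the offsets.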

Which issue does this PR close?

N/A

Rationale for this change

Making zip really fast for scalars

This is useful for IF <expr> THEN <literal> ELSE <literal> END

What changes are included in this PR?

Created a couple of implementations for zipping scalars: one for primitives, one for bytes, and a fallback
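
As a rough illustration of what zipping two scalars means, here is a std-only sketch that models arrays as `Vec<Option<i32>>`. It is not the PR's `ScalarZipper`. The second function is an assumed shortcut for the case of a non-null scalar against a null scalar, where the value buffer can be filled with one value and the mask can double as the validity bitmap. Both function names are hypothetical.

```rust
/// Reference semantics: take `truthy` where the mask is true, `falsy` otherwise.
/// A null mask slot counts as false.
fn zip_scalars(
    mask: &[Option<bool>],
    truthy: Option<i32>,
    falsy: Option<i32>,
) -> Vec<Option<i32>> {
    mask.iter()
        .map(|&m| if m.unwrap_or(false) { truthy } else { falsy })
        .collect()
}

/// Assumed shortcut for a non-null `truthy` and a null `falsy`:
/// every value slot holds `truthy`, and the validity bitmap is the mask
/// with null slots mapped to false.
fn zip_scalar_with_null(mask: &[Option<bool>], truthy: i32) -> (Vec<i32>, Vec<bool>) {
    let values = vec![truthy; mask.len()];
    let validity: Vec<bool> = mask.iter().map(|&m| m.unwrap_or(false)).collect();
    (values, validity)
}
```

Pairing each value with its validity bit from `zip_scalar_with_null` gives the same logical result as `zip_scalars(mask, Some(truthy), None)`, without branching per element on the value side.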

Are these changes tested?

existing tests

Are there any user-facing changes?

new struct ScalarZipper

TODO:

Need to add comments if missing
Add tests for decimal and timestamp to make sure the type is kept

This is useful for `IF <expr> THEN <scalar> ELSE <scalar> END`

TODO:

- [ ] Need to add comments if missing
- [ ] Add benchmark

rluvaton · 2025-10-19T13:53:02Z

arrow-select/src/zip.rs

+                let scalars: Vec<T::Native> = predicate
+                    .iter()
+                    .map(|b| if b { then_val } else { else_val })
+                    .collect();


This will probably use conditional move
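
To make the conditional-move remark concrete, here is a std-only sketch of the same per-element select on a plain `&[bool]` mask, next to an explicitly branchless variant. Both function names are hypothetical. Whether the compiler actually emits a conditional move (or vectorizes the loop) for the first form depends on the target and optimization level, so treat that as something to check in the generated assembly rather than a guarantee.

```rust
/// Same shape as the snippet above: the `if` picks between two values
/// that are already known, which gives the compiler room to avoid a branch.
fn select_scalars(mask: &[bool], then_val: i32, else_val: i32) -> Vec<i32> {
    mask.iter()
        .map(|&b| if b { then_val } else { else_val })
        .collect()
}

/// Explicitly branchless variant using bit masking, for comparison.
fn select_scalars_masked(mask: &[bool], then_val: i32, else_val: i32) -> Vec<i32> {
    mask.iter()
        .map(|&b| {
            // All ones when `b` is true, all zeros otherwise.
            let m = -(b as i32);
            (then_val & m) | (else_val & !m)
        })
        .collect()
}
```

Both functions return the same output for any mask, so the masked version mainly serves as a baseline when inspecting what the optimizer does with the simpler one.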

rluvaton · 2025-10-19T13:54:05Z

arrow-select/src/zip.rs

+fn combine_nulls_and_false(predicate: &BooleanArray) -> BooleanBuffer {
+    if let Some(nulls) = predicate.nulls().filter(|n| n.null_count() > 0) {
+        predicate.values().bitand(
+            // nulls are represented as 0 (false) in the values buffer
+            nulls.inner(),
+        )
+    } else {
+        predicate.values().clone()
+    }
+}


I'm pretty sure there is already a helper function in arrow for this
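
For context, the operation in question boils down to a bitwise AND of the predicate's value bits with its validity bits, so that a null slot ends up as `false`. A minimal std-only sketch on packed `u64` words follows. The function name and the word-based layout are assumptions for illustration and do not reflect arrow's `BooleanBuffer` API.

```rust
/// Treat null predicate slots as false by AND-ing value bits with validity bits.
/// A validity bit of 0 means "null", so the AND clears that slot.
/// `None` means the predicate has no nulls, so the values are used as they are.
fn and_with_validity(values: &[u64], validity: Option<&[u64]>) -> Vec<u64> {
    match validity {
        Some(valid) => {
            assert_eq!(values.len(), valid.len(), "bitmaps must have equal length");
            values.iter().zip(valid).map(|(v, n)| v & n).collect()
        }
        None => values.to_vec(),
    }
}
```

For example, values `0b1011` with validity `0b0110` yield `0b0010`: only the slot that is both valid and true stays set.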

# Which issue does this PR close?

N/A

# Rationale for this change

I have a PR to improve zip perf for scalar but I don't see any benchmarks for it:

- #8653

# What changes are included in this PR?

created zip benchmarks for scalar and non scalar with different masks

# Are these changes tested?

N/A

# Are there any user-facing changes?

Nope

rluvaton · 2025-10-19T20:57:39Z

@alamb If you wanna run the benchmarks for zip, there are no more optimizations left for this PR, only cleanups, tests and comments

For scalars I saw major improvements, while the array-and-scalar cases regressed for some reason (maybe the extra check, even though it is a simple comparison?). I ran it on bare metal to reduce noise as much as possible.

I tested it on:

$ neofetch
            .-/+oossssoo+/-.               ubuntu@ip-
        `:+ssssssssssssssssss+:`           -----------------------
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 24.04.3 LTS x86_64
    .ossssssssssssssssssdMMMNysssso.       Host: c5.metal 1.0
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.14.0-1011-aws
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 3 hours, 46 mins
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 921 (dpkg), 5 (snap)
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.2.21
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Terminal: /dev/pts/0
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   CPU: Intel Xeon Platinum 8275CL (96) @ 3.900GHz
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Memory: 2144MiB / 193025MiB
+sssshhhyNMMNyssssssssssssyNMMMysssssss+
.ssssssssdMMMNhsssssssssshNMMMdssssssss.
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/
  +sssssssssdmydMMMMMMMMddddyssssssss+
   /ssssssssssshdmNNNNmyNMMMMhssssss/
    .ossssssssssssssssssdMMMNysssso.
      -+sssssssssssssssssyyyssss+-
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.

arrow-buffer/src/buffer/mutable.rs

this will be used in: - apache#8653

# Conflicts: # arrow-buffer/src/buffer/mutable.rs

…e-zip-for-scalars

alamb · 2025-10-20T17:59:37Z

Thank you @rluvaton -- I have scheduled benchmarks for this PR and reviewed the dependent ones. Exciting stuff

alamb · 2025-10-20T18:11:23Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-zip-for-scalars (dabbf55) to 9d75f87 diff
BENCH_NAME=zip_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench zip_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-zip-for-scalars
Results will be posted here when complete

alamb · 2025-10-20T19:09:35Z

The zip benchmarks are still running... Maybe we should trim them back a bit

alamb · 2025-10-20T19:32:55Z

🤖: Benchmark completed

Details

group                                                                               improve-zip-for-scalars                main
-----                                                                               -----------------------                ----
zip_8192_from_i32/array_vs_array/10pct_true                                         1.02     35.5±0.18µs        ? ?/sec    1.00     34.7±0.05µs        ? ?/sec
zip_8192_from_i32/array_vs_array/1pct_true                                          1.00      5.1±0.01µs        ? ?/sec    1.00      5.1±0.02µs        ? ?/sec
zip_8192_from_i32/array_vs_array/50pct_nulls                                        1.02     75.3±0.17µs        ? ?/sec    1.00     74.0±0.16µs        ? ?/sec
zip_8192_from_i32/array_vs_array/50pct_true                                         1.01    102.6±0.22µs        ? ?/sec    1.00    101.7±0.17µs        ? ?/sec
zip_8192_from_i32/array_vs_array/90pct_true                                         1.01     36.5±0.07µs        ? ?/sec    1.00     36.1±0.08µs        ? ?/sec
zip_8192_from_i32/array_vs_array/99pct_true                                         1.00      5.9±0.03µs        ? ?/sec    1.01      6.0±0.02µs        ? ?/sec
zip_8192_from_i32/array_vs_array/all_false                                          1.01      2.5±0.09µs        ? ?/sec    1.00      2.5±0.11µs        ? ?/sec
zip_8192_from_i32/array_vs_array/all_true                                           1.03      2.5±0.11µs        ? ?/sec    1.00      2.5±0.09µs        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/10pct_true                               1.01     32.5±0.10ns        ? ?/sec    1.00     32.2±0.15ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/1pct_true                                1.05     33.8±0.09ns        ? ?/sec    1.00     32.2±0.17ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/50pct_nulls                              1.05     33.8±0.10ns        ? ?/sec    1.00     32.2±0.22ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/50pct_true                               1.01     32.6±0.08ns        ? ?/sec    1.00     32.2±0.07ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/90pct_true                               1.03     33.1±0.10ns        ? ?/sec    1.00     32.2±0.26ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/99pct_true                               1.01     32.6±0.17ns        ? ?/sec    1.00     32.2±0.09ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/all_false                                1.01     32.5±0.06ns        ? ?/sec    1.00     32.2±0.16ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/all_true                                 1.01     32.5±0.05ns        ? ?/sec    1.00     32.2±0.05ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/10pct_true                               1.01     32.5±0.07ns        ? ?/sec    1.00     32.2±0.09ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/1pct_true                                1.05     33.8±0.07ns        ? ?/sec    1.00     32.2±0.12ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/50pct_nulls                              1.05     33.8±0.43ns        ? ?/sec    1.00     32.2±0.21ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/50pct_true                               1.01     32.6±0.05ns        ? ?/sec    1.00     32.1±0.08ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/90pct_true                               1.03     33.1±0.07ns        ? ?/sec    1.00     32.2±0.14ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/99pct_true                               1.01     32.6±0.08ns        ? ?/sec    1.00     32.2±0.08ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/all_false                                1.01     32.6±0.07ns        ? ?/sec    1.00     32.2±0.09ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/all_true                                 1.01     32.6±0.10ns        ? ?/sec    1.00     32.2±0.08ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/10pct_true                         1.00  1185.4±15.73ns        ? ?/sec    116.20   137.7±0.26µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/1pct_true                          1.00   1187.9±3.93ns        ? ?/sec    111.72   132.7±0.21µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/50pct_nulls                        1.00   1277.8±9.95ns        ? ?/sec    112.95   144.3±0.29µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/50pct_true                         1.00  1163.2±16.93ns        ? ?/sec    130.07   151.3±0.43µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/90pct_true                         1.00  1150.1±12.10ns        ? ?/sec    89.00   102.4±0.43µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/99pct_true                         1.00   1185.2±4.06ns        ? ?/sec    74.24    88.0±0.43µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/all_false                          1.00   1165.8±4.12ns        ? ?/sec    115.27   134.4±5.33µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/all_true                           1.00   1168.4±9.30ns        ? ?/sec    73.62    86.0±0.19µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/10pct_true                                      1.00      9.1±0.01µs        ? ?/sec    7.43     67.7±0.37µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/1pct_true                                       1.00      9.0±0.05µs        ? ?/sec    6.54     59.1±0.18µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/50pct_nulls                                     1.00      9.1±0.03µs        ? ?/sec    8.83     80.6±0.13µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/50pct_true                                      1.00      9.1±0.01µs        ? ?/sec    10.68    96.8±0.32µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/90pct_true                                      1.00      9.1±0.07µs        ? ?/sec    7.61     69.6±0.15µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/99pct_true                                      1.00      9.1±0.07µs        ? ?/sec    6.55     59.8±0.13µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/all_false                                       1.00      9.1±0.03µs        ? ?/sec    6.57     59.9±0.33µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/all_true                                        1.00      9.1±0.07µs        ? ?/sec    6.40     58.1±0.75µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/10pct_true                                1.00   1190.3±4.64ns        ? ?/sec    84.69   100.8±0.14µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/1pct_true                                 1.00   1350.6±5.18ns        ? ?/sec    64.97    87.7±0.16µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/50pct_nulls                               1.00  1425.8±13.40ns        ? ?/sec    86.43   123.2±0.21µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/50pct_true                                1.00  1269.4±13.18ns        ? ?/sec    119.66   151.9±0.30µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/90pct_true                                1.00   1339.9±7.01ns        ? ?/sec    104.26   139.7±0.16µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/99pct_true                                1.00   1277.2±3.57ns        ? ?/sec    104.62   133.6±0.26µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/all_false                                 1.00   1331.3±7.25ns        ? ?/sec    64.88    86.4±0.19µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/all_true                                  1.00   1243.0±2.26ns        ? ?/sec    106.56   132.5±0.35µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/10pct_true                       1.00   319.4±10.44µs        ? ?/sec    1.04   332.7±12.15µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/1pct_true                        1.00    288.4±4.29µs        ? ?/sec    1.00    288.1±5.35µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/50pct_nulls                      1.00   388.7±14.23µs        ? ?/sec    1.00    386.9±4.64µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/50pct_true                       1.00    426.8±9.92µs        ? ?/sec    1.00   426.6±11.04µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/90pct_true                       1.00    327.8±8.85µs        ? ?/sec    1.01    331.4±5.05µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/99pct_true                       1.04    279.8±6.41µs        ? ?/sec    1.00    269.2±4.80µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/all_false                        1.05    118.5±4.92µs        ? ?/sec    1.00    112.7±5.53µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/all_true                         1.00    117.6±2.97µs        ? ?/sec    1.00    117.8±4.67µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/10pct_true             1.01     32.8±0.24ns        ? ?/sec    1.00     32.6±0.04ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/1pct_true              1.01     32.8±0.04ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/50pct_nulls            1.01     32.8±0.05ns        ? ?/sec    1.00     32.5±0.34ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/50pct_true             1.01     32.8±0.27ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/90pct_true             1.01     32.8±0.28ns        ? ?/sec    1.00     32.5±0.05ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/99pct_true             1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/all_false              1.00     32.8±0.05ns        ? ?/sec    1.00     32.7±0.38ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/all_true               1.01     32.8±0.15ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/10pct_true             1.00     32.8±0.04ns        ? ?/sec    1.00     32.7±0.31ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/1pct_true              1.01     32.8±0.06ns        ? ?/sec    1.00     32.6±0.15ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/50pct_nulls            1.01     32.8±0.05ns        ? ?/sec    1.00     32.5±0.19ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/50pct_true             1.01     32.8±0.05ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/90pct_true             1.01     32.8±0.09ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/99pct_true             1.01     32.8±0.07ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/all_false              1.00     32.8±0.05ns        ? ?/sec    1.00     32.6±0.19ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/all_true               1.01     32.8±0.07ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/10pct_true       1.00     20.7±0.54µs        ? ?/sec    9.96    206.0±1.98µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/1pct_true        1.00     10.9±0.11µs        ? ?/sec    17.43   189.2±0.43µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/50pct_nulls      1.00     38.8±1.07µs        ? ?/sec    5.98    231.9±0.91µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/50pct_true       1.00     66.4±1.40µs        ? ?/sec    4.58    303.7±2.12µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/90pct_true       1.00     83.5±1.32µs        ? ?/sec    4.64    387.8±4.60µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/99pct_true       1.00     88.4±1.11µs        ? ?/sec    4.32    382.2±4.20µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/all_false        1.00  968.9±108.55ns        ? ?/sec    194.29   188.3±0.41µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/all_true         1.00     89.4±1.74µs        ? ?/sec    4.27    381.5±5.05µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/10pct_true                    1.00     75.7±1.62µs        ? ?/sec    3.33    252.0±1.89µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/1pct_true                     1.00     57.5±0.55µs        ? ?/sec    4.31    247.9±2.23µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/50pct_nulls                   1.00     91.1±0.71µs        ? ?/sec    3.09    281.7±1.28µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/50pct_true                    1.00    103.3±1.11µs        ? ?/sec    3.91    404.1±2.26µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/90pct_true                    1.00     99.4±1.60µs        ? ?/sec    3.61    358.6±4.68µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/99pct_true                    1.00     87.7±1.20µs        ? ?/sec    4.03    353.3±2.34µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/all_false                     1.00     45.1±0.69µs        ? ?/sec    5.46    246.1±2.47µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/all_true                      1.00     79.1±0.58µs        ? ?/sec    4.47    353.2±2.26µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/10pct_true              1.00     83.8±1.22µs        ? ?/sec    4.60    385.3±4.00µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/1pct_true               1.00     88.2±0.74µs        ? ?/sec    4.35    383.1±4.88µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/50pct_nulls             1.00     74.8±1.70µs        ? ?/sec    5.21    389.8±2.27µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/50pct_true              1.00     66.2±1.09µs        ? ?/sec    4.61    305.4±2.39µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/90pct_true              1.00     21.1±0.52µs        ? ?/sec    9.81    207.1±2.43µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/99pct_true              1.00     11.4±0.04µs        ? ?/sec    16.75   190.5±0.31µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/all_false               1.00     89.1±0.83µs        ? ?/sec    4.26    379.9±3.87µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/all_true                1.00  1143.5±51.10ns        ? ?/sec    163.04   186.4±0.89µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/10pct_true                     1.06   333.4±10.41µs        ? ?/sec    1.00    314.0±4.35µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/1pct_true                      1.04    301.5±6.53µs        ? ?/sec    1.00    290.5±5.29µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/50pct_nulls                    1.02    390.4±7.82µs        ? ?/sec    1.00    383.4±8.58µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/50pct_true                     1.03   433.6±15.06µs        ? ?/sec    1.00    421.5±4.80µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/90pct_true                     1.06    339.4±9.25µs        ? ?/sec    1.00    321.4±6.00µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/99pct_true                     1.03    265.6±6.99µs        ? ?/sec    1.00    259.1±2.96µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/all_false                      1.07    117.0±4.54µs        ? ?/sec    1.00    109.6±2.65µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/all_true                       1.00    116.0±4.46µs        ? ?/sec    1.01    117.0±5.47µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/10pct_true           1.00     32.7±0.05ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/1pct_true            1.00     32.8±0.06ns        ? ?/sec    1.00     32.7±0.27ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/50pct_nulls          1.01     32.8±0.13ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/50pct_true           1.00     32.7±0.06ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/90pct_true           1.00     32.8±0.05ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/99pct_true           1.00     32.7±0.04ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/all_false            1.01     32.7±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/all_true             1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/10pct_true           1.01     32.8±0.14ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/1pct_true            1.01     32.8±0.16ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/50pct_nulls          1.01     32.8±0.14ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/50pct_true           1.01     32.8±0.16ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/90pct_true           1.01     32.8±0.14ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/99pct_true           1.01     32.8±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/all_false            1.01     32.8±0.13ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/all_true             1.01     32.8±0.08ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/10pct_true     1.00     20.3±0.68µs        ? ?/sec    10.14   205.9±0.76µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/1pct_true      1.00     10.8±0.04µs        ? ?/sec    17.48   189.2±0.42µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/50pct_nulls    1.00     38.4±1.00µs        ? ?/sec    6.06    232.7±0.70µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/50pct_true     1.00     66.1±1.11µs        ? ?/sec    4.56    301.4±1.40µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/90pct_true     1.00     79.3±0.73µs        ? ?/sec    4.78    378.7±1.96µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/99pct_true     1.00     85.1±0.52µs        ? ?/sec    4.33    368.0±2.16µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/all_false      1.00   964.8±82.65ns        ? ?/sec    194.95   188.1±0.37µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/all_true       1.00     87.3±0.78µs        ? ?/sec    4.22    368.4±1.56µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/10pct_true                  1.00    126.7±3.44µs        ? ?/sec    3.22    408.3±3.80µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/1pct_true                   1.00    145.2±2.32µs        ? ?/sec    2.80    406.0±3.63µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/50pct_nulls                 1.00    126.7±2.43µs        ? ?/sec    3.24    411.1±3.04µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/50pct_true                  1.00    125.8±2.96µs        ? ?/sec    3.34    420.3±3.56µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/90pct_true                  1.00    104.6±2.45µs        ? ?/sec    3.32    347.5±1.90µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/99pct_true                  1.00     89.3±1.55µs        ? ?/sec    3.78    337.8±2.64µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/all_false                   1.00    139.0±1.49µs        ? ?/sec    2.91    403.8±3.31µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/all_true                    1.00     77.5±1.36µs        ? ?/sec    4.32    335.3±2.33µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/10pct_true            1.00     81.0±1.02µs        ? ?/sec    4.63    374.9±1.73µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/1pct_true             1.00     86.4±0.63µs        ? ?/sec    4.31    372.1±5.53µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/50pct_nulls           1.00     73.8±1.47µs        ? ?/sec    5.22    385.7±1.84µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/50pct_true            1.00     66.0±1.08µs        ? ?/sec    4.59    302.7±1.36µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/90pct_true            1.00     20.9±0.59µs        ? ?/sec    9.90    206.6±0.61µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/99pct_true            1.00     11.3±0.05µs        ? ?/sec    16.79   190.0±0.33µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/all_false             1.00     87.2±0.87µs        ? ?/sec    4.22    368.3±2.04µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/all_true              1.00  1179.7±53.33ns        ? ?/sec    157.91   186.3±0.22µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/10pct_true                         1.00     63.1±0.17µs        ? ?/sec    1.00     63.4±0.32µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/1pct_true                          1.00     22.4±0.11µs        ? ?/sec    1.00     22.4±0.15µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/50pct_nulls                        1.00    120.9±0.27µs        ? ?/sec    1.00    121.0±0.28µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/50pct_true                         1.00    158.6±0.51µs        ? ?/sec    1.00    159.0±0.34µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/90pct_true                         1.00     64.7±0.47µs        ? ?/sec    1.01     65.4±0.31µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/99pct_true                         1.00     23.2±0.20µs        ? ?/sec    1.00     23.3±0.16µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/all_false                          1.00     17.9±0.16µs        ? ?/sec    1.00     17.9±0.13µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/all_true                           1.00     17.7±0.19µs        ? ?/sec    1.01     17.9±0.16µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/10pct_true               1.01     32.8±0.18ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/1pct_true                1.01     32.8±0.08ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/50pct_nulls              1.01     32.8±0.06ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/50pct_true               1.01     32.8±0.05ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/90pct_true               1.00     32.7±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/99pct_true               1.00     32.8±0.05ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/all_false                1.01     32.8±0.08ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/all_true                 1.01     32.7±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/10pct_true               1.00     32.8±0.07ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/1pct_true                1.00     32.7±0.05ns        ? ?/sec    1.00     32.7±0.19ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/50pct_nulls              1.00     32.8±0.07ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/50pct_true               1.04     34.0±0.05ns        ? ?/sec    1.00     32.6±0.11ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/90pct_true               1.01     32.8±0.18ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/99pct_true               1.01     32.8±0.07ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/all_false                1.01     32.8±0.26ns        ? ?/sec    1.00     32.6±0.18ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/all_true                 1.00     32.8±0.04ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/10pct_true         1.00     15.7±0.05µs        ? ?/sec    12.42   194.7±0.46µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/1pct_true          1.00     10.4±0.02µs        ? ?/sec    18.16   188.5±0.33µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/50pct_nulls        1.00     26.1±0.30µs        ? ?/sec    7.73    201.9±0.58µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/50pct_true         1.00     38.5±0.26µs        ? ?/sec    5.47    210.6±0.50µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/90pct_true         1.00     18.6±0.09µs        ? ?/sec    9.06    168.7±0.26µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/99pct_true         1.00     13.3±0.09µs        ? ?/sec    11.65   155.1±0.38µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/all_false          1.00   912.4±65.08ns        ? ?/sec    205.07   187.1±0.35µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/all_true           1.00     12.7±0.10µs        ? ?/sec    11.98   152.3±0.30µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/10pct_true                      1.00     34.4±0.06µs        ? ?/sec    3.81    131.0±0.28µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/1pct_true                       1.00     15.1±0.03µs        ? ?/sec    8.16    123.1±0.33µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/50pct_nulls                     1.00     57.9±0.24µs        ? ?/sec    2.48    143.5±0.48µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/50pct_true                      1.00     68.9±0.16µs        ? ?/sec    2.36    163.0±0.31µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/90pct_true                      1.00     33.2±0.05µs        ? ?/sec    4.00    132.8±0.23µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/99pct_true                      1.00     16.0±0.03µs        ? ?/sec    7.64    122.3±0.34µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/all_false                       1.00      2.6±0.03µs        ? ?/sec    47.25   122.4±0.40µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/all_true                        1.00      3.1±0.08µs        ? ?/sec    38.23   119.7±0.31µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/10pct_true                1.00     18.3±0.09µs        ? ?/sec    9.11    166.9±0.32µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/1pct_true                 1.00     13.3±0.10µs        ? ?/sec    11.61   154.7±0.35µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/50pct_nulls               1.00     27.5±0.35µs        ? ?/sec    6.78    186.2±2.63µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/50pct_true                1.00     38.6±0.27µs        ? ?/sec    5.45    210.3±0.39µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/90pct_true                1.00     16.2±0.03µs        ? ?/sec    12.00   194.9±0.41µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/99pct_true                1.00     10.8±0.04µs        ? ?/sec    17.43   187.9±0.51µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/all_false                 1.00     12.8±0.11µs        ? ?/sec    11.90   152.9±0.42µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/all_true                  1.00  1249.4±82.65ns        ? ?/sec    149.12   186.3±0.49µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/10pct_true                       1.00     62.4±0.34µs        ? ?/sec    1.01     62.8±0.19µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/1pct_true                        1.01     22.4±0.14µs        ? ?/sec    1.00     22.2±0.18µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/50pct_nulls                      1.00    120.9±0.34µs        ? ?/sec    1.00    121.2±0.36µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/50pct_true                       1.00    158.5±0.60µs        ? ?/sec    1.00    159.2±0.34µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/90pct_true                       1.00     64.4±0.47µs        ? ?/sec    1.01     64.7±0.38µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/99pct_true                       1.00     23.1±0.17µs        ? ?/sec    1.00     23.1±0.12µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/all_false                        1.00     17.7±0.13µs        ? ?/sec    1.01     17.9±0.20µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/all_true                         1.00     17.8±0.18µs        ? ?/sec    1.00     17.8±0.14µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/10pct_true             1.04     34.1±0.14ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/1pct_true              1.01     32.8±0.10ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/50pct_nulls            1.04     34.0±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/50pct_true             1.00     32.7±0.07ns        ? ?/sec    1.00     32.8±0.56ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/90pct_true             1.01     32.8±0.04ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/99pct_true             1.04     34.0±0.04ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/all_false              1.04     34.1±0.07ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/all_true               1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/10pct_true             1.00     32.8±0.07ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/1pct_true              1.04     34.0±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/50pct_nulls            1.04     34.0±0.12ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/50pct_true             1.04     34.1±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/90pct_true             1.01     32.8±0.10ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/99pct_true             1.04     34.1±0.20ns        ? ?/sec    1.00     32.7±0.11ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/all_false              1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/all_true               1.04     34.0±0.05ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/10pct_true       1.00     15.7±0.03µs        ? ?/sec    12.39   194.7±1.10µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/1pct_true        1.00     10.4±0.02µs        ? ?/sec    18.07   188.1±1.14µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/50pct_nulls      1.00     26.1±0.28µs        ? ?/sec    7.73    201.6±1.15µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/50pct_true       1.00     38.4±0.24µs        ? ?/sec    5.48    210.5±1.38µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/90pct_true       1.00     18.7±0.09µs        ? ?/sec    9.01    168.5±1.24µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/99pct_true       1.00     13.3±0.09µs        ? ?/sec    11.59   154.7±0.29µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/all_false        1.00   902.9±71.95ns        ? ?/sec    208.31   188.1±1.01µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/all_true         1.00     12.7±0.08µs        ? ?/sec    11.99   151.8±0.44µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/10pct_true                    1.00     35.2±0.08µs        ? ?/sec    3.65    128.6±0.48µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/1pct_true                     1.00     15.3±0.02µs        ? ?/sec    7.85    120.5±0.53µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/50pct_nulls                   1.00     57.9±0.13µs        ? ?/sec    2.44    141.5±0.73µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/50pct_true                    1.00     68.5±0.11µs        ? ?/sec    2.32    159.3±0.24µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/90pct_true                    1.00     33.4±0.07µs        ? ?/sec    3.92    130.7±0.36µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/99pct_true                    1.00     16.1±0.05µs        ? ?/sec    7.51    121.0±0.30µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/all_false                     1.00      2.8±0.04µs        ? ?/sec    41.47   117.7±0.28µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/all_true                      1.00      3.1±0.04µs        ? ?/sec    38.75   118.6±0.35µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/10pct_true              1.00     18.6±0.19µs        ? ?/sec    8.93    166.2±0.42µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/1pct_true               1.00     13.3±0.08µs        ? ?/sec    11.58   154.1±0.26µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/50pct_nulls             1.00     27.3±0.37µs        ? ?/sec    6.77    185.2±1.16µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/50pct_true              1.00     38.6±0.24µs        ? ?/sec    5.44    210.0±0.44µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/90pct_true              1.00     16.2±0.03µs        ? ?/sec    12.06   195.2±0.44µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/99pct_true              1.00     10.7±0.04µs        ? ?/sec    17.62   188.7±0.57µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/all_false               1.00     12.8±0.10µs        ? ?/sec    11.86   151.8±0.92µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/all_true                1.00  1026.7±65.52ns        ? ?/sec    181.01   185.9±0.50µs        ? ?/sec

rluvaton · 2025-10-20T20:06:35Z

Good, it looks like we have massive speedups

# Which issue does this PR close?

N/A

# Rationale for this change

Doing `OffsetBuffer::from_lengths(std::iter::repeat_n(size, value.len()));` does not utilize SIMD (I can explain further if you want).

See [GodBolt Link](https://godbolt.org/z/PTsfvfjqx)

Extracted from:
- #8653

After this and the PR below are merged, I will improve the DataFusion scalar-to-array conversion to use this and make it really fast:
- #8658

# What changes are included in this PR?

Added a new function.

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes
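To illustrate the rationale, here is a minimal sketch of building offsets for `n` values that all share one length. It uses a plain `Vec<i32>` rather than the arrow-rs types, and `repeated_length_offsets` is a hypothetical name, not the function added in the PR. The assumption is that a running sum with a per-element check is hard to vectorize, whereas computing each offset as `i * len` after a single up-front overflow check leaves a loop with no carried dependency.

```rust
/// Hypothetical helper (not the arrow-rs API): offsets for `n` values
/// that all have the same byte length `len`.
fn repeated_length_offsets(len: usize, n: usize) -> Vec<i32> {
    let n = i32::try_from(n).expect("too many values for i32 offsets");
    let len = i32::try_from(len).expect("length too large for i32 offsets");
    // Validate the largest offset once, so the loop body needs no checks.
    len.checked_mul(n).expect("offset overflow");
    // offsets[i] = i * len; each element is independent of the previous one.
    (0..=n).map(|i| i * len).collect()
}
```

For example, `repeated_length_offsets(3, 4)` yields `[0, 3, 6, 9, 12]`. Whether the compiler actually emits SIMD for this loop depends on the target and optimization level.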

alamb · 2025-10-20T20:52:58Z

> Good, it looks like we have massive speedups

yes, nice work!

# Which issue does this PR close?

N/A

# Rationale for this change

I want to repeat the same value multiple times in a very fast way, which will be used in:
- #8653

After this and the PR below are merged, I will improve the DataFusion scalar-to-array conversion to use this and make it really fast:
- #8656

# What changes are included in this PR?

Created a function in `MutableBuffer` that repeats a slice a number of times in a logarithmic way to reduce the number of memcpy calls.

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, and added docs

-------

Extracted from:
- #8653

Benchmark results on local machine

| Slice Length | Repetitions (n) | repeat_slice_n_times | extend_from_slice loop | Speedup |
|--------------|-----------------|----------------------|------------------------|---------|
| 3 | 3 | 47.092 ns | 41.910 ns | 0.89x |
| 3 | 64 | 63.548 ns | 222.29 ns | 3.50x |
| 3 | 1024 | 105.57 ns | 3.031 µs | 28.7x |
| 3 | 8192 | 405.71 ns | 24.170 µs | 59.6x |
| 20 | 3 | 48.437 ns | 46.437 ns | 0.96x |
| 20 | 64 | 74.993 ns | 319.04 ns | 4.25x |
| 20 | 1024 | 350.94 ns | 4.437 µs | 12.6x |
| 20 | 8192 | 2.440 µs | 35.524 µs | 14.6x |
| 100 | 3 | 50.369 ns | 47.568 ns | 0.94x |
| 100 | 64 | 119.70 ns | 165.37 ns | 1.38x |
| 100 | 1024 | 1.734 µs | 2.623 µs | 1.51x |
| 100 | 8192 | 10.615 µs | 19.750 µs | 1.86x |

These are the full results:

<details>
<summary>Result</summary>

```
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=3
                        time:   [46.719 ns 47.092 ns 47.453 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=3
                        time:   [41.833 ns 41.910 ns 41.996 ns]
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=64
                        time:   [62.935 ns 63.548 ns 64.183 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=64
                        time:   [221.75 ns 222.29 ns 222.86 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=1024
                        time:   [105.15 ns 105.57 ns 106.01 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=1024
                        time:   [3.0240 µs 3.0308 µs 3.0395 µs]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=8192
                        time:   [401.57 ns 405.71 ns 409.94 ns]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=8192
                        time:   [24.124 µs 24.170 µs 24.222 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=3
                        time:   [48.287 ns 48.437 ns 48.606 ns]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=3
                        time:   [46.289 ns 46.437 ns 46.611 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=64
                        time:   [74.625 ns 74.993 ns 75.395 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=64
                        time:   [318.20 ns 319.04 ns 319.98 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=1024
                        time:   [346.66 ns 350.94 ns 355.17 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=1024
                        time:   [4.4251 µs 4.4369 µs 4.4506 µs]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=8192
                        time:   [2.4336 µs 2.4401 µs 2.4465 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=8192
                        time:   [35.466 µs 35.524 µs 35.589 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=3
                        time:   [50.209 ns 50.369 ns 50.530 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=3
                        time:   [47.439 ns 47.568 ns 47.701 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=64
                        time:   [117.77 ns 119.70 ns 122.00 ns]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=64
                        time:   [164.88 ns 165.37 ns 166.07 ns]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=1024
                        time:   [1.7278 µs 1.7335 µs 1.7398 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=1024
                        time:   [2.6176 µs 2.6232 µs 2.6305 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=8192
                        time:   [10.583 µs 10.615 µs 10.649 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=8192
                        time:   [19.471 µs 19.750 µs 20.185 µs]
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
```

</details>
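The "logarithmic" repetition described above can be sketched roughly as follows. This is an illustration on a plain `Vec<u8>`, not the `MutableBuffer` code from the PR, and `repeat_slice_doubling` is a hypothetical name. The idea, as I understand it, is to copy the slice once, then keep doubling the already-written region, so the number of copy calls grows with `log2(n)` instead of `n`.

```rust
/// Hypothetical sketch: repeat `slice` `n` times by doubling the
/// already-written prefix instead of appending the slice `n` times.
fn repeat_slice_doubling(slice: &[u8], n: usize) -> Vec<u8> {
    let total = slice.len().checked_mul(n).expect("length overflow");
    let mut out = Vec::with_capacity(total);
    if total == 0 {
        return out;
    }
    // First copy: one instance of the slice.
    out.extend_from_slice(slice);
    // Each iteration doubles the filled region with a single copy.
    while out.len() * 2 <= total {
        out.extend_from_within(..);
    }
    // The tail is shorter than the filled region, so one copy of a prefix
    // finishes the job. The prefix is valid because the content is periodic
    // and `remaining` is a multiple of `slice.len()`.
    let remaining = total - out.len();
    out.extend_from_within(..remaining);
    out
}
```

For `n = 8192` this issues about 14 copies rather than 8192, which is consistent with the large speedups for short slices in the table above. For `n = 3` the setup cost dominates, matching the slightly slower small-`n` rows.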

# Conflicts: # arrow-buffer/src/buffer/mutable.rs

perf: add optimized zip implementation for scalars

f665255

This is useful for `IF <expr> THEN <scalar> ELSE <scalar> END`

TODO:
- [ ] Need to add comments if missing
- [ ] Add benchmark

github-actions bot added the `arrow` label (Changes to the arrow crate) Oct 19, 2025

rluvaton mentioned this pull request Oct 19, 2025

bench: create zip kernel benchmarks #8654

Merged

rluvaton added 4 commits October 19, 2025 23:46

improve long string performance by a lot (compared to my prev impl)

a0bbe7f

format

f733a25

Merge branch 'main' into improve-zip-for-scalars

7dd2188

fix lint and format

98aa6bf

update comments

7ccb30d

rluvaton added 2 commits October 20, 2025 00:41

perf: add optimized function to create offset with same length

5ac9879

add tests

e172f2e

rluvaton mentioned this pull request Oct 19, 2025

perf: add optimized function to create offset with same length #8656

Merged

rluvaton added 2 commits October 20, 2025 01:00

updated comment

3317a39

perf: add repeat_slice_n_times to MutableBuffer

5f01d05

this will be used in: - apache#8653

rluvaton mentioned this pull request Oct 20, 2025

perf: add repeat_slice_n_times to MutableBuffer #8658

Merged

rluvaton added 5 commits October 20, 2025 14:02

Merge branch 'add-push-repeated-slice' into improve-zip-for-scalars

ad3d716

# Conflicts: # arrow-buffer/src/buffer/mutable.rs

updated with apache#8658 changes

694eb72

Merge branch 'add-from-length-repeated-for-offset-buffer' into improve-zip-for-scalars

4086174

updated with apache#8656 changes

11f72a0

format

0787956

rluvaton added 6 commits October 20, 2025 14:46

add tests

96cd9cc

simplify implementation

72070dc

Merge branch 'add-push-repeated-slice' into improve-zip-for-scalars

9ffea3a

add example and test for scalar zipper

37b2cda

add send and sync

dd45394

Merge branch 'main' into improve-zip-for-scalars

dabbf55

rluvaton mentioned this pull request Oct 20, 2025

[EPIC] A collection of items to improve CASE performance apache/datafusion#18075

Open

9 tasks

Merge branch 'main' into improve-zip-for-scalars

9165612

# Conflicts: # arrow-buffer/src/buffer/mutable.rs

rluvaton marked this pull request as ready for review October 21, 2025 18:25
