Optimize merging of partial case expression results #18152
Conversation
macOS test failure is unrelated afaict. Looks like a DNS issue on the test runner.
@alamb could you run the benchmarks against this?
The test failure for f49d3ea seems unrelated. Pulling in changes from
Reviewed a little bit, will try to review more later.
// Merge into a single array.
let data_refs = self.partial_results.iter().collect();
let mut mutable = MutableArrayData::new(
Using MutableArrayData is not very performant for the case when there can be a small range of values from the same array.
Isn't it basically an interleave?
No, what I'm doing here is less general than interleave
. I tried that at first, but it gave pretty bad performance in certain cases. This is more like a multi array zip
. In contrast to zip
the rows are not expected to be lined up here. Instead values are taken from the start of each array. I'll try to make an ascii art drawing of what's going on.
Regarding the usage of MutableArrayData
, I took inspiration from the zip
implementation. I tried to avoid reaching this point in the trivial cases to avoid overhead where possible. This code path is not taken for the simple evaluation methods either.
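The "multi array zip" described above can be sketched with plain Rust Vecs standing in for Arrow arrays. This is illustrative only, not the PR's actual code: unlike zip or interleave, no per-row offset is needed, because each branch's values are already in row order, so the merge just consumes from the front of each partial result.

```rust
// Hedged sketch: merge partial branch results by consuming values from the
// front of each partial array, in the order a per-row source index dictates.
// Plain Vecs stand in for Arrow arrays; names here are hypothetical.
fn merge_partials(
    sources: &[usize],
    mut partials: Vec<std::vec::IntoIter<i32>>,
) -> Vec<i32> {
    // `sources[row]` says which branch produced the value for `row`.
    sources
        .iter()
        .map(|&s| partials[s].next().expect("partial array exhausted"))
        .collect()
}

fn main() {
    // Rows 0..5; branch 0 matched rows 0, 2, 4 and branch 1 matched rows 1, 3.
    let sources = vec![0, 1, 0, 1, 0];
    let partials = vec![vec![10, 12, 14].into_iter(), vec![21, 23].into_iter()];
    let merged = merge_partials(&sources, partials);
    assert_eq!(merged, vec![10, 21, 12, 23, 14]);
    println!("{merged:?}");
}
```

The real implementation works on ArrayData via MutableArrayData and copies runs rather than single values, but the consumption order is the same idea.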
It might be useful to explicitly point out that the big change in this PR is that the per-branch results are no longer scattered back to the length of the input record batch. Instead the potentially small results are held as-is in small arrays. Only at the end does everything get consolidated.
What's not yet handled in an optimal fashion by this code is the variant you're working on, where you know up front that all the values are going to be scalars. That's intentional, I'm only trying to improve the general case first. In other words, please compare with the status quo on main and not with all the potential further optimisations that might be possible.
One further optimisation that would be nice to add would be based on apache/arrow-rs#8658. That would allow us to avoid expanding scalars to arrays and instead fold that into the merge operation. Not available for use just yet though, so that will have to wait for later.
apache/arrow-rs#8653 will be useful for ExprOrExpr
, but is going to be of more limited use in the general eval methods. You can only zip two scalars and at that point you have a scalar and a non scalar. When we have to reduce more than two arrays it's back to regular zip (which is kind of what I'm doing here, but without the alignment requirement and in a single pass for all arrays).
Ok(())
}

fn add_partial_result(&mut self, rows: &ArrayRef, data: ArrayData) {
Can you add comments explaining what rows and data are for?
And rename if needed.
Isn't that just going to be repeating the comments from add_branch_result? I don't think it's very useful to repeat the same thing over and over again. It's not like this is public API.
// An optional result that is the covering result for all rows.
// This is used as an optimisation to avoid the cost of merging when all rows
// evaluate to the same case branch.
covering_result: Option<ColumnarValue>,
What does covering mean?
I couldn't come up with a better word for this. It's in contrast with 'partial', and as in 'covering index'.
This gets filled in when one branch of the case expression matches all the input rows from the record batch. In that case there's no need to do the complex merge logic.
Open to suggestions for a better name.
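The fast path being described can be sketched as follows. This is a toy illustration with hypothetical names (the real field holds an Option<ColumnarValue> on the evaluator): when one branch covered every input row, its result is returned directly and the merge is skipped.

```rust
// Hedged sketch of the "covering result" fast path; not the PR's API.
#[derive(Debug, PartialEq)]
enum BranchResult {
    Covering(Vec<i32>),              // one branch matched all input rows
    Partial(Vec<(usize, Vec<i32>)>), // (branch index, values) still to merge
}

fn finish(result: BranchResult) -> Vec<i32> {
    match result {
        // Fast path: no merge work at all.
        BranchResult::Covering(values) => values,
        // Stand-in for the slow path: the real code interleaves the partial
        // results according to a per-row source index.
        BranchResult::Partial(parts) => {
            parts.into_iter().flat_map(|(_, values)| values).collect()
        }
    }
}

fn main() {
    let all = BranchResult::Covering(vec![1, 2, 3]);
    assert_eq!(finish(all), vec![1, 2, 3]);
    let parts = BranchResult::Partial(vec![(0, vec![1]), (1, vec![2])]);
    assert_eq!(finish(parts), vec![1, 2]);
}
```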
.iter()
.map(|a| filter_array(a, filter))
.collect::<std::result::Result<Vec<_>, _>>()?;
unsafe {
Is this a hot loop? If not, why is unsafe needed? The check is inexpensive.
Generally when writing unsafe code you need to write a comment explaining why it is safe to use and what the benefit is.
It depends on the case expression, but it can be. We need to create subset record batches for each when branch and possibly the else branch. The validation checks that try_new would do are completely redundant here.
I've added some comments and a pointer to apache/arrow-rs#8693.
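The checked/unchecked constructor pattern under discussion can be illustrated with a toy Batch type (not Arrow's RecordBatch; all names here are hypothetical), including the safety comment the reviewer asks for:

```rust
// Illustrative only: a checked constructor next to an unchecked one that
// skips redundant validation, in the style discussed above.
struct Batch {
    columns: Vec<Vec<i32>>,
}

impl Batch {
    /// Checked constructor: validates that all columns have equal length.
    fn try_new(columns: Vec<Vec<i32>>) -> Result<Self, String> {
        let len = columns.first().map_or(0, |c| c.len());
        if columns.iter().any(|c| c.len() != len) {
            return Err("column length mismatch".to_string());
        }
        Ok(Self { columns })
    }

    /// Unchecked constructor.
    ///
    /// # Safety
    /// The caller must guarantee all columns have equal length. Useful on hot
    /// paths where the columns were just produced by the same filter, so the
    /// invariant already holds and re-validation is redundant.
    unsafe fn new_unchecked(columns: Vec<Vec<i32>>) -> Self {
        Self { columns }
    }
}

fn main() {
    let cols = vec![vec![1, 2], vec![3, 4]];
    // SAFETY: both columns were built with the same length above.
    let b = unsafe { Batch::new_unchecked(cols) };
    assert_eq!(b.columns.len(), 2);
    assert!(Batch::try_new(vec![vec![1], vec![2, 3]]).is_err());
}
```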
else_expr: Option<Arc<dyn PhysicalExpr>>,
/// Evaluation method to use
eval_method: EvalMethod,
pub eval_method: EvalMethod,
Why is the pub needed here?
Also, this is an implementation detail that I don't think needs to be exposed.
That's a debugging remnant. I'll get rid of that.
#[derive(Debug, Hash, PartialEq, Eq)]
enum EvalMethod {
pub enum EvalMethod {
Why pub here? This is an implementation detail that I don't think we should expose.
Removed. Was committed by accident. I had made this public temporarily so I could print out the eval method in the benchmarks.
Which issue does this PR close?
Rationale for this change
Case evaluation currently uses PhysicalExpr::evaluate_selection for each branch of the case expression. This implementation is fine, but because evaluate_selection is not specific to the case logic we're missing some optimisation opportunities. The main consequence is that too much work is being done filtering record batches and scattering results. This PR introduces specialised filtering logic and result interleaving for case.

A more detailed description and diagrams are available at #18075 (comment)
What changes are included in this PR?
Rewrite the case_when_no_expr and case_when_with_expr evaluation loops to avoid as much unnecessary work as possible. In particular the remaining rows to be evaluated are retained across loop iterations. This allows the record batch that needs to be filtered to shrink as the loop is evaluated, which reduces the number of rows that need to be refiltered. If a when predicate does not match any rows at all, filtering is avoided entirely.

The final result is also not merged on every loop iteration. Instead an index vector is constructed which is used to compose the final result once using a custom 'multi zip'/'interleave' like operation.
Are these changes tested?
Covered by existing unit tests and SLTs
Are there any user-facing changes?
No