Refactor `nvl2` Function to Support Lazy Evaluation and Simplification via CASE Expression #18191

kosiew · 2025-10-21T02:51:23Z

Which issue does this PR close?

Closes #17983 (Implement lazy evaluation for nvl2).

Rationale for this change

The current implementation of the nvl2 function in DataFusion eagerly evaluates all of its arguments. This causes unnecessary computation, and it also causes incorrect behavior when a branch that should only be evaluated conditionally is computed anyway, for example when the unused branch raises an error. This PR introduces lazy evaluation for nvl2, aligning its behavior with other conditional expressions such as coalesce and improving both performance and correctness.

This change also introduces a simplification rule that rewrites nvl2 expressions into equivalent CASE expressions, allowing for better optimization during query planning and execution.
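The rewrite can be illustrated with a toy expression tree. This is a minimal sketch under stated assumptions: `Expr`, its variants, and `simplify` are hypothetical stand-ins, not DataFusion's `Expr` type or the actual `simplify()` signature. It only shows the shape of the transformation nvl2(test, a, b) → CASE WHEN test IS NOT NULL THEN a ELSE b END.

```rust
/// Hypothetical, minimal expression tree used only for illustration;
/// this is NOT DataFusion's `Expr` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    IsNotNull(Box<Expr>),
    /// nvl2(test, if_non_null, if_null)
    Nvl2(Box<Expr>, Box<Expr>, Box<Expr>),
    /// Single-branch searched CASE: CASE WHEN when THEN then ELSE otherwise END
    Case { when: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
}

/// Rewrites every nvl2 call into the equivalent CASE expression,
/// recursing into children so nested nvl2 calls are rewritten too.
pub fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Nvl2(test, if_non_null, if_null) => Expr::Case {
            when: Box::new(Expr::IsNotNull(Box::new(simplify(*test)))),
            then: Box::new(simplify(*if_non_null)),
            otherwise: Box::new(simplify(*if_null)),
        },
        Expr::IsNotNull(e) => Expr::IsNotNull(Box::new(simplify(*e))),
        Expr::Case { when, then, otherwise } => Expr::Case {
            when: Box::new(simplify(*when)),
            then: Box::new(simplify(*then)),
            otherwise: Box::new(simplify(*otherwise)),
        },
        // Leaves (columns, literals) are returned unchanged.
        other => other,
    }
}
```

Because the CASE form already has short-circuit semantics, rewriting to it lets the existing CASE evaluation machinery provide the laziness.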

What changes are included in this PR?

Refactored nvl2 implementation in datafusion/functions/src/core/nvl2.rs:
- Added support for short-circuit (lazy) evaluation using short_circuits().
- Implemented simplify() method to rewrite expressions into CASE form.
- Introduced return_field_from_args() for correct nullability and type inference.
- Replaced the previous eager nvl2_func() logic with an optimized, more declarative approach.
Added comprehensive unit tests:
- test_nvl2_short_circuit in dataframe_functions.rs verifies correct short-circuit behavior.
- test_create_physical_expr_nvl2 in expr_api/mod.rs validates physical expression creation and output correctness.
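The nullability reasoning behind return_field_from_args() can be sketched independently of the Arrow types. This is a hypothetical illustration: `FieldInfo` and `nvl2_nullable` are made-up names, and DataFusion's actual rule may differ (for example, it may conservatively mark the result nullable whenever either branch is nullable). The sketch derives a sound rule directly from nvl2's semantics.

```rust
/// Hypothetical stand-in for an Arrow field; only nullability matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FieldInfo {
    pub nullable: bool,
}

/// Sound nullability rule for nvl2(test, if_non_null, if_null):
/// - if_non_null is returned on rows where test is non-null;
/// - if_null is returned only on rows where test is null, which cannot
///   happen when test itself is declared non-nullable.
pub fn nvl2_nullable(test: FieldInfo, if_non_null: FieldInfo, if_null: FieldInfo) -> bool {
    if_non_null.nullable || (test.nullable && if_null.nullable)
}
```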

Are these changes tested?

✅ Yes, multiple new tests are included:

- test_nvl2_short_circuit ensures nvl2 does not evaluate unnecessary branches.
- test_create_physical_expr_nvl2 checks the correctness of evaluation and type coercion behavior.

All existing and new tests pass successfully.

Are there any user-facing changes?

Yes, but they are non-breaking and performance-enhancing:

- nvl2 now evaluates lazily, meaning only the required branch is computed based on the nullity of the test expression.
- Expression simplification will yield more optimized query plans.

There are no API-breaking changes. However, users may observe improved performance and reduced computation for expressions involving nvl2.

…ullability handling, and marked the evaluator as short-circuiting to avoid unreachable execution paths. Added a dataframe regression test that exercises nvl2 with a potentially failing branch to confirm lazy evaluation behaviour.

Handle NVL2 execution when the simplifier is skipped, using null masks to select between branch values. Add a regression test in expr_api to validate SessionContext::create_physical_expr with NVL2, ensuring successful evaluation without prior simplification.
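The null-mask fallback described in this commit can be sketched with plain vectors. This is a minimal sketch under assumptions: `Vec<Option<i64>>` stands in for Arrow arrays, and `nvl2_eager` is a hypothetical helper, not the code from the PR. Note that both branch columns must already be fully computed, which is exactly why this path is eager.

```rust
/// Eager nvl2 over already-evaluated columns (hypothetical helper).
/// `Option<i64>` models a nullable Arrow value; the test column's
/// null mask selects, row by row, which precomputed branch value to keep.
pub fn nvl2_eager(
    test: &[Option<i64>],
    if_non_null: &[Option<i64>],
    if_null: &[Option<i64>],
) -> Vec<Option<i64>> {
    assert_eq!(test.len(), if_non_null.len());
    assert_eq!(test.len(), if_null.len());
    test.iter()
        .zip(if_non_null.iter().zip(if_null.iter()))
        // Non-null test value -> take if_non_null, otherwise take if_null.
        .map(|(t, (a, b))| if t.is_some() { *a } else { *b })
        .collect()
}
```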

pepijnve · 2025-10-21T14:52:12Z

datafusion/functions/src/core/nvl2.rs

+
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
-        nvl2_func(&args.args)
+        let [test, if_non_null, if_null] = take_function_args(self.name(), args.args)?;


In #17357 the author chose to make invoke_with_args return an internal error instead of retaining the implementation. Would we want to do the same here?

I'm a bit on the fence myself. On the one hand, this is effectively dead code for most users. On the other hand, raising an error here may cause breakage for users who have customised their optimiser passes and are not doing simplification. No idea if anyone actually does that.

That's a good point 🤔

Personally I'm of the mind to remove this impl and have it return error; part of the benefit of this PR is reducing the amount of code we'd need to maintain.

> make invoke_with_args return an internal error instead of retaining the implementation.

The new fallback evaluator is exercised directly in test_create_physical_expr_nvl2, which builds physical expressions without running the simplifier. Returning an error here would regress those non-simplified code paths.

It isn’t just about satisfying a unit test; it's about preserving a supported API surface. SessionContext::create_physical_expr explicitly states that it performs coercion and rewrite passes but does not run the expression simplifier, so any expression handed directly to that API must still execute correctly without being rewritten to a CASE statement first.
The test_create_physical_expr_nvl2 fixture exercises exactly that public workflow by building a physical expression through SessionContext::create_physical_expr and evaluating it without simplification.
If we changed invoke_with_args to return an error, that flow would regress for library users in the same way it would fail for the test.

Rather than removing or rewriting the test, I think we should keep it to guard this behavior; it’s effectively documenting that nvl2 continues to work for consumers who rely on the non-simplifying physical-expr builder, which the function implementation currently supports.

I recommend keeping the implementation so that those tests, and any downstream consumers that bypass simplification, continue to work.

The change in coalesce (and now, indirectly, nvl/ifnull as well) already broke this, though. If unsimplified execution is desirable, perhaps nvl should be restored too, so that behaviour does not vary arbitrarily depending on which UDF is used. In other words, I think you have to be consistent about this: either all physical exprs should work, or you shouldn’t bother with this. Cherry-picking is a bit pointless in my opinion.

> any expression handed directly to that API must still execute correctly without being rewritten to a CASE statement first.

One subtlety here is that the semantics differ before and after simplification. nvl2(1, 1, 1 / 0) will fail before simplification but will work correctly once simplified, due to the switch from eager to lazy evaluation. I think I would prefer a clear failure over a subtle difference in behaviour.
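The semantic difference for nvl2(1, 1, 1 / 0) can be reproduced with scalar values. This is a hypothetical sketch, not DataFusion code: `divide`, `nvl2_eager_scalar`, and `nvl2_lazy_scalar` are illustrative names, and errors are modelled as `String`. The eager version receives already-evaluated arguments, so the error from the unused branch propagates; the lazy version receives closures and runs only the selected one, as a CASE expression would.

```rust
/// Checked integer division; reports division by zero (or overflow)
/// as an error, mimicking an engine that errors on `1 / 0`.
pub fn divide(a: i64, b: i64) -> Result<i64, String> {
    a.checked_div(b)
        .ok_or_else(|| "arithmetic error (division by zero or overflow)".to_string())
}

/// Eager: every argument was evaluated before nvl2 looks at `test`,
/// so an error in the unused branch still surfaces.
pub fn nvl2_eager_scalar(
    test: Option<i64>,
    if_non_null: Result<i64, String>,
    if_null: Result<i64, String>,
) -> Result<i64, String> {
    let (a, b) = (if_non_null?, if_null?);
    Ok(if test.is_some() { a } else { b })
}

/// Lazy: branches are closures and only the selected one runs,
/// matching CASE WHEN test IS NOT NULL THEN ... ELSE ... END.
pub fn nvl2_lazy_scalar(
    test: Option<i64>,
    if_non_null: impl FnOnce() -> Result<i64, String>,
    if_null: impl FnOnce() -> Result<i64, String>,
) -> Result<i64, String> {
    if test.is_some() { if_non_null() } else { if_null() }
}
```

With test = Some(1), the eager call fails because divide(1, 0) is evaluated up front, while the lazy call returns Ok(1).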

If we do want to keep the invoke_with_args implementations, one option could be to consider #17997 (or some variant of that idea) so that it can also be implemented lazily.

Regarding code maintenance/duplication, nvl2 is an instance of the ExpressionOrExpression evaluation method from CaseExpr. Perhaps a slightly modified version of CaseExpr::expr_or_expr could be made so that nvl and nvl2 could call it? What I'm trying to say is that code reuse via simplify may not be the best idea.
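The "expression or expression" idea can be sketched in columnar form: partition rows by the test column's null mask, evaluate each branch only over the rows that select it, and scatter the partial results back. This is an illustrative sketch, not CaseExpr::expr_or_expr itself; `nvl2_expr_or_expr` is a hypothetical name, columns are modelled as `Vec<Option<i64>>`, and a branch is assumed to be a function from selected row indices to one value per index.

```rust
/// Hypothetical lazy, columnar nvl2 in the spirit of CASE's
/// expression-or-expression evaluation.
pub fn nvl2_expr_or_expr<F, G>(
    test: &[Option<i64>],
    if_non_null: F,
    if_null: G,
) -> Result<Vec<Option<i64>>, String>
where
    F: FnOnce(&[usize]) -> Result<Vec<Option<i64>>, String>,
    G: FnOnce(&[usize]) -> Result<Vec<Option<i64>>, String>,
{
    // Partition row indices by the test column's null mask.
    let (non_null_rows, null_rows): (Vec<usize>, Vec<usize>) =
        (0..test.len()).partition(|&i| test[i].is_some());

    let mut out = vec![None; test.len()];
    // A branch that no row selects is never evaluated, so its errors cannot surface.
    if !non_null_rows.is_empty() {
        let vals = if_non_null(&non_null_rows)?;
        for (&row, v) in non_null_rows.iter().zip(vals) {
            out[row] = v;
        }
    }
    if !null_rows.is_empty() {
        let vals = if_null(&null_rows)?;
        for (&row, v) in null_rows.iter().zip(vals) {
            out[row] = v;
        }
    }
    Ok(out)
}
```

The same helper would serve nvl as well. nvl(a, b) is the case where the non-null branch is the test column itself.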

> In #17357 the author chose to make invoke_with_args return an internal error instead of retaining the implementation. Would we want to do the same here?

I amended invoke_with_args to return internal_err, both for consistency and to reduce the amount of code to maintain.
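The resulting behaviour can be modelled as two execution paths. This is a hypothetical sketch: `EvalError`, `Nvl2Plan`, `run_nvl2`, and the error message are illustrative, not DataFusion's types or its exact internal_err text. It shows that a simplified plan evaluates nvl2 lazily as CASE, while a plan that skips the simplifier reaches the function's own invoke path and fails clearly instead of silently using eager semantics.

```rust
/// Hypothetical error type; `Internal` stands in for DataFusion's internal error.
#[derive(Debug, PartialEq)]
pub enum EvalError {
    Internal(String),
    Arithmetic(String),
}

/// Which path an nvl2 call takes before execution (illustrative only).
pub enum Nvl2Plan {
    /// The simplifier ran, so nvl2 became a lazily evaluated CASE.
    SimplifiedToCase,
    /// The simplifier was skipped, so execution reaches the UDF's invoke path.
    DirectInvoke,
}

pub fn run_nvl2(
    plan: Nvl2Plan,
    test: Option<i64>,
    if_non_null: impl FnOnce() -> Result<i64, EvalError>,
    if_null: impl FnOnce() -> Result<i64, EvalError>,
) -> Result<i64, EvalError> {
    match plan {
        // CASE semantics: only the selected branch is evaluated.
        Nvl2Plan::SimplifiedToCase => {
            if test.is_some() { if_non_null() } else { if_null() }
        }
        // Fail clearly rather than evaluate with different (eager) semantics.
        Nvl2Plan::DirectInvoke => Err(EvalError::Internal(
            "nvl2 should have been simplified to CASE".to_string(),
        )),
    }
}
```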


Jefffrey · 2025-10-22T00:44:38Z

datafusion/functions/src/core/nvl2.rs


Co-authored-by: Jeffrey Vo <[email protected]>

…cution without being simplified to a CASE expression, removing the eager evaluation helpers that previously enforced eager semantics. Updated the expr_api integration test to assert that unsimplified nvl2 evaluation now fails with the expected internal error message.

alamb

Thanks @kosiew @pepijnve and @Jefffrey

alamb · 2025-10-23T20:50:17Z

This is one we probably should run the extended tests on too.
I started it locally with

INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests

And will report back

alamb · 2025-10-24T09:58:54Z

Extended tests passed for me. Thanks again @kosiew @pepijnve and @Jefffrey

kosiew added 5 commits October 21, 2025 10:48

Add test for NVL2 expression evaluation with scalar inputs

d85087f

cargo fmt

b4b3550

Improve formatting of NVL2 expression tests for better readability

10d78ac

github-actions bot added the core (Core DataFusion crate) and functions (Changes to functions implementation) labels Oct 21, 2025

kosiew marked this pull request as ready for review October 21, 2025 08:50

pepijnve reviewed Oct 21, 2025


Jefffrey reviewed Oct 22, 2025


kosiew and others added 2 commits October 22, 2025 11:13

Update datafusion/functions/src/core/nvl2.rs

4a3c665

Co-authored-by: Jeffrey Vo <[email protected]>

kosiew force-pushed the nvl2-17983 branch from 3b1738c to a74bb36 October 23, 2025 08:16

Jefffrey approved these changes Oct 23, 2025


alamb approved these changes Oct 23, 2025


alamb added this pull request to the merge queue Oct 24, 2025

Merged via the queue into apache:main with commit 22c4214 Oct 24, 2025
28 checks passed


4 participants
