-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Refactor nvl2 Function to Support Lazy Evaluation and Simplification via CASE Expression
#18191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 5 commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
8ead537
Rewrote nvl2 to simplify into a CASE expression, added return-field n…
kosiew a9048e9
Implement NVL2 execution for scalar and array inputs
kosiew d85087f
Add test for NVL2 expression evaluation with scalar inputs
kosiew b4b3550
cargo fmt
kosiew 10d78ac
Improve formatting of NVL2 expression tests for better readability
kosiew 4a3c665
Update datafusion/functions/src/core/nvl2.rs
kosiew a74bb36
Made the nvl2 scalar UDF return an internal error when it reaches exe…
kosiew File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #17357 the author chose to make
invoke_with_argsreturn an internal error instead of retaining the implementation. Would we want to do the same here?I'm a bit on the fence myself. On the one hand, this is effectively dead code for most users. On the other hand, raising an error here may cause breakage for users who have customised their optimiser passes and are not doing simplification. No idea if anyone actually does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point 🤔
Personally I'm of the mind to remove this impl and have it return error; part of the benefit of this PR is reducing the amount of code we'd need to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new fallback evaluator is exercised directly in test_create_physical_expr_nvl2, which builds physical expressions without running the simplifier. Returning an error here would regress those non-simplified code paths.
It isn’t just about satisfying a unit test, it's about preserving a supported API surface. SessionContext::create_physical_expr explicitly states that it performs coercion and rewrite passes but does not run the expression simplifier, so any expression handed directly to that API must still execute correctly without being rewritten to a CASE statement first.
The test_create_physical_expr_nvl2 fixture exercises exactly that public workflow by building a physical expression through SessionContext::create_physical_expr and evaluating it without simplification.
If we changed invoke_with_args to return an error, that flow would regress for library users in the same way it would fail for the test.
Rather than removing or rewriting the test, I think we should keep it to guard this behavior; it’s effectively documenting that nvl2 continues to work for consumers who rely on the non-simplifying physical-expr builder, which the function implementation currently supports.
I recommend keeping the implementation so those tests—and any downstream consumers that bypass simplification—continue to work.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in
coalesce(and now indirectly alsonvl/ifnull) already broke this though. If unsimplified execution is desirable, perhapsnvlshould be restored too because to not have arbitrary behaviour depending on the used UDF. In other words, I think you have to be consistent about this. Either all physical exprs should work or you shouldn’t bother with this. Cherry picking is a bit pointless in my opinion.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One subtlety here is that there is a change in semantics before and after simplification.
nvl2(1, 1, 1 / 0)will fail pre simplification but will work correctly once simplified due to the switch from eager to lazy evaluation. I think I would prefer a clear failure over a subtle difference in behaviour.If we do want to keep the
invoke_with_argsimplementations, one option could be to consider #17997 (or some variant of that idea) so that it can also be implemented lazily.Regarding code maintenance/duplication,
nvl2is an instance of theExpressionOrExpressionevaluation method fromCaseExpr. Perhaps a slightly modified version ofCaseExpr::expr_or_exprcould be made so thatnvlandnvl2could call that? I think what I'm trying to say is that maybe code reuse viasimplifyis maybe not the best idea.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I amended invoke_with_args to return internal_err for consistency and also to reduce code.