@chenkovsky chenkovsky commented Aug 30, 2025

Which issue does this PR close?

Rationale for this change

Arguments of coalesce are evaluated eagerly: every argument is computed, even those that come after the first non-NULL value.
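
To make the difference concrete, here is a minimal, self-contained sketch in plain Rust. It does not use DataFusion's API: `coalesce_eager`, `coalesce_lazy`, and `evaluations` are hypothetical names, and `10i64.checked_div(0)` is only a stand-in for an argument that should not need to be evaluated.

```rust
use std::cell::Cell;

/// Eager style: the caller has already computed every argument,
/// so all of them were evaluated before the first non-NULL one is picked.
fn coalesce_eager(args: &[Option<i64>]) -> Option<i64> {
    args.iter().copied().flatten().next()
}

/// Lazy style: each argument is a closure that is only called if all
/// earlier arguments returned NULL (modelled here as None).
fn coalesce_lazy(args: &[&dyn Fn() -> Option<i64>]) -> Option<i64> {
    args.iter().find_map(|arg| arg())
}

/// Evaluates a stand-in for coalesce(1, y / x) with x = 0 under both
/// strategies and returns (result, number of arguments evaluated).
fn evaluations(lazy: bool) -> (Option<i64>, u32) {
    let calls = Cell::new(0u32);
    let first = || {
        calls.set(calls.get() + 1);
        Some(1i64)
    };
    let second = || {
        calls.set(calls.get() + 1);
        10i64.checked_div(0) // hypothetical stand-in for y / x
    };
    let result = if lazy {
        coalesce_lazy(&[&first, &second])
    } else {
        coalesce_eager(&[first(), second()])
    };
    (result, calls.get())
}
```

Both strategies return the same value, but `evaluations(false)` evaluates two arguments while `evaluations(true)` evaluates only the first.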

What changes are included in this PR?

Simplify coalesce to a CASE WHEN expression, so that later arguments are only evaluated when the earlier ones are NULL.
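
A rough sketch of such a rewrite, on a toy expression type rather than DataFusion's real `Expr`: the enum and the function `coalesce_to_case` below are hypothetical and exist only for illustration, so the PR's actual code may differ. The idea is that `coalesce(a, b, c)` becomes `CASE WHEN a IS NOT NULL THEN a WHEN b IS NOT NULL THEN b ELSE c END`.

```rust
/// Toy expression tree; an assumption of this sketch, not DataFusion's type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    IsNotNull(Box<Expr>),
    Coalesce(Vec<Expr>),
    Case {
        when_then: Vec<(Expr, Expr)>,
        else_expr: Option<Box<Expr>>,
    },
}

/// Rewrites coalesce(a1, ..., an) into a CASE expression whose branches
/// test each argument for NULL in order; the last argument becomes ELSE.
fn coalesce_to_case(args: Vec<Expr>) -> Expr {
    let n = args.len();
    match n {
        // Nothing to rewrite; leave the call untouched.
        0 => Expr::Coalesce(args),
        // coalesce(a) is just a.
        1 => args.into_iter().next().unwrap(),
        _ => {
            let mut args = args;
            let last = args.pop().unwrap();
            let when_then = args
                .into_iter()
                .map(|arg| (Expr::IsNotNull(Box::new(arg.clone())), arg))
                .collect();
            Expr::Case {
                when_then,
                else_expr: Some(Box::new(last)),
            }
        }
    }
}
```

Because CASE only evaluates a branch once its condition holds, the rewritten form gets lazy evaluation without a special case in the coalesce implementation itself.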

Are these changes tested?

Yes, with unit tests (UT).

Are there any user-facing changes?

No

@github-actions (bot) added the labels sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) on Aug 30, 2025

@nuno-faria nuno-faria left a comment

LGTM

Comment on lines 1689 to 1690
# due to the reason describe in https://github.com/apache/datafusion/issues/8927,
# the following queries will fail

Since the test below now works, I think we could move this comment to the test below this one.

@alamb alamb left a comment

Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any more comments

02)--TableScan: t projection=[x, y]
physical_plan
-01)ProjectionExec: expr=[coalesce(1, CAST(y@1 / x@0 AS Int64)) as coalesce(Int64(1),t.y / t.x), coalesce(2, CAST(y@1 / x@0 AS Int64)) as coalesce(Int64(2),t.y / t.x)]
+01)ProjectionExec: expr=[CASE WHEN true THEN 1 ELSE CAST(y@1 / x@0 AS Int64) END as coalesce(Int64(1),t.y / t.x), CASE WHEN true THEN 2 ELSE CAST(y@1 / x@0 AS Int64) END as coalesce(Int64(2),t.y / t.x)]

It seems like this could be simplified even further: CASE WHEN true THEN 1 ELSE ... END should reduce to just 1.
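
As a rough illustration of that kind of simplification (on a toy expression type, not DataFusion's real simplifier; `Expr` and `simplify_case` here are hypothetical names), a rule could look like this:

```rust
/// Toy expression tree; an assumption of this sketch, not DataFusion's type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Bool(bool),
    Int(i64),
    Column(String),
    Case {
        when_then: Vec<(Expr, Expr)>,
        else_expr: Option<Box<Expr>>,
    },
}

/// If the first WHEN condition is the literal `true`, that branch is always
/// taken, so the whole CASE can be replaced by its THEN value.
/// Anything else is returned unchanged.
fn simplify_case(expr: Expr) -> Expr {
    match expr {
        Expr::Case { mut when_then, else_expr } => {
            if matches!(when_then.first(), Some((Expr::Bool(true), _))) {
                // The ELSE branch and later WHEN branches are unreachable.
                return when_then.remove(0).1;
            }
            Expr::Case { when_then, else_expr }
        }
        other => other,
    }
}
```

Only the leading-`true` case is handled; a fuller rule would also drop branches whose condition is the literal `false`.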

I filed a ticket to track this idea:

}

let n = args.len();

This is quite a clever implementation.

However, I worry it may cause problems for comet which uses physical evaluation directly

@comphead or @andygrove do you know if comet uses the COALESCE implementation directly?

@mbutrovich

> Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any more comments

Comet was actually just thinking about coalesce:

apache/datafusion-comet#2270

Maybe @coderfender has some thoughts about this?

alamb commented Sep 6, 2025

> > Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any more comments
>
> Comet was actually just thinking about coalesce:
>
> apache/datafusion-comet#2270
>
> Maybe @coderfender has some thoughts about this?

It seems like this PR does the same thing described in the comet PR

Thus it seems like a good thing to merge

Let's wait a while to see if anyone else has comments, otherwise I'll plan to merge this PR

coderfender commented Sep 6, 2025

@alamb, @mbutrovich I changed Comet to fall back to a CASE statement to replicate lazy evaluation for coalesce (and then planned to work on this PR). Glad to see that these changes are looking good. Once we update the DataFusion version in Comet, we should be able to undo my Comet changes and use DataFusion's coalesce function. I will create a GitHub issue on the Comet side for this.

Successfully merging this pull request may close these issues:

COALESCE expr in datafusion should perform lazy evaluation of the operands
Evaluates COALESCE arguments past first non-NULL value
5 participants