-
Notifications
You must be signed in to change notification settings - Fork 1.6k
support filter pushdown with left-semi joins in HashJoinExec #17153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just small comment for now, going to look more into it tomorrow.
} | ||
|
||
// Right child analysis | ||
if right_preserved && !is_left_only_predicate(filter, left_schema_len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add a TODO here or just refactor now as I believe !is_left_only_predicate
here is not the correct behaviour due to mixed predicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to get some help understanding what the correct behavior actually is... the abstract semantics of joins is a bit beyond my database knowledge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't !is_left_only_predicate
be a is_right_only_predicate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming so, or we end up pushing mixed predicates into the from_child
call, which is unnecessary
/// assert_eq!(JoinType::Left.lr_is_preserved(), (true, false)); | ||
/// assert_eq!(JoinType::LeftSemi.lr_is_preserved(), (true, false)); | ||
/// ``` | ||
pub fn lr_is_preserved(self) -> (bool, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should use a slightly more clear name, maybe like join_side_is_preserved() or something along those lines
Thanks @adriangb for working on this. I think it would be nice to add a sqllogictest example showing the pushdown. Here is an example: copy (select i as k from generate_series(1, 10) t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 100) t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
select *
from t1
where k in (
select k
from t2
where t2.v >= 10
); However, I noticed that the filter is being pushed down to the wrong table: HashJoinExec
DataSourceExec t1.parquet -- should be added to t1
DataSourceExec t2.parquet predicate=v@1 >= 10 AND DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 10 ] I checked and it is also happening with inner joins (#17196), so its probably not related to this specific PR. |
Closing in favor of @kosiew 's work |
Closes #16973