Skip to content

Conversation

adriangb
Copy link
Contributor

Closes #16973

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate common Related to common crate physical-plan Changes to the physical-plan crate labels Aug 12, 2025
Copy link
Contributor

@jonathanc-n jonathanc-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just small comment for now, going to look more into it tomorrow.

}

// Right child analysis
if right_preserved && !is_left_only_predicate(filter, left_schema_len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add a TODO here or just refactor now as I believe !is_left_only_predicate here is not the correct behaviour due to mixed predicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to get some help understanding what the correct behavior actually is... the abstract semantics of joins is a bit beyond my database knowledge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't !is_left_only_predicate be a is_right_only_predicate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming so, or we end up pushing mixed predicates into the from_child call, which is unnecessary

/// assert_eq!(JoinType::Left.lr_is_preserved(), (true, false));
/// assert_eq!(JoinType::LeftSemi.lr_is_preserved(), (true, false));
/// ```
pub fn lr_is_preserved(self) -> (bool, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should use a slightly more clear name, maybe like join_side_is_preserved() or something along those lines

@nuno-faria
Copy link
Contributor

Thanks @adriangb for working on this.

I think it would be nice to add a sqllogictest example showing the pushdown. Here is an example:

copy (select i as k from generate_series(1, 10) t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 100) t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';

select *
from t1
where k in (
    select k
    from t2
    where t2.v >= 10
);

However, I noticed that the filter is being pushed down to the wrong table:

HashJoinExec
  DataSourceExec t1.parquet -- should be added to t1
  DataSourceExec t2.parquet predicate=v@1 >= 10 AND DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 10 ]

I checked and it is also happening with inner joins (#17196), so its probably not related to this specific PR.

@adriangb
Copy link
Contributor Author

adriangb commented Sep 4, 2025

Closing in favor of @kosiew 's work

@adriangb adriangb closed this Sep 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate core Core DataFusion crate optimizer Optimizer rules physical-plan Changes to the physical-plan crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable dynamic filter pushdown for non-inner joins in physical plans
3 participants