Conversation

@jonathanc-n
Contributor

@jonathanc-n jonathanc-n commented Sep 9, 2025

Which issue does this PR close?

Rationale for this change

Adds the regular join types (left, right, full, inner) for piecewise merge join (PWMJ), as they follow a different code path from the existence joins.

What changes are included in this PR?

Adds classic join + physical planner

Are these changes tested?

Yes, with SLT tests and unit tests.

Follow up work to this pull request

  • Handling partitioned queries and multiple record batches (fuzz testing will be handled with this)
  • Simplify physical planning
  • Add more unit tests for different types (in another PR, as the LOC in this PR is getting a little daunting)

Next would be to implement the existence joins.

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Sep 9, 2025
@jonathanc-n jonathanc-n marked this pull request as draft September 9, 2025 04:03
@jonathanc-n
Contributor Author

@2010YOUY01 Would you like to take a look and see whether this is how you wanted to split up the work? I wanted to put this out today; I'll clean it up further this week. Only one external test is failing currently.

if join_filter.is_none() && matches!(join_type, JoinType::Inner) {
// cross join if there is no join conditions and no join filter set
Arc::new(CrossJoinExec::new(physical_left, physical_right))
} else if num_range_filters == 1
Contributor Author

I would like to refactor this in another pull request. It is purely a refactor and should be quite simple to do; I just wanted to get this version in first.

statement ok
set datafusion.execution.batch_size = 8192;

# TODO: partitioned PWMJ execution
Contributor Author

Partitioned execution is currently not allowed. Enabling it would make reviewing the tests a little messy, as many of the partitioned single-range queries would switch to PWMJ. This is another follow-up, tracked in #17427.

@jonathanc-n jonathanc-n marked this pull request as ready for review September 9, 2025 17:59
@jonathanc-n
Contributor Author

cc @2010YOUY01 @comphead this pr is now ready!

@jonathanc-n jonathanc-n changed the title POC: ClassicJoin for PWMJ feat: ClassicJoin for PWMJ Sep 9, 2025
@2010YOUY01
Contributor

This is great! I have some suggestions for the planning part, and I'll review the execution part tomorrow.

Refactor the inequality-extracting logic

I suggest moving the inequality-extracting logic from physical_planner.rs into https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/extract_equijoin_predicate.rs

The reason is that we'd better keep similar code in a single place instead of letting it scatter across multiple places. The ExtractEquijoinPredicate logical optimizer rule extracts equality join predicates like t1.v1 = t2.v1; here we want to extract t1.v1 < t2.v1, so the logic should be very similar.

To do this, I think we need to extend the logical plan join node with an extra IE predicate field (maybe we can define a new struct for an IE predicate with (Expr, Op, Expr), which we can also use in other places).

/// Join two logical plans on one or more join columns
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Join {
    ...
    /// Equijoin clause expressed as pairs of (left, right) join expressions
    pub on: Vec<(Expr, Expr)>,                                                                 
    /// Inequality clause expressed as (left, op, right) join expressions                 <-- HERE
    pub ie_predicates: Vec<(Expr, IEOp, Expr)>,
    /// Filters applied during join (non-equi conditions)
    pub filter: Option<Expr>,
    ...
}

To keep it compatible with systems that use only the LogicalPlan API, and not the physical plans, we can also provide a utility to move the IE predicates back to the filter:

Before: 
ie_predicates: [t1.v1 < t2.v1, t1.v2 < t2.v2]
filter: (t1.v3 + t2.v3) = 100

After:
ie_predicates: []
filter: ((t1.v3 + t2.v3) = 100) AND (t1.v1 < t2.v1) AND (t1.v2 < t2.v2)
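The Before/After transformation above can be sketched in a few lines. This is only an illustration under stated assumptions: `Expr`, `IEOp`, `Join`, `binary`, `col` and `move_ie_predicates_to_filter` are toy stand-ins invented for this sketch, not DataFusion's real types or an existing utility.

```rust
// Toy stand-ins for illustration only; NOT DataFusion's real `Expr` or `Join`.
#[derive(Debug, Clone, PartialEq)]
pub enum IEOp {
    Lt,
    LtEq,
    Gt,
    GtEq,
}

impl IEOp {
    /// Textual operator used when the predicate is rebuilt as a filter expression.
    fn symbol(&self) -> &'static str {
        match self {
            IEOp::Lt => "<",
            IEOp::LtEq => "<=",
            IEOp::Gt => ">",
            IEOp::GtEq => ">=",
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, String, Box<Expr>),
}

/// Hypothetical helper: build `l op r`.
pub fn binary(l: Expr, op: &str, r: Expr) -> Expr {
    Expr::Binary(Box::new(l), op.to_string(), Box::new(r))
}

/// Hypothetical helper: build a column reference.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Only the two fields relevant to this sketch are modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct Join {
    pub ie_predicates: Vec<(Expr, IEOp, Expr)>,
    pub filter: Option<Expr>,
}

/// Drain `ie_predicates` and AND each predicate onto `filter`, in order.
pub fn move_ie_predicates_to_filter(mut join: Join) -> Join {
    for (l, op, r) in join.ie_predicates.drain(..) {
        let pred = binary(l, op.symbol(), r);
        join.filter = Some(match join.filter.take() {
            Some(existing) => binary(existing, "AND", pred),
            None => pred,
        });
    }
    join
}
```

With the inputs from the Before block, the result has an empty `ie_predicates` list and a filter of the shape `((filter AND p1) AND p2)`, which matches the After block.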

Perhaps we can open a PR just for this IE predicate extraction task, and during the initial planning we can simply move the IE predicates back to the filter with the utility mentioned above.

Make it configurable to turn on/off PWMJ

I'll try to finish #17467 soon to make it easier, so let's put this on hold for now.

@comphead
Contributor

Thanks @jonathanc-n and @2010YOUY01

#17467 would definitely be nice to have, as PWMJ can start as an optional experimental join, documented separately to show its benefits and limitations to the end user. The same actually happened with SMJ, which was an experimental feature for quite some time.

Another good way to identify performance bottlenecks, and to keep the join more stable, is to absorb some knowledge from #17488.

As an optional feature it is pretty safe to ship. Again referring to SMJ, there was a separate ticket with post-launch checks to make sure it was safe to use, like #9846.

Let me know your thoughts?

@jonathanc-n
Contributor Author

jonathanc-n commented Sep 11, 2025

Yes, I think the experimental flag should be added first, and we can do the inequality extraction logic as a follow-up. WDYT @2010YOUY01? Do you want to get #17467 in before this one?

@2010YOUY01
Contributor

> Yes, I think the experimental flag should be added first, and we can do the inequality extraction logic as a follow-up. WDYT @2010YOUY01? Do you want to get #17467 in before this one?

Yes, so let's do the other work first. If I can't get #17467 done by the time this PR is ready, let's add an enable_piecewise_merge_join option here. I think we can agree on this configuration.
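A minimal sketch of how such a flag might gate planning for joins without equijoin keys. Only `num_range_filters` and `enable_piecewise_merge_join` come from this thread; `JoinStrategy`, `choose_strategy` and the nested-loop fallback are assumptions made for illustration, and the real planner builds execution plan nodes rather than returning an enum.

```rust
/// Hypothetical strategy labels used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum JoinStrategy {
    CrossJoin,
    PiecewiseMergeJoin,
    NestedLoopJoin,
}

/// Pick a strategy for a join that has no equality keys.
///
/// * `has_filter`: whether a join filter is present
/// * `is_inner`: whether the join type is an inner join
/// * `num_range_filters`: number of extracted range (inequality) predicates
/// * `enable_piecewise_merge_join`: the opt-in flag discussed above
pub fn choose_strategy(
    has_filter: bool,
    is_inner: bool,
    num_range_filters: usize,
    enable_piecewise_merge_join: bool,
) -> JoinStrategy {
    if !has_filter && is_inner {
        // No join condition and no filter: plain cross join.
        JoinStrategy::CrossJoin
    } else if enable_piecewise_merge_join && num_range_filters == 1 {
        // Exactly one range predicate and the feature is switched on.
        JoinStrategy::PiecewiseMergeJoin
    } else {
        // Assumed fallback for everything else.
        JoinStrategy::NestedLoopJoin
    }
}
```

Keeping the flag in the same condition as the range-filter count means that existing plans stay unchanged when the option is off.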

Contributor

@2010YOUY01 2010YOUY01 left a comment

I have gone over exec.rs, and will continue with the stream implementation part soon.

ExecutionPlan, PlanProperties,
};
use crate::{DisplayAs, DisplayFormatType, ExecutionPlanProperties};

Contributor

This is one of the best module comments I have seen.

@github-actions github-actions bot added the common Related to common crate label Sep 14, 2025
@jonathanc-n
Contributor Author

@2010YOUY01 I have added the requested changes! Should be good for another go.

@jonathanc-n
Contributor Author

Note there are several slower queries; that's because one join side is very small, so the brute-force nested loop join becomes optimal. I suspect that in some cases NLJ can even beat Hash Join. The planner/optimizer should take those cases into account; I think this is a good follow-up project.

Currently it doesn't support swapping inputs. It should be faster when the right side is smaller.

@jonathanc-n
Contributor Author

jonathanc-n commented Oct 14, 2025

@2010YOUY01 I have resolved the requested changes. I have also added null handling: nulls are handled first, and the first index to start from is then calculated from the number of nulls in the array.
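The null-handling idea can be sketched as follows. This is a simplified illustration, not the code in this PR: it assumes the buffered column is sorted ascending with nulls first, uses `Option<i64>` instead of Arrow arrays, and the function names `first_non_null_index` and `matches_greater_than` are made up for the example.

```rust
use std::ops::Range;

/// Number of leading nulls in a column sorted with nulls first.
/// This is also the first index that can ever satisfy a comparison.
pub fn first_non_null_index(sorted: &[Option<i64>]) -> usize {
    sorted.iter().take_while(|v| v.is_none()).count()
}

/// Range of indices in `sorted` whose value is strictly greater than `probe`.
/// A null probe matches nothing, and leading nulls are skipped up front.
pub fn matches_greater_than(sorted: &[Option<i64>], probe: Option<i64>) -> Range<usize> {
    let start = first_non_null_index(sorted);
    let Some(p) = probe else {
        // Comparisons against null never evaluate to true.
        return sorted.len()..sorted.len();
    };
    // Binary search only over the non-null suffix.
    let non_null = &sorted[start..];
    let offset = non_null.partition_point(|v| v.map_or(true, |x| x <= p));
    (start + offset)..sorted.len()
}
```

Because the nulls are grouped at the front, the search never has to test individual rows for null; it only shifts its starting index by the null count.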

Here are some follow up improvements:

  • Fix PWMJ predicate extraction. For now we can allow users to use PWMJ, but I will include a note on the enable-PWMJ config saying it should only be set to true for simpler predicates (something like t.val > s.val, and not t.val > (s.val + t.val))
  • Swapping inputs
  • Optimize some of the cases in your comment
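One plausible way to tell the "simpler" predicates from the problematic ones in the first bullet is to check which tables each side of the comparison references. The sketch below is an assumption about what such a check could look like, not the fix that was eventually made in this PR; `Expr`, `collect_tables` and `is_single_range_candidate` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Toy expression type for illustration; not DataFusion's `Expr`.
#[derive(Debug, Clone)]
pub enum Expr {
    Column { table: &'static str, name: &'static str },
    Add(Box<Expr>, Box<Expr>),
}

/// Collect the table qualifiers referenced anywhere inside `expr`.
fn collect_tables(expr: &Expr, out: &mut BTreeSet<&'static str>) {
    match expr {
        Expr::Column { table, .. } => {
            out.insert(*table);
        }
        Expr::Add(l, r) => {
            collect_tables(l, out);
            collect_tables(r, out);
        }
    }
}

/// A comparison `lhs <op> rhs` is a candidate only if each side
/// references exactly one table and the two sides use different tables.
pub fn is_single_range_candidate(lhs: &Expr, rhs: &Expr) -> bool {
    let mut left = BTreeSet::new();
    let mut right = BTreeSet::new();
    collect_tables(lhs, &mut left);
    collect_tables(rhs, &mut right);
    left.len() == 1 && right.len() == 1 && left != right
}
```

Under this rule `t.val > s.val` qualifies, while `t.val > (s.val + t.val)` does not, because its right-hand side mixes columns from both inputs.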

@2010YOUY01
Contributor

2010YOUY01 commented Oct 15, 2025

> @2010YOUY01 I have resolved the requested changes. I have also added null handling: nulls are handled first, and the first index to start from is then calculated from the number of nulls in the array.
>
> Here are some follow up improvements:
>
>   • Fix PWMJ predicate extraction. For now we can allow users to use PWMJ, but I will include a note on the enable-PWMJ config saying it should only be set to true for simpler predicates (something like t.val > s.val, and not t.val > (s.val + t.val))
>   • Swapping inputs
>   • Optimize some of the cases in your comment

I think point 1 is a known bug: such a case will be planned as PWMJ and panic. It's okay if it's not planned as PWMJ.
It would be better to fix it now. If that's not easy to do, you can consider doing #17482 (comment) as a split-off part of this PR; I've checked the code and I believe it's quite straightforward to implement.

@2010YOUY01
Contributor

The sqlite tests are passing

  1. manually change the enable_piecewise_merge_join default to true
  2. run INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests

@2010YOUY01
Contributor

I think this PR is almost ready

If anyone is interested in reviewing, you can start with the comment in https://github.com/apache/datafusion/pull/17482/files#diff-bc8796293bb6ee5f3f2b43bfb2b43959726f0b458dd3475dbbb4ab6ea0fc5a8c for the high-level idea
cc @alamb and @comphead

@jonathanc-n
Contributor Author

@2010YOUY01 I've added a fix for the bug. I think I will attempt to move it to the optimizer in a follow-up, due to some complications with passing the information to the physical planner.

Contributor

@2010YOUY01 2010YOUY01 left a comment

Thanks again, great job! I plan to wait a couple of days before merging, in case anyone wants to review again.

@2010YOUY01
Contributor

> @2010YOUY01 I have resolved the requested changes. I have also added null handling: nulls are handled first, and the first index to start from is then calculated from the number of nulls in the array.
>
> Here are some follow up improvements:
>
>   • Fix PWMJ predicate extraction. For now we can allow users to use PWMJ, but I will include a note on the enable-PWMJ config saying it should only be set to true for simpler predicates (something like t.val > s.val, and not t.val > (s.val + t.val))
>   • Swapping inputs
>   • Optimize some of the cases in your comment

Sounds great. Let's update the epic issue with the follow-up tasks.

I have a question: how do you plan to implement swapping inputs? Is there anything beyond putting the smaller table on the buffer side?

Besides, I suggest supporting additional predicates first, so that PWMJ is applicable to a wider range of scenarios.

@comphead
Contributor

Thanks @jonathanc-n and @2010YOUY01 for a great job, I'm planning to have another look soon

@jonathanc-n
Contributor Author

Swapping tables would actually put the smaller table on the stream side, so I would probably do a follow-up for that.

I agree we should do the additional predicates first as follow up.

@jonathanc-n
Contributor Author

@comphead Do you see anything that stands out to you? I believe it should be good to go!

@comphead
Contributor

Thanks @jonathanc-n, I think it is great. I just double-checked that it is disabled by default. Before going live we need a reliable fuzzer to prove that performance is not impacted, as happened in a case recently reported by a user where local tests were fine.

One thing to add though: it would probably be great to have user documentation in an .md file so users know how and when to use this feature. It may well be that for some workloads this feature is much more performant than the standard approach, and users should at least know about it.

This documentation can be done separately. Thanks again 💪, and thanks to @2010YOUY01 for such a detailed and thorough review.

Contributor

@comphead comphead left a comment

Thanks @jonathanc-n

@comphead comphead added this pull request to the merge queue Oct 21, 2025
Merged via the queue into apache:main with commit 1f434dc Oct 21, 2025
33 checks passed
@comphead
Contributor

Btw @jonathanc-n, this PR and feature would make a nice article and announcement. WDYT?

@jonathanc-n
Contributor Author

Yes that sounds good, we could try working on it after the implementation for existence joins also gets merged or created so we can show all the benchmarks.

tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
## Which issue does this PR close?

- part of apache#17427 

## Rationale for this change
Adds regular joins (left, right, full, inner) for PWMJ as they behave
differently in the code path.

## What changes are included in this PR?
Adds classic join + physical planner

## Are these changes tested?
Yes SLT tests  + unit tests

## Follow up work to this pull request
- Handling partitioned queries and multiple record batches (fuzz testing
will be handled with this)
- Simplify physical planning
- Add more unit tests for different types (another pr as the LOC in this
pr is getting a little daunting)

next would be to implement the existence joins

---------

Co-authored-by: Yongting You <[email protected]>

Labels

common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

3 participants