feat: improve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3) #16762

Merged
merged 8 commits into from
Jul 23, 2025

Conversation

haohuaijin
Contributor

@haohuaijin haohuaijin commented Jul 13, 2025

Which issue does this PR close?

Rationale for this change

Improve LiteralGuarantee to handle cases like
(a=1 AND b=1) OR (a=2 AND b=3) or (a IN ("foo", "bar") AND b = 5) OR (a IN ("bar") AND b=6)

What changes are included in this PR?

Add the logic to extract in_guarantee("a", [1, 2]) and in_guarantee("b", [1, 3]) from (a=1 AND b=1) OR (a=2 AND b=3):

  1. Split each disjunction into its constituent conjunctions and keep only the equality operations.
  2. Use the find_common_columns function to identify the columns present in all term sets.
  3. Iterate through the common columns and build a guarantee for each one.
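
The three steps above can be sketched in a few lines of standalone Rust. This is only an illustration of the idea, not the code in this PR: the `Term` enum, the `in_guarantees` and `example` functions, and the use of `i64` literals are hypothetical, and only the name `find_common_columns` comes from the description. The sketch assumes that terms other than `=` / `IN` can simply be ignored, which stays sound because extra conjuncts only narrow a branch further.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model of one term in a conjunction.
/// `col = v` is represented as `In` with a single value.
#[derive(Debug, Clone)]
enum Term {
    In { col: &'static str, values: Vec<i64> },
    /// Anything else (`!=`, `NOT IN`, `<`, ...): contributes no guarantee.
    Other,
}

/// Step 2: columns constrained by an equality/IN term in *every* branch of the OR.
fn find_common_columns(disjuncts: &[Vec<Term>]) -> BTreeSet<&'static str> {
    // Step 1 (filtering) happens here: only `In` terms contribute a column.
    let mut sets = disjuncts.iter().map(|conj| {
        conj.iter()
            .filter_map(|t| match t {
                Term::In { col, .. } => Some(*col),
                Term::Other => None,
            })
            .collect::<BTreeSet<_>>()
    });
    let first = sets.next().unwrap_or_default();
    sets.fold(first, |acc, s| acc.intersection(&s).copied().collect())
}

/// Step 3: for each common column, union the literals seen across all branches.
fn in_guarantees(disjuncts: &[Vec<Term>]) -> BTreeMap<&'static str, BTreeSet<i64>> {
    let mut out = BTreeMap::new();
    for common in find_common_columns(disjuncts) {
        let set: &mut BTreeSet<i64> = out.entry(common).or_default();
        for term in disjuncts.iter().flatten() {
            if let Term::In { col, values } = term {
                if *col == common {
                    set.extend(values.iter().copied());
                }
            }
        }
    }
    out
}

/// (a = 1 AND b = 1) OR (a = 2 AND b = 3 AND <some other term>)
fn example() -> BTreeMap<&'static str, BTreeSet<i64>> {
    let eq = |col, v| Term::In { col, values: vec![v] };
    in_guarantees(&[
        vec![eq("a", 1), eq("b", 1)],
        vec![eq("a", 2), eq("b", 3), Term::Other],
    ])
}
```

Calling `example()` yields the sets {1, 2} for `a` and {1, 3} for `b`, matching the guarantees named above. A column that is missing from any branch of the OR gets no entry, because a row could satisfy that branch with any value in that column.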

Are these changes tested?

Yes, this PR adds test cases.

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jul 13, 2025
@haohuaijin
Contributor Author

cc @debajyoti-truefoundry @alamb

@alamb alamb changed the title feat: imporve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3) feat: improve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3) Jul 14, 2025
Contributor

@alamb alamb left a comment

Thanks @haohuaijin -- this looks like a great start to me

I think we need a few more tests to show it doesn't incorrectly pick up literal guarantees for NOT IN / != terms, but otherwise I think it is good

@haohuaijin
Contributor Author

haohuaijin commented Jul 15, 2025

Thanks for your review @alamb, I addressed your comment in 89dc6be

@alamb
Contributor

alamb commented Jul 18, 2025

I am sorry @haohuaijin -- I will review this more carefully soon. I just need to sit down and think through the details to make sure it doesn't have any correctness problems

Contributor

@alamb alamb left a comment

Thank you @haohuaijin -- I reviewed the code and tests carefully and I think this PR looks good to me.

It is a very nice improvement

@alamb alamb added the performance Make DataFusion faster label Jul 21, 2025
@haohuaijin
Contributor Author

Thanks for your reviews @alamb

@alamb alamb merged commit 3c95281 into apache:main Jul 23, 2025
27 checks passed
@haohuaijin haohuaijin deleted the hj/guarantee-optimize branch July 23, 2025 15:22
Labels
performance Make DataFusion faster physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bloom filters are unused for certain where clause patterns (improve LiteralGuarantee)
2 participants