[SPARK-52817][SQL] Fix Like Expression performance #51510

Open
wants to merge 3 commits into
base: master
from

Conversation

zhixingheyi-tian
Contributor

@zhixingheyi-tian zhixingheyi-tian commented Jul 16, 2025

What changes were proposed in this pull request?

Allow the contains function to be used for Like expressions whose pattern has multiple consecutive '%' wildcards.

Why are the changes needed?

In some customer workloads, users write Like patterns with multiple consecutive '%' characters.

For Example:

SELECT * FROM testData where value not like '%%HotFocus%%'
SELECT * FROM testData where value not like '%%%HotFocus%%%'

For these queries, the Like expression cannot be converted to the contains function during logical planning, so it is evaluated as a general Like expression and performance is very poor.
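To illustrate why the conversion is missed, here is a minimal sketch, not the code in this PR. It assumes the optimizer recognizes a "contains" pattern with a regex of the form `%([^_%]+)%`, as quoted later in this thread. The relaxed `%+([^_%]+)%+` variant and the names `LikePatternSketch`, `infixSingle`, and `infixMulti` are hypothetical and exist only for this example. Escape characters are ignored.

```scala
object LikePatternSketch {
  // Single-'%' form: exactly one wildcard on each side of the literal.
  private val containsSingle = "%([^_%]+)%".r
  // Assumption for illustration: accept one or more '%' on each side.
  private val containsMulti = "%+([^_%]+)%+".r

  // Returns the literal infix if the whole LIKE pattern matches the single-'%' form.
  def infixSingle(pattern: String): Option[String] = pattern match {
    case containsSingle(infix) => Some(infix)
    case _ => None
  }

  // Returns the literal infix if the whole LIKE pattern matches the relaxed form.
  def infixMulti(pattern: String): Option[String] = pattern match {
    case containsMulti(infix) => Some(infix)
    case _ => None
  }
}
```

Because Scala's `Regex` extractor requires the whole string to match, `infixSingle("%%HotFocus%%")` yields `None`, while `infixMulti("%%HotFocus%%")` yields `Some("HotFocus")`, which is the value a contains rewrite would need.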

How was this patch tested?

Covered by newly added UTs and the existing UTs.

@github-actions github-actions bot added the SQL label Jul 16, 2025
@wangyum
Member

wangyum commented Jul 16, 2025

Could you add a test to cover this change?

@zhixingheyi-tian
Contributor Author

Hi @wangyum

I have added UTs.

cc @cloud-fan @baibaichen @dongjoon-hyun

@beliefer
Contributor

Could you add a description?

@cloud-fan
Contributor

Can you provide more context in the PR description? I don't understand what you are doing in this PR.

@zhixingheyi-tian
Contributor Author

@beliefer @cloud-fan

I have added a description.

private val endsWith = "%([^_%]+)".r
private val startsAndEndsWith = "([^_%]+)%([^_%]+)".r
private val contains = "%([^_%]+)%".r
private val startsWith = "([^_%]+)%+".r
Contributor

@cloud-fan cloud-fan Jul 17, 2025

So a single % is the same as more than one %? Can we leave a code comment to explain this case?
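One way to check the equivalence raised here is a small sketch that translates a LIKE pattern into a Java regex and compares results. This is an illustration under simplifying assumptions, not Spark's implementation: `LikeEquivalenceSketch`, `likeToRegex`, and `like` are hypothetical names, and escape characters are not handled. Since '%' matches any sequence of zero or more characters, a run of several '%' should match exactly the same inputs as a single '%'.

```scala
import java.util.regex.Pattern

object LikeEquivalenceSketch {
  // Hypothetical helper: '%' becomes ".*", '_' becomes ".", everything else is quoted literally.
  def likeToRegex(pattern: String): String =
    pattern.toSeq.map {
      case '%' => ".*"
      case '_' => "."
      case c => Pattern.quote(c.toString)
    }.mkString

  // Full-string match, with DOTALL so wildcards also span newlines.
  def like(input: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(input).matches()
}
```

With this sketch, `like(s, "%HotFocus%")`, `like(s, "%%HotFocus%%")`, and `s.contains("HotFocus")` agree for any input `s`, because ".*.*" accepts the same strings as ".*".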

4 participants