-
Notifications
You must be signed in to change notification settings - Fork 1.6k
dissallow pushdown of volatile PhysicalExprs #16861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you! I'll check my cases. |
Thank you, @adriangb ! I can confirm that it works great with the table sampling, since I use
The volatile filter is not pushed to the datasource. Without this patch, it looked like I agree it'd be more scalable to have an abstract way to specify UDF volatility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adriangb and @theirix -- this looks good to me. I think we need to change the volatility check but otherwise this is good to go
It is probably reasonable to add a pub fn volatility()
method to PhysicalExpr as well so that user defined expressions that are volatile / shouldn't be pushed down can be excluded as well
expr.as_any() | ||
.downcast_ref::<datafusion_physical_expr::ScalarFunctionExpr>() | ||
{ | ||
let name = scalar_function.fun().name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can find volatility using
Which then has a volatility field: https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Signature.html#structfield.volatility
@@ -485,21 +497,32 @@ fn push_down_filters( | |||
// currently. `self_filters` are the predicates which are provided by the current node, | |||
// and tried to be pushed down over the child similarly. | |||
|
|||
let num_self_filters = self_filters.len(); | |||
let mut all_predicates = self_filters.clone(); | |||
// Filter out self_filters that contain volatile expressions and track indices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the book keeping for volatile functions got complicated (though I can't really see a better way to do it).
Maybe we could encapsulate the book keeping into some structure that would make it easier to track or something 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes agreed. I'll see what I can come up with...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude and I came up with pydantic#34. It's not exactly simple or pretty, but it does put the indexing logic in one place and make it testable in isolation. Curious what you folks think.
Closes #16545