Description
LazyFrame.join() takes a validate parameter. The current docs say:
"This is currently not supported by the streaming engine."
This appears to be inaccurate. As far as I can tell from reading the code, when validation is enabled (m:1, 1:m, or 1:1), the streaming engine falls back to the in-memory engine, which then performs the validation. Empirical tests suggest the same. So validation does happen, but memory usage can be significantly higher because streaming is disabled for the join.
If this is correct and I'm not missing an edge case, a more accurate description of validate in the docs would be:
"Using this parameter will result in higher memory usage, as it will disable streaming for the join."
Happy to submit a PR for this if I'm not misunderstanding.
Link
https://docs.pola.rs/api/python/stable/reference/lazyframe/api/polars.LazyFrame.join.html