LazyFrame.join() documentation for validate= parameter seems inaccurate #26678

@itamarst

Description

LazyFrame.join() takes a validate parameter. Current docs say:

This is currently not supported by the streaming engine.

This appears to be inaccurate. As far as I can tell from reading the code, if validation is enabled (m:1, 1:m, or 1:1), the streaming engine falls back to the in-memory engine, which then performs the validation. Empirical tests also suggest this is the case. So validation does happen, but memory usage can be significantly higher because streaming is disabled for the join.

If this is correct and I'm not missing some edge case, more accurate documentation for validate would be:

Using this parameter will result in higher memory usage, as it will disable streaming for the join.

Happy to submit a PR for this if I'm not misunderstanding.

Link

https://docs.pola.rs/api/python/stable/reference/lazyframe/api/polars.LazyFrame.join.html

Labels: documentation
