DOC: mention .attrs are preserved in Parquet IO for pyarrow engine #61912
You can run `pre-commit run --all-files` in your terminal to catch and fix these before pushing. Let me know if you want help setting it up!
@@ -2958,7 +2958,8 @@ def to_parquet(
 is expected and consistent with pandas' handling of categorical data.
 To manage file size and ensure a more predictable roundtrip process,
 consider using :meth:`Categorical.remove_unused_categories` on the
-DataFrame before saving.
+DataFrame before saving
+* When using the "pyarrow" engine, `DataFrame.attrs` are stored as part of the file's metadata and restored on reading.
CI is failing because this line exceeds the maximum allowed length. Try splitting it into two lines to satisfy the linter.
Thanks for the PR, but I don't think we need to necessarily document this so closing. Happy to have your contributions on other issues labeled
Given that this is a behaviour that is rather unique to the parquet format (most other IO methods in pandas don't preserve attrs, I think?), and also something that differs between both engines, this seems worth mentioning in the docs?
I’m fine with this if the author can get the CI green
This PR adds documentation to `DataFrame.to_parquet` and `pandas.read_parquet` highlighting that `DataFrame.attrs` are preserved when using the "pyarrow" engine.
This behavior is already implemented in `pandas/io/parquet/pyarrow.py`, but was undocumented. This PR improves discoverability for users.
Notes section in both docstrings
First-time contributor 😊