DOC: mention .attrs are preserved in Parquet IO for pyarrow engine #61912

imramraja · 2025-07-20T10:19:28Z

This PR adds documentation to DataFrame.to_parquet and pandas.read_parquet highlighting that DataFrame.attrs are preserved when using the "pyarrow" engine.

This behavior is already implemented in pandas/io/parquet/pyarrow.py, but was undocumented. This PR improves discoverability for users.

- Added Notes section in both docstrings
- (Optional) Will add a test in a follow-up if needed

First-time contributor 😊

arthurlw

You can run

pre-commit run --all-files

in your terminal to catch and fix these before pushing. Let me know if you want help setting it up!

arthurlw · 2025-07-21T03:41:43Z

pandas/core/frame.py

@@ -2958,7 +2958,8 @@ def to_parquet(
          is expected and consistent with pandas' handling of categorical data.
          To manage file size and ensure a more predictable roundtrip process,
          consider using :meth:`Categorical.remove_unused_categories` on the
-          DataFrame before saving.
+          DataFrame before saving
+        * When using the "pyarrow" engine, `DataFrame.attrs` are stored as part of the file's metadata and restored on reading.


CI is failing because this line exceeds the maximum allowed length. Try splitting it into two lines to satisfy the linter.

mroeschke · 2025-07-21T16:33:27Z

Thanks for the PR, but I don't think we need to necessarily document this, so closing. Happy to have your contributions on other issues labeled "good first issue".

jorisvandenbossche · 2025-07-22T08:28:10Z

> I don't think we need to necessarily document this

Given that this is a behaviour that is rather unique to the parquet format (most other IO methods in pandas don't preserve attrs, I think?), and also something that differs between both engines, this seems worth mentioning in the docs?

jbrockmendel · 2025-07-31T19:52:23Z

I’m fine with this if the author can get the CI green

Update frame.py

0d95b2a

arthurlw suggested changes Jul 21, 2025

arthurlw added the Docs label Jul 21, 2025

mroeschke closed this Jul 21, 2025

mroeschke reopened this Jul 22, 2025

mroeschke added the IO Parquet parquet, feather label Jul 22, 2025

jorisvandenbossche changed the title ~~Update frame.py~~ DOC: mention .attrs are preserved in Parquet IO for pyarrow engine Jul 23, 2025
