Regression: "h5py objects cannot be pickled" with cloudpickle #10712

@rabernat

Description

What happened?

Between 2025.7.1 and 2025.8.0, something changed in xarray that broke our ability to pickle h5netcdf-backed datasets with cloudpickle. This in turn breaks Dask's ability to work with these datasets.

What did you expect to happen?

The code below works with xarray versions prior to 2025.8.0.

Minimal Complete Verifiable Example

import cloudpickle
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
fname = "s3://earthmover-sample-data/netcdf/tas_Amon_GFDL-ESM4_hist-piNTCF_r1i1p1f1_gr1.nc"
ds = xr.open_dataset(s3.open(fname), engine="h5netcdf", chunks={}) 
cloudpickle.loads(cloudpickle.dumps(ds))
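
For context on why this used to work: xarray's file-backed datasets have historically been picklable because the backend stores the *recipe* for opening the file rather than the live HDF5 handle, and re-opens on demand after unpickling. A stdlib-only sketch of that pattern (all names here are hypothetical, not xarray's actual internals):

```python
import pickle


class LazyFile:
    """Hypothetical sketch of the re-open-on-demand pattern that makes
    file-backed objects picklable: pickle the path, never the handle."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened lazily, never pickled

    def handle(self):
        # Re-open on first access (including after unpickling).
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getstate__(self):
        # Drop the unpicklable handle; keep only what is needed to re-open.
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        self._handle = None


lf = LazyFile("data.nc")  # path need not exist until handle() is called
restored = pickle.loads(pickle.dumps(lf))
print(restored.path)  # -> data.nc
```

The regression suggests the live h5py handle is now reachable from the dataset's pickled state instead of being dropped this way.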

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 8
      6 fname = "s3://earthmover-sample-data/netcdf/tas_Amon_GFDL-ESM4_hist-piNTCF_r1i1p1f1_gr1.nc"
      7 ds = xr.open_dataset(s3.open(fname), engine="h5netcdf", chunks={}) 
----> 8 cloudpickle.loads(cloudpickle.dumps(ds))

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1479, in dumps(obj, protocol, buffer_callback)
   1477 with io.BytesIO() as file:
   1478     cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1479     cp.dump(obj)
   1480     return file.getvalue()

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1245, in Pickler.dump(self, obj)
   1243 def dump(self, obj):
   1244     try:
-> 1245         return super().dump(obj)
   1246     except RuntimeError as e:
   1247         if len(e.args) > 0 and "recursion" in e.args[0]:

File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/h5py/_hl/base.py:369, in HLObject.__getnewargs__(self)
    359 def __getnewargs__(self):
    360     """Disable pickle.
    361 
    362     Handles for HDF5 objects can't be reliably deserialised, because the
   (...)    367     limitations, look at the h5pickle project on PyPI.
    368     """
--> 369     raise TypeError("h5py objects cannot be pickled")

TypeError: h5py objects cannot be pickled
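
The traceback shows the error originating in h5py itself, which deliberately disables pickling by raising from `__getnewargs__` (called by pickle for protocol 2+). A stdlib-only sketch of that mechanism, with hypothetical stand-in classes:

```python
import pickle


class FileHandle:
    """Hypothetical stand-in for an h5py object wrapping a live HDF5 handle."""

    def __getnewargs__(self):
        # h5py's HLObject.__getnewargs__ raises like this because raw HDF5
        # handles cannot be reliably re-created on deserialisation.
        raise TypeError("h5py objects cannot be pickled")


class Dataset:
    """Hypothetical container that (accidentally) keeps a handle reference
    in its pickled state."""

    def __init__(self):
        self._handle = FileHandle()


try:
    # Pickling Dataset pickles its __dict__, which reaches FileHandle and
    # triggers __getnewargs__ — reproducing the error in the traceback.
    pickle.dumps(Dataset())
except TypeError as e:
    print(e)  # -> h5py objects cannot be pickled
```

Any object graph that reaches a live h5py handle fails the same way, which is why the whole xarray dataset becomes unpicklable.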

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:38:53) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2025.8.0
pandas: 2.3.2
numpy: 2.2.6
scipy: 1.16.1
netCDF4: 1.6.5
pydap: 3.5.6
h5netcdf: 1.6.4
h5py: 3.14.0
zarr: 3.1.2
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: None
bottleneck: None
dask: 2025.7.0
distributed: 2025.7.0
matplotlib: 3.10.6
cartopy: None
seaborn: 0.13.2
numbagg: 0.9.2
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: 0.10.6
numpy_groupies: 0.11.3
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None
