What happened?
Between 2025.7.1 and 2025.8.0, something changed in xarray that broke our ability to pickle h5netcdf-backed datasets with cloudpickle. This in turn breaks Dask's ability to work with these datasets.
What did you expect to happen?
The code below works prior to 2025.8.0.
Minimal Complete Verifiable Example
import cloudpickle
import s3fs
import xarray as xr

# Open a public sample file over S3 with the h5netcdf backend as a lazy
# (Dask-chunked) dataset, then round-trip it through cloudpickle.
s3 = s3fs.S3FileSystem(anon=True)
fname = "s3://earthmover-sample-data/netcdf/tas_Amon_GFDL-ESM4_hist-piNTCF_r1i1p1f1_gr1.nc"
ds = xr.open_dataset(s3.open(fname), engine="h5netcdf", chunks={})
cloudpickle.loads(cloudpickle.dumps(ds))
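As background (not part of the report itself): an open OS-level file handle is inherently unpicklable in standard Python, which is why backends that want picklable datasets must serialize enough information to reopen the file rather than serializing the handle. A stdlib-only sketch:

```python
import os
import pickle
import tempfile

# Pickling an open file object raises TypeError, much like the h5py
# handle buried inside an h5netcdf-backed dataset.
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, "rb")
try:
    pickle.dumps(f)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
finally:
    f.close()
    os.remove(path)
```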
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 8
6 fname = "s3://earthmover-sample-data/netcdf/tas_Amon_GFDL-ESM4_hist-piNTCF_r1i1p1f1_gr1.nc"
7 ds = xr.open_dataset(s3.open(fname), engine="h5netcdf", chunks={})
----> 8 cloudpickle.loads(cloudpickle.dumps(ds))
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1479, in dumps(obj, protocol, buffer_callback)
1477 with io.BytesIO() as file:
1478 cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1479 cp.dump(obj)
1480 return file.getvalue()
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1245, in Pickler.dump(self, obj)
1243 def dump(self, obj):
1244 try:
-> 1245 return super().dump(obj)
1246 except RuntimeError as e:
1247 if len(e.args) > 0 and "recursion" in e.args[0]:
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/h5py/_hl/base.py:369, in HLObject.__getnewargs__(self)
359 def __getnewargs__(self):
360 """Disable pickle.
361
362 Handles for HDF5 objects can't be reliably deserialised, because the
(...) 367 limitations, look at the h5pickle project on PyPI.
368 """
--> 369 raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
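The guard at the bottom of the traceback is deliberate on h5py's part: defining `__getnewargs__` to raise makes any pickle at protocol 2 or higher fail for those objects. A minimal stdlib sketch of that pattern (`Handle` is a hypothetical stand-in for h5py's `HLObject`):

```python
import pickle

class Handle:
    # Mimic h5py's approach: raising from __getnewargs__ disables
    # pickling, since protocol >= 2 calls it during object reduction.
    def __getnewargs__(self):
        raise TypeError("h5py objects cannot be pickled")

try:
    pickle.dumps(Handle())
except TypeError as exc:
    print(exc)  # h5py objects cannot be pickled
```

So once xarray stops wrapping the underlying h5py handle in something picklable, this TypeError surfaces directly through cloudpickle.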
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:38:53) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2025.8.0
pandas: 2.3.2
numpy: 2.2.6
scipy: 1.16.1
netCDF4: 1.6.5
pydap: 3.5.6
h5netcdf: 1.6.4
h5py: 3.14.0
zarr: 3.1.2
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: None
bottleneck: None
dask: 2025.7.0
distributed: 2025.7.0
matplotlib: 3.10.6
cartopy: None
seaborn: 0.13.2
numbagg: 0.9.2
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: 0.10.6
numpy_groupies: 0.11.3
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None