Description
What happened?
Hi folks, here's something I just found out; please see the issue below:
Problem class: WritableCFDataStore()
Defined in xarray/xarray/backends/common.py, line 647 at commit 40119bf:
class WritableCFDataStore(AbstractWritableDataStore):
Use case and MRE
I am loading a Zarr store from an S3 bucket; it is about 500MB compressed (Blosc) and about 1GB uncompressed. I then pass its variables to WritableCFDataStore.encode():
import xarray as xr
from xarray.backends.common import WritableCFDataStore


def encode_zarr_file():
    # file is about 500MB compressed Zarr3
    zarr_path = (
        "https://uor-aces-o.s3-ext.jc.rl.ac.uk/"
        "esmvaltool-zarr/cl_Amon_UKESM1-0-LL_ssp370SST-lowNTCF_r1i1p1f2.zarr3"
    )
    time_coder = xr.coders.CFDatetimeCoder(use_cftime=True)
    zarr_xr = xr.open_dataset(
        zarr_path,
        consolidated=True,
        decode_times=time_coder,
        engine="zarr",
        backend_kwargs={},
    )
    # the variables are still lazy at this point
    variables = zarr_xr.variables
    dts = WritableCFDataStore()
    # encoding realizes the data: memory rises to about 1GB here
    dts.encode(variables, {})


encode_zarr_file()
MRE observed behaviour:
- The Xarray Dataset loaded from Zarr has lazy data (memory use of a few tens of MB)
- Encoding it with WritableCFDataStore realizes its data, and my memory consumption goes up to 1GB and change; passing attributes makes that even bigger, but that's expected.
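To put numbers on the memory jump, one option is to measure the peak traced allocation around the encode call with Python's standard-library tracemalloc. The sketch below is a generic helper under that assumption; peak_traced_mb is a hypothetical name, not part of xarray. NumPy reports its array buffers to tracemalloc, but allocations made outside Python's tracked allocators may not be counted, so treat the figure as a lower bound.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_traced_mb(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func(*args, **kwargs) and return (result, peak traced memory in MiB).

    Hypothetical helper for illustration; not part of xarray.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes since start()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


if __name__ == "__main__":
    # Stand-in workload: eagerly allocate ~16 MiB, as a realized array would.
    _, peak = peak_traced_mb(bytearray, 16 * 2**20)
    print(f"peak traced memory: {peak:.1f} MiB")
```

With the MRE above, the same helper could wrap the call as peak_traced_mb(dts.encode, variables, {}).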
MRE desired behaviour:
- Since the base class WritableCFDataStore is used quite a bit, especially for Zarr -> NetCDF4 format conversions (the one I'm after, hehe), it would be absolutely brilliant and much desired if the variables' data were kept lazy, and all the operations done by the CF converter/encoder were done on Dask arrays rather than NumPy arrays. Doing CF-compliance/formatting lazily is possible; it is already implemented in tools such as @davidhassell's cf-python.
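As a rough illustration of the desired behaviour, here is a minimal, library-free sketch of CF-style packing, packed = round((value - add_offset) / scale_factor), the inverse of the CF rule unpacked = packed * scale_factor + add_offset, applied lazily: nothing is read or encoded until a consumer iterates over the chunks, so only one chunk is held in memory at a time. LazyPackedVariable and read_chunks are hypothetical names standing in for a lazily indexed backend array or a Dask array; this is not how xarray or cf-python actually implement encoding.

```python
import math
from typing import Callable, Iterable, Iterator, List


class LazyPackedVariable:
    """Hypothetical wrapper: applies CF scale_factor/add_offset packing per chunk on demand."""

    def __init__(
        self,
        read_chunks: Callable[[], Iterable[List[float]]],
        scale_factor: float,
        add_offset: float,
        fill_value: int = -32767,
    ) -> None:
        # Store only the recipe; no data is read here.
        self._read_chunks = read_chunks
        self.attrs = {
            "scale_factor": scale_factor,
            "add_offset": add_offset,
            "_FillValue": fill_value,
        }

    def _pack(self, value: float) -> int:
        # NaN marks missing data; map it to the fill value.
        if math.isnan(value):
            return self.attrs["_FillValue"]
        return round((value - self.attrs["add_offset"]) / self.attrs["scale_factor"])

    def chunks(self) -> Iterator[List[int]]:
        # Each chunk is read and encoded only when the consumer asks for it.
        for chunk in self._read_chunks():
            yield [self._pack(v) for v in chunk]


if __name__ == "__main__":
    def read_chunks():
        # Stand-in for a lazy source that yields one chunk at a time.
        for i in range(3):
            print(f"reading chunk {i}")
            yield [280.0 + i, 281.5, float("nan")]

    var = LazyPackedVariable(read_chunks, scale_factor=0.5, add_offset=270.0)
    for packed in var.chunks():
        print(packed)
```

In a real pipeline the chunk source would be a Dask array or a lazily indexed backend array and the consumer would be the NetCDF writer, so peak memory would roughly scale with the chunk size rather than with the full ~1GB variable.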
Can I help?
Most definitely! But my knowledge of Xarray is rather limited, so I think I'd be better suited to testing a PR than to implementing it.
Very many thanks in advance! Cheers 🍻
What did you expect to happen?
No response
Minimal Complete Verifiable Example
MVCE confirmation
- Minimal example: the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example: the example is self-contained, including all data and the text of any traceback.
- Verifiable example: the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue: a search of GitHub Issues suggests this is not a duplicate.
- Recent environment: the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response