Description
Is your feature request related to a problem? Please describe.
We, at Ouranos, have an internal database of many datasets in netCDF and in Zarr. The database is (partially) duplicated between our internal server (HPC-like, but not quite) and an externally shared HPC. That other machine has a strict inode quota (10M inodes over 2 PB, which works out to an average file size of 215 MB).
Thus, conventional Zarr folders are a no-go because they consist of numerous very small files. A while ago we decided to zip all of them; the impact on reading speed is insignificant compared to the other productivity gains (not needing netCDF4, for example).
However, Zarr 3 has dropped transparent support for zipped directories: one can't do xr.open_zarr('path/to/dataset.zarr.zip') anymore. One has to create a zarr.storage.ZipStore and pass that to xarray.
Zarr's upcoming "url pipeline" (ZEP 8) fixes this in a way, but it has been "upcoming" for a while now. And the addition of kerchunk to intake-esm's dependencies has implicitly pinned zarr to >3.
Describe the solution you'd like
Somewhere in source.py::_open_dataset, I think we could detect that the path is a zipped Zarr store and act accordingly, for example when the URL ends with .zip and format == 'zarr'.
Describe alternatives you've considered
Not updating intake-esm or zarr. But that's not a long-term solution.
Additional context
As a side question, if any other data managers are reading this: have you had this inode issue before? How did you solve it?
I'd rather fix this in intake-esm or zarr than convert our full database back to netCDF or another format.