feat: Make anemoi-datasets agnostic to Zarr version (Optional support Zarr3) by floriankrb · Pull Request #220 · ecmwf/anemoi-datasets

floriankrb · 2025-02-26T16:46:45Z

Description

Anemoi-datasets should be agnostic to the version of zarr: it should run whether zarr2 or zarr3 is installed.

We should also take into account that zarr2 code cannot read datasets created by zarr3. So, ideally, we should
1 - update anemoi-datasets to work with zarr2 and zarr3
2 - keep using zarr2 to build datasets (dependency of anemoi-datasets[create] on zarr<=2), but allow zarr2 or zarr3 when reading (dependency of anemoi-datasets on zarr)
3 - once users have updated their environments (6 months?), start building zarr3 datasets

This PR addresses the first point, making anemoi-datasets detect the version of zarr and adapt to it.
We still pin the version to zarr 2 in the pyproject.toml because of performance issues with zarr version 3.

📚 Documentation preview 📚: https://anemoi-datasets--220.org.readthedocs.build/en/220/

codecov-commenter · 2025-02-27T09:37:01Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.85%. Comparing base (80de4c6) to head (1459e83).
Report is 62 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
+ Coverage   72.96%   73.85%   +0.88%     
==========================================
  Files          10       10              
  Lines         825      872      +47     
==========================================
+ Hits          602      644      +42     
- Misses        223      228       +5


anaprietonem · 2025-06-03T11:25:06Z

@floriankrb what was the conclusion in terms of adding/updating to zarr 3?

floriankrb · 2025-06-04T13:17:33Z

@anaprietonem I updated the description of the PR

frazane · 2025-06-25T12:29:49Z

Just adding a small comment here. You've probably considered this already but here we go.

The transition to Zarr v3 specification comes with a nice opportunity: we could use the sharding feature! It would allow us to have chunked variables (avoiding having to load them all in memory before taking a subset) while still retaining very high read speeds. Basically we would make each timestamp a shard, and then chunk variables inside the shard. The downside is slower write speed, but it’s not very important.

More info:

Some quick benchmarking of the sharded format zarr-developers/zarr-python#1338
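
To make the proposed layout concrete, here is a minimal, library-free sketch of the index arithmetic it implies. It assumes a 4-D array ordered (time, variable, ensemble, grid), one shard per timestamp, and one inner chunk per variable within a shard. The shapes and the `locate` and `chunks_per_shard` helpers are hypothetical; this is not zarr's API, only an illustration of which shard and inner chunk a read would touch.

```python
from typing import Tuple

# Hypothetical shapes, chosen only for illustration.
ARRAY_SHAPE = (8, 6, 1, 1000)   # (time, variable, ensemble, grid)
SHARD_SHAPE = (1, 6, 1, 1000)   # one shard holds one full timestamp
CHUNK_SHAPE = (1, 1, 1, 1000)   # inside a shard, one chunk per variable


def locate(index: Tuple[int, ...]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (shard coordinates, chunk coordinates within that shard)."""
    if len(index) != len(ARRAY_SHAPE):
        raise ValueError("index rank does not match array rank")
    for i, n in zip(index, ARRAY_SHAPE):
        if not 0 <= i < n:
            raise IndexError(f"index {index} out of bounds for shape {ARRAY_SHAPE}")
    shard = tuple(i // s for i, s in zip(index, SHARD_SHAPE))
    # Offset of the element inside its shard, then the chunk that offset falls in.
    inner = tuple((i % s) // c for i, s, c in zip(index, SHARD_SHAPE, CHUNK_SHAPE))
    return shard, inner


def chunks_per_shard() -> int:
    """Number of inner chunks stored in a single shard."""
    count = 1
    for s, c in zip(SHARD_SHAPE, CHUNK_SHAPE):
        count *= s // c
    return count
```

With these shapes, reading variable 4 at time step 3 touches shard (3, 0, 0, 0) and only inner chunk (0, 4, 0, 0) of it, so a subset of variables can be read without loading the whole timestamp.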

b8raoult · 2025-06-26T10:26:35Z

All tests are now passing with zarr2 and zarr3. At the moment of writing, zarr3 still does not support datetime64 (zarr-developers/zarr-python#2616)

Tests are much slower with zarr3; this will require more investigation. For example, test_slice_4 takes 98s with zarr2 (about 1.5 minutes) and 612s with zarr3 (about 10 minutes).

The profiler shows for zarr2:

   773147    0.339    0.000   95.489    0.000 .../anemoi-datasets/src/anemoi/datasets/data/stores.py:148(__getitem__)
   773165    1.553    0.000   95.152    0.000 .../python3.12/site-packages/zarr/core.py:656(__getitem__)

and for zarr3:

   773147    0.449    0.000  615.551    0.001 .../anemoi-datasets/src/anemoi/datasets/data/stores.py:148(__getitem__)
   773165    2.088    0.000  615.118    0.001 .../python3.12/site-packages/zarr/core/array.py:2294(__getitem__)

The test uses an in-memory zarr store and calls __getitem__ about 770k times. With a per-call time of 0.001s (as rounded by the profiler), 0.001 x 770000 = 770s, which is in the ballpark of the 615s measured.
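
The profiler rounds the per-call column to three decimals, so a finer estimate can be recovered from the cumulative times. The following is a small sketch of that arithmetic using only the figures quoted above; the variable and function names are illustrative.

```python
# Figures copied from the profiler output quoted above.
CALLS = 773_147
CUMTIME_ZARR2 = 95.489   # seconds, stores.py __getitem__ with zarr2
CUMTIME_ZARR3 = 615.551  # seconds, stores.py __getitem__ with zarr3


def per_call_microseconds(cumtime_s: float, calls: int) -> float:
    """Average cumulative time per call, in microseconds."""
    return cumtime_s / calls * 1e6


zarr2_us = per_call_microseconds(CUMTIME_ZARR2, CALLS)
zarr3_us = per_call_microseconds(CUMTIME_ZARR3, CALLS)
slowdown = CUMTIME_ZARR3 / CUMTIME_ZARR2

print(f"zarr2: {zarr2_us:.1f} us per call")
print(f"zarr3: {zarr3_us:.1f} us per call")
print(f"slowdown: {slowdown:.2f}x")
```

This gives roughly 0.12 ms per call with zarr2 and 0.80 ms with zarr3, a slowdown of about 6.4x, consistent with the 98s versus 612s wall-clock times reported for test_slice_4.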

tjhunter

@floriankrb @b8raoult this is fantastic, thank you for the hard work! We will try it this week or next and get back to you.

src/anemoi/datasets/zarr_versions/zarr2.py

src/anemoi/datasets/zarr_versions/zarr3.py

src/anemoi/datasets/create/__init__.py

TomNicholas · 2026-02-05T22:41:22Z

Do you guys need any help with this from the Zarr side? We have a lot of people wanting to use Anemoi with Zarr v3 (including the ones in #290).

> We should also take into account that zarr2 code cannot read datasets created by zarr3.
> So, ideally, we should
> 1 - update anemoi-datasets to work with zarr2 and zarr3

I'm not sure I understand why this is necessary, as zarr-python v3 can read and write zarr format version 2 and 3 data if you want. (We do this in xarray for example - you can use xarray with zarr-python v3 to read zarr format 2 data.)

> At the moment of writing, zarr3 still does not support datetime64

It does now!

> Tests are much slower with zarr3, this will require more investigation.

This was investigated in zarr-developers/zarr-python#3524 (comment) and seemed to satisfy you @b8raoult - is it still a blocker?

Also bear in mind that zarr-python 3 has been out for quite a while now - long enough in fact that xarray is seriously considering dropping support for zarr-python v2 (in accordance with SPEC0). Don't get left behind! 🙂

anaprietonem · 2026-02-06T07:37:03Z

Hey @TomNicholas, thanks for the interest! Indeed, we’re aware of the growing interest in Zarr v3, and that’s what this PR is aiming to address.

Where we are at the moment is that some changes are needed to enable Zarr v3 support in anemoi-datasets, those included in this PR, but also things like dropping support for Python 3.10. We already have many datasets built with Zarr v2, and it’s important for us to benchmark runtime performance for training workloads. We’ve done some analysis here: #486, feel free to take a look and share your thoughts. Our tests on HPC systems show some degradation in runtime performance when moving to the Zarr v3 Python package.

We’re still working on gathering more data points and exploring ways to improve this. This was discussed with @ecmwf/anemoi_technical_subgroup, and the decision was to merge Zarr v3 support into main in anemoi-datasets, but not yet remove the pyproject constraint on Zarr v2. This way, users who want to experiment with Zarr v3 can easily force it, while anemoi-datasets officially supports it. If we find ways to further optimise performance, we can simply update the pyproject constraints in the future.

@floriankrb @b8raoult @cathalobrien feel free to add to this!

b8raoult · 2026-02-07T08:55:57Z

To complement @anaprietonem's message: we have a very thin compatibility wrapper around zarr2 and zarr3 in our code, for APIs that have changed. This wrapper will use whatever version of zarr is installed. By default we specify version 2 in the dependencies, but a user can force a pip install of version 3 and the code will still work, with the caveat that zarr2 cannot read zarr3 datasets. Once we fully adopt zarr3, we will remove that wrapper and call the zarr API directly.
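
As a rough illustration of how such a wrapper might choose its backend, the sketch below detects the installed zarr major version and returns the name of a matching module. The module names mirror the file paths listed in this PR (`zarr_versions/zarr2.py` and `zarr_versions/zarr3.py`), but the dispatch logic and the helper names `zarr_major_version` and `backend_module_name` are assumptions, not the actual implementation.

```python
from importlib import metadata
from typing import Optional


def zarr_major_version(version: Optional[str] = None) -> int:
    """Return the major version of zarr.

    If `version` is None, the installed distribution is queried;
    passing a string makes the function usable without zarr installed.
    """
    if version is None:
        # Raises importlib.metadata.PackageNotFoundError if zarr is absent.
        version = metadata.version("zarr")
    return int(version.split(".")[0])


def backend_module_name(version: Optional[str] = None) -> str:
    """Pick the (hypothetical) wrapper module for the detected zarr version."""
    major = zarr_major_version(version)
    if major == 2:
        return "anemoi.datasets.zarr_versions.zarr2"
    if major == 3:
        return "anemoi.datasets.zarr_versions.zarr3"
    raise ImportError(f"Unsupported zarr major version: {major}")
```

A caller could then pass the returned name to `importlib.import_module` and use the resulting module in place of direct zarr calls, which keeps version-specific code confined to two small files.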

b8raoult · 2026-02-07T09:07:42Z

@TomNicholas, as you mention it in your comment above, is the issue of indexing speed for memory only zarrs described in zarr-developers/zarr-python#3524 fixed? The ticket is still open and I have not tested again.

TomNicholas · 2026-02-11T18:33:59Z

I don't think anyone has worked on that yet, no.

But is that a blocker for you? Your response here

> You may close this issue if you wish.

implied that it is not a blocker. But if it is a blocker then we would like to help unblock you.


anaprietonem

This PR LGTM. Let's get it in to facilitate the transition to zarr3.

floriankrb · 2026-03-31T14:29:41Z

Downstream CI is failing due to a timeout on data.ecmwf.int; the tests pass when run separately. Merging.

github-actions bot added the tests and enhancement (New feature or request) labels Feb 26, 2025

floriankrb force-pushed the feature/zarr3 branch from 8371f18 to 07edf73 Compare February 27, 2025 09:30

github-actions bot added the dependencies (Pull requests that update a dependency file) label Feb 27, 2025

floriankrb force-pushed the feature/zarr3 branch from 7c0f187 to 1459e83 Compare February 27, 2025 10:21

anaprietonem assigned floriankrb Feb 27, 2025

floriankrb marked this pull request as ready for review March 12, 2025 15:59

floriankrb requested review from HCookie, JPXKQX, JesperDramsch, anaprietonem, b8raoult, gmertes, mchantry and theissenhelen as code owners March 12, 2025 15:59

floriankrb marked this pull request as draft March 28, 2025 09:17

floriankrb mentioned this pull request Jun 23, 2025

Support for zarr 3 #290

Open

b8raoult marked this pull request as ready for review June 24, 2025 11:43

b8raoult requested a review from a team as a code owner June 24, 2025 11:43

tjhunter mentioned this pull request Jun 24, 2025

Time-bounded investigation: use experimental branch of anemoi-datasets with zarr3 features ecmwf/WeatherGenerator#384

Closed

4 tasks

github-actions bot added the documentation (Improvements or additions to documentation) label Jun 26, 2025

tjhunter reviewed Jun 27, 2025

src/anemoi/datasets/create/__init__.py (outdated)

grassesi mentioned this pull request Jul 3, 2025

Limit inode consumption of output data ecmwf/WeatherGenerator#394

Closed

mchantry moved this from Now In Progress to On Pause in Anemoi-dev Sep 17, 2025

frazane mentioned this pull request Oct 21, 2025

Cannot read datasets from EWC S3 #453

Closed

floriankrb force-pushed the feature/zarr3 branch from cd9231b to c95780e Compare October 29, 2025 16:10

anaprietonem mentioned this pull request Nov 27, 2025

Adoption of Zarr3 #485

Open

floriankrb force-pushed the feature/zarr3 branch from 49ef0e2 to 72ae08f Compare January 26, 2026 16:27

anaprietonem changed the title ~~feat: zarr3~~ feat: Make anemoi-datasets agnostic to Zarr version Feb 5, 2026

anaprietonem changed the title ~~feat: Make anemoi-datasets agnostic to Zarr version~~ feat: Make anemoi-datasets agnostic to Zarr version (Optional support Zarr3) Feb 5, 2026

floriankrb added 2 commits March 12, 2026 15:07

support also zarr3

b5b6f32

hack to make the test faster

1bdcc54

floriankrb force-pushed the feature/zarr3 branch from eda900d to 1bdcc54 Compare March 12, 2026 15:11

fix checks

1af2fe0

floriankrb mentioned this pull request Mar 13, 2026

Switch to zarr 3 as a dependency of anemoi-datasets #573

Open

floriankrb and others added 5 commits March 13, 2026 13:49

Merge branch 'main' into feature/zarr3

5a00c82

Merge branch 'main' into feature/zarr3

6f421aa

Merge branch 'main' into feature/zarr3

4ba6ffe

[pre-commit.ci] auto fixes from pre-commit.com hooks

c534151

for more information, see https://pre-commit.ci

Merge branch 'main' into feature/zarr3

082903d

anaprietonem approved these changes Mar 31, 2026


anaprietonem added the ATS Approval not needed label Mar 31, 2026

floriankrb merged commit ab8cd71 into main Mar 31, 2026
362 of 374 checks passed

floriankrb deleted the feature/zarr3 branch March 31, 2026 14:29

github-project-automation bot moved this from On Pause to Done in Anemoi-dev Mar 31, 2026

DeployDuck mentioned this pull request Mar 31, 2026

chore(main): Release 0.5.36 #568

Draft

8 participants
