Time-bounded investigation: use experimental branch of anemoi-datasets with zarr3 features #384
Closed
Feature
Labels
- data: Anything related to the datasets used in the project
- infra: Issues related to infrastructure
Description
Anemoi-datasets does not officially support zarr3 yet, but there is a feature branch that implements it; see the issues linked below. It has not been prioritized yet because of compatibility issues.
The goal of this task is to report:
- Can the weathergenerator depend on this special version of anemoi-datasets to:
  - read the anemoi data and other core datasets such as era5 as before?
  - read/write inference data?
- What would be acceptable parameters to reduce the number of inodes by 10x without losing performance?
  - 10x is enough. The goal is NOT to find the optimal tradeoff, just to investigate whether it works as expected.
The goal is investigative; it should take 1-2 weeks max.
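To make the 10x inode target concrete, here is a minimal sketch of the arithmetic, assuming the reduction comes from zarr3 sharding, where several chunks are stored inside one shard object. The function names, array shape, chunk shape, and shard shape are all hypothetical and chosen for illustration only; they are not taken from anemoi-datasets or from the weathergenerator code.

```python
from math import prod


def count_objects(shape, block):
    """Number of stored objects when `shape` is tiled by blocks of size `block`."""
    # Integer ceiling division per dimension, then multiply across dimensions.
    return prod(-(-s // b) for s, b in zip(shape, block))


def inode_reduction(shape, chunks, shards):
    """Ratio of chunk objects (no sharding) to shard objects (with sharding)."""
    for c, s in zip(chunks, shards):
        if s % c != 0:
            raise ValueError("each shard dimension must be a multiple of the chunk dimension")
    return count_objects(shape, chunks) / count_objects(shape, shards)


# Hypothetical layout: (time, variable, ensemble, grid point), one time step per chunk.
shape = (8760, 80, 1, 40320)
chunks = (1, 80, 1, 40320)
shards = (10, 80, 1, 40320)  # assumption: 10 consecutive time steps per shard

reduction = inode_reduction(shape, chunks, shards)
print(f"objects without sharding: {count_objects(shape, chunks)}")
print(f"objects with sharding:    {count_objects(shape, shards)}")
print(f"reduction factor:         {reduction:.1f}x")
```

This only counts data objects; metadata files and the read-performance side of the question still have to be measured on real data.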
ecmwf/anemoi-datasets#220
ecmwf/anemoi-datasets#290
It should mostly involve a change to zarrIO. It should still be time-bounded: the Python API of zarr has changed and could require significant code changes (I hope not).
Sub tasks:
- @grassesi to look into updating the package dependencies to anemoi / zarr3 (does uv sync pass?)
- @shmh40 to try reading existing zarr2 data from anemoi (e.g. era5) and with one of the zarr-specific readers (e.g. data_reader_obs?). Check that read performance does not regress (within ±5% of the existing training time with zarr2).
- @enssow to run inference + evaluation with a version of zarrio updated to zarr3, using an existing model.
- @enssow to check the compaction features (time-bounded).
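The ±5% read-performance criterion in the sub-tasks can be written down as a small check. This is only a sketch: the function name and the timing numbers are hypothetical, and how the baseline and candidate training times are actually measured is left open.

```python
def within_tolerance(baseline_s, candidate_s, tolerance=0.05):
    """Return True if candidate_s deviates from baseline_s by at most `tolerance` (relative)."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    relative_change = abs(candidate_s - baseline_s) / baseline_s
    return relative_change <= tolerance


# Hypothetical timings in seconds: zarr2 baseline vs. the zarr3 branch.
zarr2_time = 100.0
zarr3_time = 104.0
print("within 5% of baseline:", within_tolerance(zarr2_time, zarr3_time))
```

Averaging several runs before comparing would make the check less sensitive to noise from the filesystem or the scheduler.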
Projects
Status
Done