Time-bounded investigation: use experimental branch of anemoi-datasets with zarr3 features #384

@tjhunter

Anemoi-datasets does not officially support zarr3 yet, but there is a feature branch that implements it (see the issues linked below). It has not been prioritized yet because of compatibility issues.

The goal of this task is to report:

  1. Can the WeatherGenerator depend on this special version of anemoi-datasets to:
     • read the anemoi data and other core datasets such as ERA5 as before
     • read/write inference data
  2. What would be acceptable parameters to reduce the number of inodes by 10x without losing performance?
     • 10x is enough. The goal is NOT to find the optimal tradeoff, just to investigate whether it works as expected.
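
As a rough way to reason about the 10x target: assuming the zarr3 sharding feature is what gets used, each shard is stored as a single object that holds several chunks, so the file count scales with the number of shards rather than the number of chunks. The sketch below is only back-of-the-envelope arithmetic. The array shape, chunk shape, and shard shape are made-up values for illustration (not actual WeatherGenerator or anemoi settings), and metadata files are ignored.

```python
import math


def count_blocks(shape, block):
    """Number of blocks of size `block` needed to tile an array of `shape`."""
    return math.prod(math.ceil(s / b) for s, b in zip(shape, block))


# Hypothetical layout, for illustration only.
shape = (1000, 100, 1, 40320)   # e.g. (time, variable, ensemble, grid point)
chunks = (1, 100, 1, 40320)     # one chunk per time step
shards = (10, 100, 1, 40320)    # ten consecutive chunks grouped into one shard

files_unsharded = count_blocks(shape, chunks)  # one file per chunk
files_sharded = count_blocks(shape, shards)    # one file per shard
reduction = files_unsharded / files_sharded

print(files_unsharded, files_sharded, reduction)
```

Under these assumptions, grouping ten chunks per shard is what yields the 10x drop in file count. Whether read performance holds up with such a layout is exactly what the investigation needs to measure.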

The goal is investigative: it should take 1-2 weeks at most.

ecmwf/anemoi-datasets#220
ecmwf/anemoi-datasets#290

It should mostly involve a change to zarrIO. It should still be time-bounded: the Python API of zarr has changed and could require significant code changes (I hope not).

Sub tasks:

  • @grassesi to look into updating the package dependencies to anemoi / zarr3 (does uv sync pass?)
  • @shmh40 to try reading existing zarr2 data from anemoi (e.g. ERA5) and one of the zarr-specific readers (e.g. data_reader_obs?). Check that read performance does not regress (within ±5% of the existing training time with zarr2).
  • @enssow to run inference + evaluation with an existing model, using a version of zarrio updated to zarr3.
  • @enssow to check the compaction features (time-bounded)
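
The ±5% criterion in the read-performance sub task could be checked with a small helper like the one below. This is a minimal sketch, not existing project code: the function name and the timings are hypothetical, and it treats the bound as symmetric, as written above, so a run that is more than 5% faster is also flagged.

```python
def within_tolerance(baseline_s, candidate_s, tolerance=0.05):
    """Return True if candidate_s deviates from baseline_s by at most `tolerance` (relative)."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return abs(candidate_s - baseline_s) / baseline_s <= tolerance


# Hypothetical wall-clock training times in seconds, for illustration only.
zarr2_time = 3600.0
zarr3_time = 3700.0

print(within_tolerance(zarr2_time, zarr3_time))
```

If only slowdowns matter, the `abs` can be dropped so that faster runs always pass.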

Metadata

Labels

data: Anything related to the datasets used in the project
infra: Issues related to infrastructure

Projects

Status

Done
