floriankrb/anemoi-benchmarks

Anemoi Benchmarks

Benchmarking read throughput of anemoi-datasets Zarr stores, comparing Zarr 2 vs Zarr 3 across different parallelisation strategies and storage backends.

Overview

The suite measures how fast data can be read from Zarr datasets using threads, processes, and PyTorch DataLoader (with DDP). It includes heat tracking to ensure benchmarks read cold (uncached) data.
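As a rough illustration of what a thread-based read measurement looks like, here is a minimal sketch using only the standard library. It is not the code used by this suite: `read_sample`, `measure_throughput` and the in-memory `store` are hypothetical stand-ins, and a real run would index a Zarr array on disk or object storage instead of a list of byte strings.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_sample(store, index):
    """Hypothetical stand-in for reading one sample from a dataset."""
    return bytes(store[index])


def measure_throughput(store, indices, workers):
    """Read the given samples with a thread pool; return (bytes_read, MB/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        samples = list(pool.map(lambda i: read_sample(store, i), indices))
    # Guard against a zero elapsed time on very fast in-memory reads.
    elapsed = max(time.perf_counter() - start, 1e-9)
    total_bytes = sum(len(s) for s in samples)
    return total_bytes, total_bytes / elapsed / 1e6


# Synthetic in-memory "dataset": 16 samples of 4000 bytes each (assumption).
store = [bytes(4000) for _ in range(16)]

for workers in (1, 2, 4):
    total_bytes, mb_per_s = measure_throughput(store, range(16), workers)
    print(f"workers={workers} bytes={total_bytes} throughput={mb_per_s:.1f} MB/s")
```

Because the synthetic data lives in memory, the numbers printed here say nothing about real storage; the sketch only shows the shape of the measurement (time a batch of reads, divide bytes by elapsed time).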

See BENCHMARK.md for detailed results and methodology, and BENCHMARK_TORCH.md for additional results using the PyTorch DataLoader.

Datasets

The following dataset is publicly available and can be used to reproduce the benchmarks:

The higher-resolution datasets (N320, O1280) used in some benchmarks are not yet publicly available.

Quick Start

./run_test.sh <path-to-dataset.zarr> --mode threads --workers 1-2-4-8-16 -n 16
./run_test.sh <path-to-dataset.zarr> --mode processes --workers 1-2-4-8-16 -n 16
./run_test.sh <path-to-dataset.zarr> --mode torch --workers 1-2-4-8-16 -n 16 -g 4

Modes: threads, processes, torch, or threads-processes. Results are saved as JSONL files in logs/.
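For post-processing outside plot.py, the JSONL logs can be read line by line with the standard library. The sketch below makes two assumptions that are not confirmed by this README: that a `--workers` value such as `1-2-4-8-16` is a hyphen-separated list of worker counts to sweep, and that each log line is one JSON object. The field names (`mode`, `workers`, `throughput`) and the helpers `parse_workers` and `load_jsonl` are hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


def parse_workers(spec):
    """Split a spec like '1-2-4-8-16' into a list of ints (assumed format)."""
    return [int(part) for part in spec.split("-")]


def load_jsonl(path):
    """Return one dict per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Write a tiny example log with hypothetical field names, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "example.jsonl"
    rows = [
        {"mode": "threads", "workers": w, "throughput": 100.0 * w}
        for w in parse_workers("1-2-4")
    ]
    log_path.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    records = load_jsonl(log_path)

for record in records:
    print(record["mode"], record["workers"], record["throughput"])
```

Inspect an actual file in logs/ to find the real field names before adapting this.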

Plotting

Use plot.py to visualise results. It reads the JSONL logs and produces PNG plots. Use -K to filter by dataset path and -o to set the output file:

./plot.py logs/* -K <path-to-dataset.zarr> -o results.png

Other useful options: -k for substring filtering (e.g. -k "S3 | SSD"), --torch-only, --no-torch.

Simple benchmark

Some simple benchmark tools that require no dataset are also available in simple_benchmark/*:

./simple_benchmark/run.sh --path /path/to/directory --chunk-size 1GB
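To give an idea of what a dataset-free read benchmark can measure, here is a minimal sketch of a chunked sequential read with timing. It is not the implementation behind simple_benchmark/run.sh; the function `read_in_chunks`, the scratch file name and the sizes are assumptions made for the example. Note that a file written moments earlier is normally still in the operating system's page cache, so this sketch measures a warm read, not the cold reads the main suite aims for.

```python
import os
import tempfile
import time


def read_in_chunks(path, chunk_size):
    """Read a file sequentially in fixed-size chunks; return (bytes, MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 gives an unbuffered raw file object, so each read() call
    # maps to at most one chunk-sized request to the operating system.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed / 1e6


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical 4 MiB scratch file, read back in 1 MiB chunks.
    path = os.path.join(tmp, "scratch.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(4 * 1024 * 1024))
    total, mb_per_s = read_in_chunks(path, chunk_size=1024 * 1024)

print(f"read {total} bytes at {mb_per_s:.1f} MB/s")
```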

Requirements

  • uv (dependencies are managed automatically via uv run)
