@HCookie HCookie commented Oct 28, 2025

Description

Add cluster environments, to which the parallel runner delegates the cluster-specific concerns of parallel inference.

Backwards compatible with existing slurm runs, as the cluster is detected automatically.

Makes it easier to extend to additional clusters; slurm, mpi, torchrun, distributed and manual invocation are already supported.
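To illustrate the delegation and automatic detection described above, here is a minimal sketch. It is not the code in this PR: the names `Cluster`, `SlurmCluster`, `TorchrunCluster`, `OpenMPICluster`, `used` and `detect_cluster` are hypothetical, and the detection order is an assumption. Only the environment variables are real (set by Slurm, torchrun and Open MPI respectively).

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping, Optional


class Cluster(ABC):
    """Hypothetical base class: one subclass per launch environment."""

    def __init__(self, env: Mapping[str, str]) -> None:
        self.env = env

    @classmethod
    @abstractmethod
    def used(cls, env: Mapping[str, str]) -> bool:
        """Return True if `env` looks like it was set up by this launcher."""

    @property
    @abstractmethod
    def world_size(self) -> int:
        """Total number of processes."""

    @property
    @abstractmethod
    def global_rank(self) -> int:
        """Rank of this process among all processes."""


class _EnvCluster(Cluster):
    """Helper for launchers that expose rank and size as environment variables."""

    SIZE_VAR = ""
    RANK_VAR = ""

    @classmethod
    def used(cls, env: Mapping[str, str]) -> bool:
        return cls.SIZE_VAR in env and cls.RANK_VAR in env

    @property
    def world_size(self) -> int:
        return int(self.env[self.SIZE_VAR])

    @property
    def global_rank(self) -> int:
        return int(self.env[self.RANK_VAR])


class TorchrunCluster(_EnvCluster):
    SIZE_VAR, RANK_VAR = "WORLD_SIZE", "RANK"


class OpenMPICluster(_EnvCluster):
    SIZE_VAR, RANK_VAR = "OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_RANK"


class SlurmCluster(_EnvCluster):
    SIZE_VAR, RANK_VAR = "SLURM_NTASKS", "SLURM_PROCID"


# Assumed order: launchers that may run *inside* a Slurm job are checked first.
CANDIDATES = [TorchrunCluster, OpenMPICluster, SlurmCluster]


def detect_cluster(env: Optional[Mapping[str, str]] = None) -> Optional[Cluster]:
    """Return the first cluster whose `used` check matches, or None."""
    env = os.environ if env is None else env
    for candidate in CANDIDATES:
        if candidate.used(env):
            return candidate(env)
    return None
```

With this shape, the runner only ever asks the returned object for `world_size` and `global_rank`, and supporting a new cluster means adding one subclass to the candidate list.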

Manual Cluster

runner:
   parallel:
       cluster:
           manual: 4

Automatic

runner: parallel
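The two config forms above (an explicit cluster with an argument, and the bare `runner: parallel` that relies on automatic detection) could be normalised along the following lines. This is a sketch under assumptions only: `parse_runner` is a hypothetical helper, the input is taken to be the already-loaded value of the `runner` key, and the actual schema handling in anemoi-inference may well differ.

```python
from typing import Any, Optional, Tuple


def parse_runner(runner: Any) -> Tuple[str, Optional[str], Any]:
    """Return (runner name, cluster name, cluster options).

    A cluster name of None means "detect the cluster automatically".
    """
    # Short form: `runner: parallel`
    if isinstance(runner, str):
        return runner, None, None

    # Long form: `runner: {parallel: {cluster: {manual: 4}}}`
    if isinstance(runner, dict) and len(runner) == 1:
        [(name, options)] = runner.items()
        options = options or {}
        if not isinstance(options, dict):
            raise ValueError(f"Expected a mapping of options for runner {name!r}")

        cluster = options.get("cluster")
        if cluster is None:
            return name, None, None
        if isinstance(cluster, str):
            return name, cluster, None
        if isinstance(cluster, dict) and len(cluster) == 1:
            [(cluster_name, cluster_options)] = cluster.items()
            return name, cluster_name, cluster_options

    raise ValueError(f"Unsupported runner configuration: {runner!r}")
```

For the manual example this yields `("parallel", "manual", 4)`; for the automatic example it yields `("parallel", None, None)`, leaving the choice of cluster to detection.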

Tests

  • torchrun
  • slurm
  • mpi
  • manual
  • AzureML

What problem does this change solve?

Extends the parallel runner so that support for exotic clusters is easy to implement.

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.


📚 Documentation preview 📚: https://anemoi-inference--365.org.readthedocs.build/en/365/

Use delegation to separate concerns
@HCookie HCookie self-assigned this Oct 28, 2025
@github-project-automation github-project-automation bot moved this to To be triaged in Anemoi-dev Oct 28, 2025
@github-actions github-actions bot added the documentation, config, tests, and enhancement labels and removed the documentation, config, and tests labels Oct 28, 2025
@github-actions github-actions bot added the documentation, config, and tests labels Oct 28, 2025
@HCookie HCookie moved this from To be triaged to Reviewers needed in Anemoi-dev Oct 28, 2025
@HCookie HCookie requested a review from tmi October 28, 2025 15:00
@HCookie HCookie marked this pull request as ready for review November 4, 2025 14:57
@HCookie HCookie moved this from Reviewers needed to Under Review in Anemoi-dev Nov 4, 2025
@HCookie HCookie changed the title from "feat(parallel runner): Add cluster environments" to "feat(parallel runner)!: Add cluster environments" Nov 6, 2025
@HCookie HCookie added the ATS Approved label (Approved by ATS) Nov 6, 2025
@HCookie HCookie force-pushed the feat-cluster-environment branch from df354cb to c9c7f35 Compare November 7, 2025 11:36
@HCookie HCookie requested a review from gmertes November 18, 2025 13:45
Labels

  • ATS Approval needed
  • ATS Approved: Approved by ATS
  • config
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • tests

Projects

Status: Under Review

5 participants