

@mahf708 (Collaborator) commented Nov 15, 2025

@odiazib's PR, currently under review in ai2cm, opened here for public review.

--

Short description of why the PR is needed and how it satisfies those requirements, in sentence form.

Changes:

  • symbol (e.g. fme.core.my_function) or script and concise description of changes or added feature

  • Can group multiple related symbols on a single bullet

  • Tests added

Resolves # (delete if none)

odiazib and others added 30 commits November 14, 2025 22:06
…run ACE using PhysicsNemo. It works, but it does not utilize spatial parallelism yet.
… unit test that divides the dataset into four parts, subsequently comparing the results with the original dataset.
…s implementation using unit tests based on those developed by Makani.
…g with spatial parallelism. The unit tests ran, but I have not checked for correctness.
- Ensure the distribute class, which produces a global singleton, is initialized only once.

- Set spatial parallelism parameters (i.e., h and w) as environment variables.

- Emphasize the necessity of saving and loading checkpoints.

- Allow part of the save_checkpoint routine to be executed by all processors for spatial parallelism.
Co-authored-by: Jeremy McGibbon
…training slower by 10 seconds for each epoch.
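The commits above describe a unit test that divides the dataset into four parts and compares the result with the original, with the spatial parallelism parameters h and w read from environment variables. The following is a minimal sketch of that kind of split-and-compare check, not the PR's implementation. It assumes a row-major h×w tiling of a 2D lat-lon field, and the variable names `H_PARALLEL_SIZE` / `W_PARALLEL_SIZE` and all helper functions are hypothetical illustrations.

```python
import os


def get_spatial_grid() -> tuple[int, int]:
    # Hypothetical environment variable names; the PR's actual names may differ.
    h = int(os.environ.get("H_PARALLEL_SIZE", "1"))
    w = int(os.environ.get("W_PARALLEL_SIZE", "1"))
    return h, w


def split_field(field, h, w):
    """Split a 2D field (list of rows) into h*w tiles, ordered row-major by rank."""
    n_lat, n_lon = len(field), len(field[0])
    if n_lat % h or n_lon % w:
        raise ValueError("this sketch requires the grid to divide evenly")
    th, tw = n_lat // h, n_lon // w
    tiles = []
    for i in range(h):
        for j in range(w):
            tiles.append([row[j * tw:(j + 1) * tw] for row in field[i * th:(i + 1) * th]])
    return tiles


def gather_field(tiles, h, w):
    """Reassemble tiles produced by split_field into the global field."""
    rows = []
    for i in range(h):
        band = tiles[i * w:(i + 1) * w]  # tiles sharing the same latitude band
        for r in range(len(band[0])):
            row = []
            for tile in band:
                row.extend(tile[r])
            rows.append(row)
    return rows


def apply_pointwise(field, fn):
    """Apply fn to every grid point; stands in for a per-tile computation."""
    return [[fn(x) for x in row] for row in field]


if __name__ == "__main__":
    # Four parts: a 2 x 2 decomposition.
    os.environ["H_PARALLEL_SIZE"] = "2"
    os.environ["W_PARALLEL_SIZE"] = "2"
    h, w = get_spatial_grid()
    field = [[10 * r + c for c in range(8)] for r in range(4)]
    tiles = split_field(field, h, w)
    # Compute on each part, gather, and compare with the undecomposed result.
    distributed = gather_field([apply_pointwise(t, lambda x: 2 * x) for t in tiles], h, w)
    assert distributed == apply_pointwise(field, lambda x: 2 * x)
```

A pointwise operation is used only because its decomposed and global results must match exactly; operators that need neighboring points (e.g. convolutions or spectral transforms) would additionally need halo exchange or collective communication across tiles.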
@mahf708 changed the title from e3sm/oscar/spatial parallelism to e3sm/oscar/spatial-parallelism on Nov 15, 2025
odiazib and others added 11 commits November 17, 2025 07:32
…Thus, we must add logic to handle this case. I also moved the code that reshapes the dataset into the distribution class.
…ism version of the annual aggregator is not working.
… the correct batch size when using spatial parallelism. This fix improves the loss computation, but it reduces the number of samples trained per second. Previously, we were not loading the dataset correctly.

3 participants