forked from ai2cm/ace
e3sm/oscar/spatial-parallelism #4
Draft
mahf708 wants to merge 46 commits into e3sm/main from e3sm/oscar/spatial-parallelism
…run ACE using PhysicsNeMo. It works, but it does not use spatial parallelism yet.
… unit test that divides the dataset into four parts, subsequently comparing the results with the original dataset.
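For readers following along, the idea behind that test can be sketched as below. This is a minimal pure-Python illustration, not the test in this PR: the helper names `split_2x2` and `gather_2x2` are hypothetical, and the sketch assumes a 2x2 layout over (lat, lon) with even grid dimensions.

```python
def split_2x2(field):
    """Split a 2D grid (a list of rows) into four tiles in a 2x2 layout.

    Hypothetical helper; assumes both grid dimensions are even.
    """
    n_lat, n_lon = len(field), len(field[0])
    if n_lat % 2 or n_lon % 2:
        raise ValueError("this sketch assumes even grid dimensions")
    h, w = n_lat // 2, n_lon // 2
    # tiles[i][j] holds row-block i and column-block j of the global grid.
    return [
        [[row[j * w:(j + 1) * w] for row in field[i * h:(i + 1) * h]]
         for j in range(2)]
        for i in range(2)
    ]


def gather_2x2(tiles):
    """Reassemble the global grid from the four tiles (hypothetical helper)."""
    rows = []
    for left, right in tiles:
        # Concatenate matching rows of the left and right tiles.
        for row_left, row_right in zip(left, right):
            rows.append(row_left + row_right)
    return rows


# A small 4 x 8 grid whose values encode their own (lat, lon) position.
field = [[lat * 10 + lon for lon in range(8)] for lat in range(4)]
tiles = split_2x2(field)

# The round trip must reproduce the original dataset exactly.
assert gather_2x2(tiles) == field
```

The same round-trip check carries over to tensors: scatter along the two spatial dimensions, gather, and compare against the undivided data.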
…s implementation using unit tests based on those developed by Makani.
…g with spatial parallelism. The unit tests ran, but I have not checked for correctness.
- Ensure the distribute class, which produces a global singleton, is initialized only once.
- Set spatial parallelism parameters (i.e., h and w) as environment variables.
- Emphasize the necessity of saving and loading checkpoints.
- Allow part of the save_checkpoint routine to be executed by all processors for spatial parallelism.
Co-authored-by: Jeremy McGibbon <[email protected]>
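The first two bullets of that commit could look roughly like the sketch below. It is an assumption-laden illustration, not the code in this PR: the class name `SpatialLayout`, the `reset` helper, and the environment variable names `H_PARALLEL_SIZE` and `W_PARALLEL_SIZE` are all hypothetical.

```python
import os


class SpatialLayout:
    """Hypothetical process-wide holder for the spatial decomposition (h, w)."""

    _instance = None

    def __init__(self, h, w):
        self.h = h
        self.w = w

    @classmethod
    def get_instance(cls):
        """Return the singleton, reading the environment only on first use."""
        if cls._instance is None:
            # Variable names are assumptions; both default to 1 (no spatial split).
            h = int(os.environ.get("H_PARALLEL_SIZE", "1"))
            w = int(os.environ.get("W_PARALLEL_SIZE", "1"))
            if h < 1 or w < 1:
                raise ValueError("h and w must be positive integers")
            cls._instance = cls(h, w)
        return cls._instance

    @classmethod
    def reset(cls):
        """Drop the cached instance (intended for tests only)."""
        cls._instance = None


os.environ["H_PARALLEL_SIZE"] = "2"
os.environ["W_PARALLEL_SIZE"] = "3"
SpatialLayout.reset()
first = SpatialLayout.get_instance()

# Changing the environment later has no effect: the instance is built once.
os.environ["H_PARALLEL_SIZE"] = "4"
second = SpatialLayout.get_instance()
assert first is second
assert (second.h, second.w) == (2, 3)
```

Reading the environment only inside the guarded branch is what makes repeated calls safe: later callers get the cached object and cannot change the layout mid-run.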
…training slower by 10 seconds for each epoch.
…Thus, we must add logic to handle this case. I also moved this part of the code that reshapes the dataset to the distribution class.
…ism version of the annual aggregator is not working.
… the correct batch size when using spatial parallelism. This fix improves the loss computation, but it will decrease the number of trained samples per second. Previously, we were not loading the dataset correctly.
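One plausible reading of this fix, sketched under stated assumptions: the ranks in a spatial group (h x w of them) share the same samples, so only world_size / (h * w) groups load distinct data. The function name `per_replica_batch_size` and its arguments are hypothetical and do not come from this PR.

```python
def per_replica_batch_size(global_batch_size, world_size, h, w):
    """Batch size each data-parallel replica should load (hypothetical helper).

    Assumes the h * w ranks of a spatial group see the same samples, so the
    number of independent data-parallel replicas is world_size // (h * w).
    """
    spatial_size = h * w
    if world_size % spatial_size:
        raise ValueError("world_size must be divisible by h * w")
    data_parallel_size = world_size // spatial_size
    if global_batch_size % data_parallel_size:
        raise ValueError("global batch must divide evenly across replicas")
    return global_batch_size // data_parallel_size


# 8 ranks, no spatial split: 8 replicas, 2 samples each.
assert per_replica_batch_size(16, world_size=8, h=1, w=1) == 2
# 8 ranks with a 2 x 2 spatial split: 2 replicas, 8 samples each.
assert per_replica_batch_size(16, world_size=8, h=2, w=2) == 8
```

Under this reading, each spatial group carries more samples per step than a single rank did before, which is consistent with the lower samples-per-second figure the commit mentions.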
@odiazib's PR currently under review in ai2cm, opened here for public review
--
Short description of why the PR is needed and how it satisfies those requirements, in sentence form.
Changes:
- symbol (e.g. fme.core.my_function) or script and concise description of changes or added feature
- Can group multiple related symbols on a single bullet
Tests added
Resolves # (delete if none)