Commit 06e5533
fix(models): processor chunking (#629)
## Description
Remove the separate ProcessorChunk class and flatten all layers directly
into the BaseProcessor.
Chunking is now handled dynamically at runtime by grouping layers into
checkpointed segments.
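A minimal sketch of how a flat layer list can be grouped into checkpointed segments at call time. It uses plain Python callables as hypothetical stand-ins for processor blocks. `checkpoint` is a hypothetical hook in the position where a wrapper such as `torch.utils.checkpoint.checkpoint` would go. `chunk_bounds` allows uneven splits, which is an assumption; the actual implementation may instead require the number of layers to be divisible by `num_chunks`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a processor block: any callable mapping a state to a new state.
Layer = Callable[[float], float]
# Hypothetical stand-in for a checkpoint wrapper such as torch.utils.checkpoint.checkpoint.
CheckpointFn = Callable[..., float]


def chunk_bounds(num_layers: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Split range(num_layers) into num_chunks contiguous, near-equal [start, end) segments."""
    if not 1 <= num_chunks <= num_layers:
        raise ValueError("num_chunks must be between 1 and num_layers")
    base, extra = divmod(num_layers, num_chunks)
    bounds = []
    start = 0
    for c in range(num_chunks):
        # The first `extra` segments each take one additional layer.
        end = start + base + (1 if c < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def run_segment(layers: Sequence[Layer], start: int, end: int, x: float) -> float:
    """Apply layers[start:end] in order."""
    for layer in layers[start:end]:
        x = layer(x)
    return x


def processor_forward(
    layers: Sequence[Layer],
    x: float,
    num_chunks: int,
    checkpoint: Optional[CheckpointFn] = None,
) -> float:
    """Run one flat list of layers, grouping them into segments only at call time."""
    for start, end in chunk_bounds(len(layers), num_chunks):
        if checkpoint is None:
            x = run_segment(layers, start, end, x)
        else:
            # Each segment is wrapped as a single unit by the checkpoint hook.
            x = checkpoint(run_segment, layers, start, end, x)
    return x
```

The list `layers` is the same no matter which `num_chunks` is used. Changing the chunking therefore only moves the segment boundaries returned by `chunk_bounds`, and the set of stored layers stays fixed.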
## What problem does this change solve?
Previously, the Processor class held a list of ProcessorChunks, each of
which held its own ModuleList of layers. As a result, the grouping of
layers into checkpointed segments was baked into the module structure
saved in the model checkpoint.
When resuming training with a different `num_chunks`, the restored module
structure no longer matched the saved one, causing checkpoint
mismatches. Now the Processor holds a single flat list of all layers
(blocks), and chunking is handled dynamically at runtime.
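The checkpoint mismatch can be made concrete with a small sketch. It generates parameter names for a nested per-chunk layout and for a flat layout, assuming the hypothetical prefixes `proc` and `blocks`. These are not necessarily the real anemoi-models module names. The nested names change with `num_chunks`, and the flat names do not. The `nested_to_flat` helper is also hypothetical. It shows how an old nested key could in principle be mapped onto a flat index, but it is not the migration shipped with this change.

```python
from typing import List

# Hypothetical key prefixes; the real anemoi-models state_dict names may differ.
NESTED_PREFIX = "proc"
FLAT_PREFIX = "blocks"


def nested_keys(num_layers: int, num_chunks: int) -> List[str]:
    """Parameter names when layers live inside per-chunk containers (old layout)."""
    if num_layers % num_chunks != 0:
        raise ValueError("this sketch assumes num_layers is divisible by num_chunks")
    per_chunk = num_layers // num_chunks
    return [
        f"{NESTED_PREFIX}.{c}.blocks.{j}.weight"
        for c in range(num_chunks)
        for j in range(per_chunk)
    ]


def flat_keys(num_layers: int) -> List[str]:
    """Parameter names when all layers sit in one flat list (new layout)."""
    return [f"{FLAT_PREFIX}.{i}.weight" for i in range(num_layers)]


def nested_to_flat(key: str, layers_per_chunk: int) -> str:
    """Map '<proc>.<c>.blocks.<j>.<rest>' to '<blocks>.<c * layers_per_chunk + j>.<rest>'."""
    parts = key.split(".")
    if parts[0] != NESTED_PREFIX or parts[2] != "blocks":
        raise ValueError(f"unexpected key: {key}")
    c, j = int(parts[1]), int(parts[3])
    return ".".join([FLAT_PREFIX, str(c * layers_per_chunk + j), *parts[4:]])
```

Suppose a model has 8 layers, is saved with 2 chunks, and is resumed with 4. In the nested layout, names such as `proc.0.blocks.3.weight` exist in the saved checkpoint but not in the rebuilt model. In the flat layout, both sides use `blocks.0` through `blocks.7`.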
## What issue or task does this change relate to?
## Additional notes
Tested with all models, i.e. GT, Transformer, GNN and PointWiseMLP.
***As a contributor to the Anemoi framework, please ensure that your
changes include unit tests, updates to any affected dependencies and
documentation, and have been tested in a parallel setting (i.e., with
multiple GPUs). As a reviewer, you are also responsible for verifying
these aspects and requesting changes if they are not adequately
addressed. For guidelines on these, please refer to
<https://anemoi.readthedocs.io/en/latest/>.***
By opening this pull request, I affirm that all authors agree to the
[Contributor License Agreement](https://github.com/ecmwf/codex/blob/main/Legal/contributor_license_agreement.md).
📚 Documentation previews 📚:
- anemoi-training: https://anemoi-training--629.org.readthedocs.build/en/629/
- anemoi-graphs: https://anemoi-graphs--629.org.readthedocs.build/en/629/
- anemoi-models: https://anemoi-models--629.org.readthedocs.build/en/629/
---------
Co-authored-by: Simon Lang <[email protected]>
Co-authored-by: gabrieloks <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ana Prieto Nemesio <[email protected]>
Co-authored-by: Jakob Schloer <[email protected]>
Parent commit: 6819be1
File tree: 18 files changed (+209, -608 lines)
- models
  - docs/introduction
  - src/anemoi/models
    - layers
    - migrations/scripts
  - tests/layers
    - block
    - chunk
    - processor
- training/src/anemoi/training
  - config/model
  - utils