refactor(optimizer): finalize optimizer schema and backend handling #5157

OutisLi · 2026-01-16T15:48:08Z

Summary by CodeRabbit

New Features
- Centralized, configurable "optimizer" section for training configs with per-model overrides for multi-task runs.
- Expanded optimizer choices and explicit hyperparameters (beta1, beta2, weight_decay); improved scheduler/warm‑up behavior on supported backends.
Documentation
- Added optimizer docs, examples, and framework-specific availability notes.
Tests
- Updated tests and examples to use the new optimizer block.
Chores
- Added backward-compatibility conversion for legacy optimizer config formats.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Copilot

Pull request overview

This pull request refactors the optimizer configuration schema to promote it from being nested within the training section to a dedicated top-level optimizer section in the input JSON configuration. This change improves consistency and clarity across the DeePMD-kit codebase.

Changes:

Introduced a new top-level optimizer section in the configuration schema with support for multiple optimizer types (Adam, AdamW, LKF, AdaMuon, HybridMuon)
Updated all backend implementations (TensorFlow, PyTorch, Paddle) to extract optimizer parameters from the new location
Updated documentation to reference the new optimizer section across multiple training guides

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`doc/train/training-advanced.md`	Added comprehensive documentation for the new `optimizer` section with examples and backend-specific support information
`doc/model/train-fitting-tensor.md`	Updated to reference the new `optimizer` section in the configuration structure
`doc/model/train-fitting-property.md`	Updated to reference the new `optimizer` section in the configuration structure
`doc/model/train-fitting-dos.md`	Updated to reference the new `optimizer` section in the configuration structure
`deepmd/utils/argcheck.py`	Defined optimizer schema with ArgsPlugin pattern, moved optimizer definitions from `training_args` to standalone `optimizer_args`, added support for all optimizer types with proper defaults
`deepmd/tf/train/trainer.py`	Updated to extract optimizer parameters from new `optimizer` section, added early validation for TensorFlow-unsupported features (weight_decay, non-Adam optimizers), implemented beta1/beta2 parameters
`deepmd/pt/train/training.py`	Updated to extract optimizer parameters from new `optimizer` section, refactored `get_opt_param` to include optimizer-specific default values, expanded optimizer initialization with explicit beta parameters
`deepmd/pd/train/training.py`	Updated to extract optimizer parameters from new `optimizer` section, added beta1/beta2/weight_decay parameters to Adam optimizer initialization

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

deepmd/pd/train/training.py

coderabbitai · 2026-01-16T15:58:06Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Centralizes optimizer configuration under a top-level optimizer section; parses and validates optimizer type and hyperparameters (adam_beta1, adam_beta2, weight_decay); propagates changes to PyTorch, Paddle, and TensorFlow training code, argument parsing, docs, tests, and adds a converter for backward-compatibility.

Changes

Cohort / File(s)	Summary
PyTorch training `deepmd/pt/train/training.py`	Read global `optimizer` dict (fallback to {}); use `type` key (Adam/AdamW/LKF/AdaMuon/HybridMuon); apply per-type defaults; use global or per-model opt params depending on multitask; load optimizer state on restart; attach LambdaLR warm-up scheduler.
Paddle training `deepmd/pd/train/training.py`	Read `optimizer` params from config; use `type` key; propagate adam_beta1/adam_beta2/weight_decay into `paddle.optimizer.Adam` initialization; retain multi-task per-model optim_dict when present.
TensorFlow training `deepmd/tf/train/trainer.py`	Parse `jdata["optimizer"]` and store optimizer_type, optimizer_beta1/2, optimizer_weight_decay; enforce `type == "Adam"` and weight_decay==0.0; build Adam optimizer with explicit beta1/beta2 in all branches.
Compatibility utilities `deepmd/utils/compat.py`, `deepmd/tf/utils/compat.py`	Add `convert_optimizer_to_new_format(jdata, warning=True)` to migrate legacy `training.opt_type`/kf_* keys into top-level `optimizer` block; expose via tf compat wrapper.
Entrypoints `deepmd/pd/entrypoints/main.py`, `deepmd/pt/entrypoints/main.py`, `deepmd/tf/entrypoints/train.py`	Invoke `convert_optimizer_to_new_format` after `update_deepmd_input` to migrate legacy optimizer configs during train startup.
Argument validation / CLI `deepmd/utils/argcheck.py`	Add optimizer argument plugin and registrars for Adam, AdamW, LKF, AdaMuon, HybridMuon; introduce `optimizer_variant_type_args()` and `optimizer_args()`; include `optimizer` in generated args for single- and multi-task flows.
Docs `doc/train/training-advanced.md`, `doc/model/train-fitting-*.md`	Add Optimizer subsection, example JSON, list supported optimizers per framework, document adam_beta1/adam_beta2/weight_decay and TensorFlow constraint on weight decay; add `{optimizer}` to training JSON docs.
Tests / Examples `source/tests/pt/model/water/lkf.json`, `source/tests/pd/test_training.py`, `source/tests/pt/test_training.py`	Move optimizer fields to top-level `optimizer` object in fixtures; call `convert_optimizer_to_new_format(..., warning=False)` in tests to normalize legacy configs.
Misc (TF compat export) `deepmd/tf/utils/compat.py`	Re-export convert function in TF compat module `__all__`.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User as User/config
  participant Compat as compat.convert_optimizer_to_new_format
  participant Entryp as Entrypoint/train()
  participant Trainer as Trainer (PT/PD/TF)
  participant Optim as Optimizer factory
  participant Sched as LR Scheduler

  User->>Entryp: provide input JSON
  Entryp->>Compat: convert_optimizer_to_new_format(jdata)
  Compat-->>Entryp: normalized jdata with top-level optimizer
  Entryp->>Trainer: start training with normalized config
  Trainer->>Optim: read optimizer.type and params
  Optim-->>Trainer: instantiate optimizer (Adam/AdamW/LKF/AdaMuon/HybridMuon)
  Trainer->>Sched: create LambdaLR warm-up (when applicable)
  Sched-->>Trainer: return scheduler attached to optimizer
  Trainer-->>Entryp: training loop proceeds using optimizer+scheduler

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(pt): add AdamW for pt training #4757 — Adds AdamW support and weight_decay handling in PyTorch training; overlaps on optimizer-type and weight-decay logic.

Suggested reviewers

njzjz
wanghan-iapcm

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 13.64% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: refactoring the optimizer configuration schema and how it's handled across different backends (TensorFlow, PyTorch, Paddle).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@doc/model/train-fitting-dos.md`:
- Line 19: The README uses non-descriptive link text "here"; change it to a
descriptive phrase and update the inline reference so readers know what they’re
clicking. Replace "here" with something like "training parameters reference" or
"training keyword definitions" while keeping the same target file
(train-se-e2-a.md), and ensure the sentence still mentions the relevant keywords
{ref}`model[standard]/fitting_net <model[standard]/fitting_net>` and {ref}`loss
<loss>` so readers can find the training parameter explanations for fitting the
dos.

In `@doc/model/train-fitting-property.md`:
- Line 17: Replace the ambiguous link text "here" with a descriptive phrase so
the sentence reads clearly and improves accessibility; specifically update the
sentence referencing train-se-atten.md to use descriptive link text such as "SE
attention training documentation" (or similar) instead of "here", keeping the
surrounding refs ({ref}`model`, {ref}`learning_rate`, {ref}`optimizer`,
{ref}`loss`, {ref}`training`) and the callouts to
{ref}`model[standard]/fitting_net` and {ref}`loss` intact.

In `@doc/model/train-fitting-tensor.md`:
- Line 33: Replace the non-descriptive link text "here" in the sentence
describing training JSON (the line referencing `input.json`, `ener` mode and the
sections {ref}`model <model>`, {ref}`learning_rate <learning_rate>`,
{ref}`optimizer <optimizer>`, {ref}`loss <loss>` and {ref}`training <training>`)
with a descriptive link label that reflects the target content (e.g., "training
keywords and meanings" or "training keyword reference") and point it to
train-se-e2-a.md so the link text clearly conveys what the reader will find.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b0c3b0 and c1d615b.

📒 Files selected for processing (8)

deepmd/pd/train/training.py
deepmd/pt/train/training.py
deepmd/tf/train/trainer.py
deepmd/utils/argcheck.py
doc/model/train-fitting-dos.md
doc/model/train-fitting-property.md
doc/model/train-fitting-tensor.md
doc/train/training-advanced.md

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-12-12T13:40:14.334Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test PyTorch training with `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

doc/model/train-fitting-tensor.md

📚 Learning: 2025-12-12T13:40:14.334Z

Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test TensorFlow training with `cd examples/water/se_e2_a && dp train input.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

doc/model/train-fitting-tensor.md

📚 Learning: 2026-01-10T04:28:18.703Z

Learnt from: OutisLi
Repo: deepmodeling/deepmd-kit PR: 5130
File: deepmd/pt/optimizer/adamuon.py:40-109
Timestamp: 2026-01-10T04:28:18.703Z
Learning: The coefficients (3.4445, -4.7750, 2.0315) in the `zeropower_via_newtonschulz5` function in `deepmd/pt/optimizer/adamuon.py` are standard across AdaMuon implementations and should not be modified for consistency with the reference implementations.

Applied to files:

deepmd/pt/train/training.py

🧬 Code graph analysis (2)

deepmd/utils/argcheck.py (1)

deepmd/utils/plugin.py (2)

register (41-59)

register (122-141)

deepmd/pd/train/training.py (1)

deepmd/pt/train/training.py (1)

get_opt_param (160-197)

🪛 markdownlint-cli2 (0.18.1)

doc/model/train-fitting-property.md

17-17: Link text should be descriptive

(MD059, descriptive-link-text)

🪛 Ruff (0.14.11)

deepmd/pt/train/training.py

178-178: Avoid specifying long messages outside the exception class

(TRY003)

deepmd/tf/train/trainer.py

130-132: Avoid specifying long messages outside the exception class

(TRY003)

134-137: Avoid specifying long messages outside the exception class

(TRY003)

325-327: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (41)

GitHub Check: Agent
GitHub Check: Build C++ (cpu, cpu)
GitHub Check: Build C++ (clang, clang)
GitHub Check: Build C++ (cuda120, cuda)
GitHub Check: Build C++ (rocm, rocm)
GitHub Check: Build wheels for cp311-macosx_x86_64
GitHub Check: Build wheels for cp311-win_amd64
GitHub Check: Build wheels for cp310-manylinux_aarch64
GitHub Check: Test C++ (true, false, false, true)
GitHub Check: Build wheels for cp311-manylinux_x86_64
GitHub Check: Build wheels for cp311-macosx_arm64
GitHub Check: Test C++ (true, true, true, false)
GitHub Check: Test Python (8, 3.13)
GitHub Check: Test C++ (false, false, false, true)
GitHub Check: Test C++ (false, true, true, false)
GitHub Check: Test Python (11, 3.10)
GitHub Check: Test Python (11, 3.13)
GitHub Check: Test Python (10, 3.10)
GitHub Check: Test Python (9, 3.13)
GitHub Check: Test Python (10, 3.13)
GitHub Check: Test Python (8, 3.10)
GitHub Check: Test Python (3, 3.13)
GitHub Check: Test Python (9, 3.10)
GitHub Check: Test Python (4, 3.10)
GitHub Check: Test Python (7, 3.13)
GitHub Check: Test Python (6, 3.13)
GitHub Check: Test Python (12, 3.10)
GitHub Check: Test Python (12, 3.13)
GitHub Check: Test Python (1, 3.13)
GitHub Check: Test Python (3, 3.10)
GitHub Check: Test Python (6, 3.10)
GitHub Check: Test Python (5, 3.13)
GitHub Check: Test Python (7, 3.10)
GitHub Check: Test Python (4, 3.13)
GitHub Check: Test Python (2, 3.13)
GitHub Check: Test Python (2, 3.10)
GitHub Check: Test Python (5, 3.10)
GitHub Check: Test Python (1, 3.10)
GitHub Check: Analyze (python)
GitHub Check: Analyze (c-cpp)
GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)

🔇 Additional comments (12)

deepmd/pt/train/training.py (4)

126-129: Verify legacy optimizer keys are handled.

The trainer now reads only the top-level optimizer section when optim_dict is absent. If existing configs still set training.opt_type / weight_decay / adam_beta2, those values will be ignored. Please confirm this is an intentional breaking change or add a fallback/deprecation error to make it explicit.

Also applies to: 319-320

160-197: LGTM: per-optimizer defaults centralized.

Nice consolidation of default hyperparameters and validation in get_opt_param.

775-793: Nice: AdaMuon now restores optimizer state + scheduler.

Keeps resume behavior consistent with Adam/AdamW.

733-764: No changes needed — fused parameter is guaranteed in all supported PyTorch versions.

The project requires PyTorch ≥2.7 (GPU) or ≥2.8 (CPU) per pyproject.toml. Both torch.optim.Adam (fused since 1.13) and torch.optim.AdamW (fused since 2.0) fully support the fused parameter across all pinned versions. The current implementation is correct and doesn't require guards.

Likely an incorrect or invalid review comment.

deepmd/utils/argcheck.py (3)

2568-2852: LGTM: optimizer schema centralized under optimizer.

Clear variant-based definitions and docstrings.

3683-3684: No review comment needed.

3754-3768: LGTM: optimizer args wired into gen_args for single/multi-task.

deepmd/pd/train/training.py (2)

116-163: LGTM: optimizer params now sourced from the top-level config.

Parsing the optimizer block once keeps the trainer aligned with the new schema.

Also applies to: 250-266

595-606: LGTM: optimizer hyperparams are now plumbed through.

deepmd/tf/train/trainer.py (2)

123-137: Optimizer parsing/validation looks solid.

323-348: Optimizer construction is consistent across branches.

doc/train/training-advanced.md (1)

48-63: Optimizer section is clear and well-scoped.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

doc/model/train-fitting-dos.md

doc/model/train-fitting-property.md

doc/model/train-fitting-tensor.md

codecov · 2026-01-16T16:21:45Z

Codecov Report

❌ Patch coverage is 75.29412% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.92%. Comparing base (8b0c3b0) to head (977db22).

Files with missing lines	Patch %	Lines
deepmd/pt/train/training.py	51.51%	16 Missing ⚠️
deepmd/tf/train/trainer.py	69.23%	4 Missing ⚠️
deepmd/pd/train/training.py	80.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5157      +/-   ##
==========================================
- Coverage   81.92%   81.92%   -0.01%     
==========================================
  Files         714      714              
  Lines       73301    73371      +70     
  Branches     3616     3616              
==========================================
+ Hits        60055    60106      +51     
- Misses      12083    12103      +20     
+ Partials     1163     1162       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

deepmd/pt/train/training.py

deepmd/utils/argcheck.py

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@deepmd/pt/train/training.py`:
- Around line 160-189: get_opt_param currently silently defaults to "Adam" when
callers use the legacy key "opt_type" or omit "type"; change get_opt_param to
check for a legacy "opt_type" key first and if present copy it into the
canonical "type" with a deprecation warning (use warnings.warn(...,
DeprecationWarning) or the module logger) before validating opt_type, and if
neither key is present raise a ValueError instead of silently defaulting to
"Adam"; reference the function get_opt_param and the symbols opt_type,
params["type"], and params["opt_type"] when implementing this guard and warning.

In `@deepmd/utils/compat.py`:
- Around line 426-477: The conversion currently only runs when "opt_type" is
present so legacy optimizer keys left in training_cfg (e.g., adam_beta1,
weight_decay) are ignored; update the logic in the block around
training_cfg/optimizer_keys (symbols: training_cfg, optimizer_keys,
extracted_cfg, optimizer_cfg) to trigger extraction when any legacy key is
present (not just opt_type), include the legacy alias "muon_min_2d_dim" by
adding it to optimizer_keys and mapping it to "min_2d_dim" when moving keys,
then proceed to convert opt_type->type if present and merge into optimizer_cfg
(preserving conversion precedence) and keep the existing defaulting to
default_optimizer/type behavior.

🧹 Nitpick comments (2)

deepmd/pd/train/training.py (1)
152-173: Consider removing unused kf_ parameters from extraction.*

The Paddle backend only supports the Adam optimizer (validated at line 161-162), but get_opt_param still extracts Kalman Filter parameters (kf_blocksize, kf_start_pref_e, etc.) that are only relevant for the LKF optimizer. While this doesn't cause functional issues, removing these unused extractions would improve code clarity and align better with the backend's actual capabilities.

The early validation for unsupported optimizer types is good (fail-fast approach).
♻️ Optional: Remove unused kf_* parameters
         opt_param = {
-            "kf_blocksize": params.get("kf_blocksize"),
-            "kf_start_pref_e": params.get("kf_start_pref_e"),
-            "kf_limit_pref_e": params.get("kf_limit_pref_e"),
-            "kf_start_pref_f": params.get("kf_start_pref_f"),
-            "kf_limit_pref_f": params.get("kf_limit_pref_f"),
             "adam_beta1": params.get("adam_beta1"),
             "adam_beta2": params.get("adam_beta2"),
             "weight_decay": params.get("weight_decay"),
         }
deepmd/tf/train/trainer.py (1)
324-349: Minor: Duplicate optimizer type validation.

The optimizer type check at lines 325-328 is redundant since the same validation already occurs in _init_param (lines 130-133), which runs before _build_optimizer is called. The second check could be removed for cleaner code, though keeping it as a defensive guard is also acceptable.
♻️ Optional: Remove redundant validation
     def _build_optimizer(self):
-        if self.optimizer_type != "Adam":
-            raise RuntimeError(
-                f"Unsupported optimizer type {self.optimizer_type} for TensorFlow backend."
-            )
         if self.run_opt.is_distrib:

deepmd/pt/train/training.py

deepmd/utils/compat.py

deepmd/pt/train/training.py

Copilot AI review requested due to automatic review settings January 16, 2026 15:48

github-actions bot added Python Docs labels Jan 16, 2026

dosubot bot added the enhancement label Jan 16, 2026

Copilot started reviewing on behalf of OutisLi January 16, 2026 15:49 View session

Copilot AI reviewed Jan 16, 2026

View reviewed changes

deepmd/pd/train/training.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Jan 16, 2026

View reviewed changes

doc/model/train-fitting-dos.md Outdated Show resolved Hide resolved

doc/model/train-fitting-property.md Outdated Show resolved Hide resolved

doc/model/train-fitting-tensor.md Outdated Show resolved Hide resolved

OutisLi requested review from iProzd, njzjz and wanghan-iapcm January 17, 2026 09:54

OutisLi added the breaking change Breaking changes that should notify users. label Jan 17, 2026

iProzd requested changes Jan 20, 2026

View reviewed changes

deepmd/pt/train/training.py Outdated Show resolved Hide resolved

deepmd/utils/argcheck.py Show resolved Hide resolved

OutisLi force-pushed the pr/opt branch from 977db22 to 0dc550f Compare January 23, 2026 09:39

coderabbitai bot reviewed Jan 23, 2026

View reviewed changes

deepmd/pt/train/training.py Show resolved Hide resolved

deepmd/utils/compat.py Outdated Show resolved Hide resolved

wanghan-iapcm requested changes Jan 23, 2026

View reviewed changes

deepmd/pt/train/training.py Outdated Show resolved Hide resolved

deepmd/pt/train/training.py Show resolved Hide resolved

OutisLi marked this pull request as draft January 26, 2026 07:02

OutisLi added 5 commits January 26, 2026 16:55

refactor(optimizer): finalize optimizer schema and backend handling

70e7b90

fix

b7f3d66

use default values

f781fe2

add compat

019a841

fix

37b5eba

OutisLi force-pushed the pr/opt branch from 0dc550f to 37b5eba Compare January 26, 2026 08:57

compat

387a72c

refactor(optimizer): finalize optimizer schema and backend handling #5157

Are you sure you want to change the base?

refactor(optimizer): finalize optimizer schema and backend handling #5157

Uh oh!

Conversation

OutisLi commented Jan 16, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

coderabbitai bot commented Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

OutisLi commented Jan 16, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 16, 2026 •

edited

Loading

codecov bot commented Jan 16, 2026 •

edited

Loading