feat. Add more optimizer choice #431
Conversation
/gemini review
PR #366 should be put after this PR.
Clear for review.
LGTM, waiting for other reviewer's comments.
LGTM, only one small issue.
Pull Request Overview
This pull request expands optimizer support in the codebase, allowing users to select between Adam, SGD, and a new AnyPrecision AdamW optimizer (adam_bf16) for improved mixed-precision training.
- Adds support for `sgd` and `adam_bf16` optimizer types across CLI validation, FSDP, and Megatron engines
- Implements a custom `AnyPrecisionAdamW` optimizer with flexible precision and optional Kahan summation
- Refactors optimizer creation from the base class to engine-specific implementations (sketched below)
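As a rough illustration of what the engine-specific optimizer creation described above can look like, here is a minimal sketch. The function name, config field names, and the keyword arguments passed to `AnyPrecisionAdamW` are assumptions for illustration, not the PR's actual code; only the module path `areal/utils/fsdp/optimizer.py` and the three optimizer choices come from this PR.

```python
import torch
from torch.optim import AdamW, SGD


# Hypothetical helper sketching engine-specific optimizer creation.
# The config fields ("type", "lr", "weight_decay") are assumed here.
def create_optimizer(model: torch.nn.Module, optimizer_config):
    params = [p for p in model.parameters() if p.requires_grad]
    if optimizer_config.type == "adam":
        return AdamW(params, lr=optimizer_config.lr,
                     weight_decay=optimizer_config.weight_decay)
    if optimizer_config.type == "sgd":
        return SGD(params, lr=optimizer_config.lr,
                   weight_decay=optimizer_config.weight_decay)
    if optimizer_config.type == "adam_bf16":
        # AnyPrecisionAdamW keeps optimizer state in bf16 and can use
        # Kahan summation to compensate for low-precision rounding.
        from areal.utils.fsdp.optimizer import AnyPrecisionAdamW
        return AnyPrecisionAdamW(params, lr=optimizer_config.lr,
                                 weight_decay=optimizer_config.weight_decay)
    raise ValueError(f"Unknown optimizer type: {optimizer_config.type}")
```

Moving this switch into each engine (rather than the base class) lets the FSDP and Megatron paths construct whatever optimizer object their backend expects.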
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `docs/cli_reference.md` | Updates documentation to reflect the new optimizer choices and parameter applicability |
| `areal/utils/fsdp/optimizer.py` | Adds the new AnyPrecisionAdamW optimizer implementation with precision control |
| `areal/experimental/megatron_engine.py` | Updates the Megatron engine to support AdamW and SGD optimizers |
| `areal/engine/ppo/actor.py` | Minor formatting changes (whitespace additions) |
| `areal/engine/fsdp_engine.py` | Implements optimizer creation with support for all three optimizer types |
| `areal/engine/base_hf_engine.py` | Refactors optimizer creation to an abstract method pattern |
| `areal/api/cli_args.py` | Updates CLI argument validation and adds version check logic (see the sketch below) |
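For context on the `areal/api/cli_args.py` row, "updates CLI argument validation" refers to a check of roughly this shape. The field names, defaults, and structure below are hypothetical and simplified, not the PR's actual code.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-in for the optimizer section of the CLI
# config; the real OptimizerConfig has more fields and may differ.
VALID_OPTIMIZER_TYPES = ("adam", "sgd", "adam_bf16")


@dataclass
class OptimizerConfig:
    type: str = "adam"
    lr: float = 1e-5
    weight_decay: float = 0.01

    def __post_init__(self):
        if self.type not in VALID_OPTIMIZER_TYPES:
            raise ValueError(
                f"optimizer type must be one of {VALID_OPTIMIZER_TYPES}, "
                f"got {self.type!r}"
            )
```

With a check like this, an unsupported value fails fast at configuration time rather than deep inside an engine.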
/gemini review
/gemini review
Done. Please review again, @garrett4wade.
LGTM
This pull request expands optimizer support in the codebase, allowing users to select between Adam, SGD, and a new AnyPrecision AdamW optimizer (adam_bf16). It also introduces a custom implementation of AnyPrecisionAdamW for improved mixed-precision training and updates documentation and CLI argument validation accordingly.
Optimizer Support Enhancements:
- Added `sgd` and `adam_bf16` optimizer types in `OptimizerConfig`, and updated CLI argument validation to reflect these new options.
- Updated optimizer creation (`areal/engine/base_hf_engine.py`) to allow selection between AdamW, SGD, and the new AnyPrecisionAdamW optimizer, invoking the appropriate optimizer based on user configuration.
- Updated the Megatron engine (`areal/experimental/megatron_engine.py`) to allow both AdamW and SGD optimizers, passing the selected type to the optimizer configuration.

New Optimizer Implementation:
- Added the `AnyPrecisionAdamW` optimizer in `areal/utils/fsdp/optimizer.py`, supporting flexible precision (bfloat16/float32) and optional Kahan summation for improved numerical stability during mixed-precision training.

The AdamW_bf16/SGD optimizers may be useful for avoiding OOM, but are less stable.
Note
AnyPrecisionAdamW should not be modified, as it is taken from another repo's implementation.
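To make the "optional Kahan summation" above concrete: the idea is to keep a per-parameter compensation buffer that re-injects the low-order bits a bf16 addition would otherwise round away. The following toy sketch shows only that mechanism; it is not the vendored AnyPrecisionAdamW code referenced in the note, and the function name is made up for illustration.

```python
import torch


def kahan_update_(param: torch.Tensor, update: torch.Tensor,
                  compensation: torch.Tensor) -> None:
    """Apply `update` to a low-precision `param` with Kahan compensation.

    `compensation` is a persistent bf16 buffer (one per parameter) that
    stores the rounding error lost when adding a small update to a bf16
    weight, so it can be re-applied on later steps.
    """
    # Add the update plus the previously lost low-order bits.
    corrected = update + compensation
    new_param = param + corrected
    # Recover the part of `corrected` that bf16 rounding discarded.
    compensation.copy_(corrected - (new_param - param))
    param.copy_(new_param)


# Toy usage: repeatedly adding a tiny step to a bf16 weight.
w = torch.tensor([1.0], dtype=torch.bfloat16)
comp = torch.zeros_like(w)
for _ in range(1000):
    kahan_update_(w, torch.tensor([1e-4], dtype=torch.bfloat16), comp)
print(w)  # close to 1.1; naive bf16 accumulation would stay at 1.0
```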
PR #366 should be put after this PR.