GRPO Trainer#1020

Open
michaelbenayoun wants to merge 102 commits into main from grpo
Conversation

@michaelbenayoun
Member

@michaelbenayoun michaelbenayoun commented Nov 4, 2025

What does this PR do?

This PR adds partial support for GRPO.

It was broken down into smaller PRs.

It adds the NeuronGRPOTrainer with a set of optimizations and modifications for the Torch XLA backend used to run things on Trainium instances. There are still core missing features:

  • Integration with vLLM: a custom CPU vLLM hack is used for now. The plan is to work on the vLLM part in another PR.
  • Weight synchronization between NeuronGRPOTrainer and vLLM
  • Tensor parallelism
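To give a sense of what the CPU stand-in can look like while vLLM integration is pending, here is a minimal mock-client sketch. The class name, the `generate` signature, and the `update_named_param` method are assumptions for illustration only, not the interface shipped in this PR:

```python
class MockVLLMClient:
    """Returns canned completions so a GRPO loop can run without a vLLM server."""

    def __init__(self, canned_token_ids):
        # Fixed token ids to return for every prompt.
        self.canned_token_ids = list(canned_token_ids)

    def generate(self, prompts, n=1, max_tokens=16):
        # One group of n completions per prompt, mirroring GRPO's
        # group-relative sampling layout.
        completion = self.canned_token_ids[:max_tokens]
        return [[list(completion) for _ in range(n)] for _ in prompts]

    def update_named_param(self, name, weights):
        # Weight synchronization is a no-op in the mock; the real client
        # would push updated policy weights to the inference server.
        pass
```

A mock like this keeps the trainer's control flow testable while the real weight-synchronization path is still unimplemented.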

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds partial support for GRPO (Group Relative Policy Optimization) training on Neuron (Trainium) devices through the new NeuronGRPOTrainer class. The implementation includes XLA-specific optimizations and modifications to work with the Torch XLA backend, though several core features remain unimplemented (vLLM integration, weight synchronization, tensor parallelism).

Changes:

  • Adds NeuronGRPOTrainer with XLA-optimized implementations for generation, scoring, and loss computation
  • Introduces NeuronGRPOConfig for configuration with experimental flag requirement
  • Implements XLA-friendly utility functions (padding, entropy, statistical operations) in trl_utils.py
  • Adds custom vLLM client implementations with CPU communicator and mock client for testing
  • Updates NeuronTrainer to support _prepare_inputs hook and replaces xm.mark_step() with torch_xla.sync()
  • Modifies LoRA transformation utilities to handle missing weights more gracefully
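To give a sense of why the utilities in trl_utils.py are shaped the way they are: XLA compiles a new graph for every tensor shape it encounters, so padding every batch to one static length keeps the compiled graph stable across steps. A pure-Python sketch of the idea (the function name and signature are illustrative, not the actual API in this PR; the real utilities operate on torch tensors):

```python
def pad_to_fixed_length(seqs, max_len, pad_value=0, left=False):
    """Pad variable-length sequences to a single static length.

    Padding to the batch maximum would produce a different shape per batch
    and trigger XLA recompilation; padding to a fixed max_len avoids that.
    """
    out = []
    for seq in seqs:
        pad = [pad_value] * (max_len - len(seq))
        # Left padding is typical for decoder-only generation inputs.
        out.append(pad + list(seq) if left else list(seq) + pad)
    return out
```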

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 19 comments.

Summary per file:

  • optimum/neuron/trainers/grpo_trainer.py: Core GRPO trainer implementation with XLA optimizations (1414 lines, new file)
  • optimum/neuron/trainers/grpo_config.py: Configuration class with validation and experimental flag (118 lines, new file)
  • optimum/neuron/trainers/trl_utils.py: XLA-optimized utility functions for padding, statistics, and sampling (270 lines)
  • optimum/neuron/trainers/extras/vllm_client.py: Custom vLLM clients for Neuron with CPU communicator and mock implementation (213 lines, new file)
  • optimum/neuron/trainers/transformers.py: Updates to NeuronTrainer for the _prepare_inputs hook and the torch_xla.sync() migration
  • optimum/neuron/trainers/utils.py: Adds the move_inputs_to_device utility and updates XLAPrefetchIterator
  • optimum/neuron/models/training/transformations_utils.py: Converts LoRA weight errors to silent skips for flexibility
  • optimum/neuron/trainers/metrics/collector.py: Refactors get_metric_unit for cleaner logic
  • optimum/neuron/utils/__init__.py: Exports the is_vllm_available function
  • optimum/neuron/__init__.py: Exports NeuronGRPOTrainer and NeuronGRPOConfig
  • .github/actions/install_optimum_neuron/action.yml: Adds training extras to CI installation


Comment on lines 703 to +704
if to_concat_and_duplicate_name is None or to_unfuse_name is None:
raise ValueError(
f"Could not find LoRA weights for {module_fully_qualified_name} with param name {param_name}."
)
continue

Copilot AI Feb 5, 2026


Similar to the previous issue, this converts a hard error into a silent skip. This could hide configuration problems. Consider logging when weights are not found to aid debugging.
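A hedged sketch of the suggested log-and-skip pattern (the lookup function, its arguments, and the logger name are hypothetical, for illustration only):

```python
import logging

logger = logging.getLogger("optimum.neuron")


def find_lora_weights(available_weights, module_fully_qualified_name, param_name):
    """Look up LoRA weights, logging a warning instead of raising when absent.

    Skipping silently can hide configuration problems; a warning preserves
    the flexibility of the skip while leaving a trail for debugging.
    """
    key = (module_fully_qualified_name, param_name)
    if key not in available_weights:
        logger.warning(
            "Could not find LoRA weights for %s with param name %s; skipping.",
            module_fully_qualified_name,
            param_name,
        )
        return None
    return available_weights[key]
```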

Collaborator

@dacorvo dacorvo left a comment


No more blockers, but I will review in more detail tomorrow. Copilot detected some issues that may be worth considering.

@dacorvo dacorvo dismissed their stale review February 5, 2026 18:38

Blocker addressed

Collaborator

@JingyaHuang JingyaHuang left a comment


Nothing to add from my side, if @tengomucho is OK with it as well.

@dacorvo
Collaborator

dacorvo commented Feb 12, 2026

@michaelbenayoun can you rebase or merge the main branch? It looks like you are using the non-working sanity changes that @tengomucho fixed in #1077. As a consequence, the training tests are not launched.

Collaborator

@tengomucho tengomucho left a comment


It is hard for me to follow exactly what this allows, so I will have to trust you on this.
I just ask for the sanity checks to be run first, as mentioned by @dacorvo.
Generally speaking, if you add a code feature, it would be better to add a test for it, don't you think?


self.control = self.callback_handler.on_epoch_end(args, self.state, self.control)
xm.mark_step()
torch_xla.sync()
Collaborator


I think mark_step is deprecated, so this is a good change, but don't you think it would be better to change all occurrences of this?

Member Author


You mean in the whole library?

It still exists in tests, but in the library the only occurrence is in `optimum/neuron/generation/utils.py`; I will update it.
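For reference, the migration is mechanical. A hedged sketch of a compatibility shim, assuming `torch_xla.sync()` is available on recent torch_xla releases as the replacement for the deprecated `xm.mark_step()` (the helper name and fallback behavior are my own, not part of this PR):

```python
def xla_step_sync():
    """Flush pending XLA lazy-tensor operations, preferring the new API.

    Falls back to the deprecated xm.mark_step() on older torch_xla
    releases, and to a no-op when torch_xla is not installed (e.g. on CPU
    test environments). Returns a string naming the path taken.
    """
    try:
        import torch_xla
        torch_xla.sync()  # preferred API on recent torch_xla releases
        return "torch_xla.sync"
    except ImportError:
        return "no-op (torch_xla unavailable)"
    except AttributeError:
        # Older torch_xla without the top-level sync(); use the legacy call.
        import torch_xla.core.xla_model as xm
        xm.mark_step()
        return "xm.mark_step"
```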

@michaelbenayoun
Member Author

michaelbenayoun commented Feb 13, 2026

It is hard for me to follow exactly what this allows, so I will have to trust you on this. I just ask for the sanity checks to be run first, as mentioned by @dacorvo. Generally speaking, if you add a code feature, it would be better to add a test for it, don't you think?

For the tests, we will add them once the feature is fully covered, because for now the code runs with a CPU vLLM environment.
Also, we have added some tests in the sub-PRs.

@github-actions

github-actions bot commented Mar 1, 2026

This PR is stale because it has been open 15 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Mar 1, 2026
@michaelbenayoun
Member Author

For now, not merging this, given the recent progress in Torch Native.

@github-actions github-actions bot removed the Stale label Mar 4, 2026