Replies: 1 comment
The key you're looking for is `actor_rollout_ref.actor.fsdp_config.model_dtype`. Pass it as a command-line override:

```
actor_rollout_ref.actor.fsdp_config.model_dtype=bfloat16 \
actor_rollout_ref.ref.fsdp_config.model_dtype=bfloat16 \
# and, if you're running PPO with a critic:
critic.model.fsdp_config.model_dtype=bfloat16
```

Accepted strings are anything `PrecisionType.to_dtype` understands. The worker resolves the setting like this:
```python
torch_dtype = fsdp_config.get("model_dtype", None)
if torch_dtype is None:
    torch_dtype = torch.float32 if self._is_actor else torch.bfloat16
else:
    torch_dtype = PrecisionType.to_dtype(torch_dtype)
```

The default lives in the trainer config YAML:

```yaml
# model dtype of fsdp
model_dtype: fp32
```

The same fp32 default surfaces in the config guide you linked.
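For intuition, the string-to-dtype resolution above can be sketched roughly like this. This is a simplified, hypothetical stand-in for verl's `PrecisionType.to_dtype` — the alias table and function name here are illustrative, not verl's actual implementation:

```python
# Simplified sketch of normalizing a model_dtype config string.
# NOT verl's actual PrecisionType code; the alias table is illustrative.
_DTYPE_ALIASES = {
    "fp16": "float16", "float16": "float16", "half": "float16",
    "bf16": "bfloat16", "bfloat16": "bfloat16",
    "fp32": "float32", "float32": "float32",
}

def to_dtype_name(name: str) -> str:
    """Map a config string to a canonical torch dtype name."""
    try:
        return _DTYPE_ALIASES[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported model_dtype: {name!r}")

# Both the short and long spellings resolve identically:
assert to_dtype_name("bf16") == to_dtype_name("bfloat16") == "bfloat16"
```

The point is simply that `bfloat16` and `bf16` are typically interchangeable spellings of the same dtype, so either should work in the override.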
**Recommendation:** set both the actor and the ref model to bf16:

```
actor_rollout_ref.actor.fsdp_config.model_dtype=bfloat16 \
actor_rollout_ref.ref.fsdp_config.model_dtype=bfloat16
```
I use the training script from qwen2-7b-fsdp2.log, and just changed qwen2-7b to qwen3-0.6b.
Both in the qwen2-7b-fsdp2.log log and in my own run I see the warning: `Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dtype in Qwen2ForCausalLM is torch.float32.`
I looked through the config guide at https://verl.readthedocs.io/en/latest/examples/config.html but couldn't find a key that sets the actor model's dtype.
What should I do to train the model in bf16? Thanks so much.
verl version: 0.5.x