ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input. #63

@nioxinjiang3

I ran the following command:
accelerate launch --config_file recipes/zero3.yaml --num_processes=3 src/x_r1/grpo.py --config recipes/examples/mathcn_zero_3B_config.yaml

It fails with the following error:
Traceback (most recent call last):
File "/home/xin.jiang3/X-R1/src/x_r1/grpo.py", line 275, in <module>
main(script_args, training_args, model_args )
File "/home/xin.jiang3/X-R1/src/x_r1/grpo.py", line 239, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/trainer.py", line 2241, in train
return inner_training_loop(
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/trainer.py", line 3692, in training_step
inputs = self._prepare_inputs(inputs)
File "/home/xin.jiang3/X-R1/src/x_r1/x_grpo_trainer.py", line 495, in _prepare_inputs
ref_per_token_logps = self._get_per_token_logps(
File "/home/xin.jiang3/X-R1/src/x_r1/x_grpo_trainer.py", line 392, in _get_per_token_logps
logits = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1914, in forward
loss = self.module(*inputs, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
return inner()
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
result = forward_call(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 856, in forward
outputs = self.model(
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
return inner()
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
result = forward_call(*args, **kwargs)
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 549, in forward
causal_mask = self._update_causal_mask(
File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 622, in _update_causal_mask
raise ValueError(
ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.
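
To make the error message concrete, here is a minimal, dependency-free sketch of the difference between left and right padding. It assumes the check fires when some row of the attention mask ends in a masked position; that is my reading of the message, not something confirmed from the Qwen2 source. `pad_batch`, `looks_right_padded`, and `PAD_ID` are hypothetical names used only for illustration. With a real Hugging Face tokenizer, the setting named in the message is `tokenizer.padding_side = 'left'`.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id, for illustration only


def pad_batch(
    seqs: List[List[int]], side: str = "left", pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id lists to equal length; return (input_ids, attention_mask)."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    width = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        n_pad = width - len(s)
        pads, zeros, ones = [pad_id] * n_pad, [0] * n_pad, [1] * len(s)
        if side == "left":
            # Padding goes in front, so every row ends in a real token.
            input_ids.append(pads + s)
            attention_mask.append(zeros + ones)
        else:
            # Padding goes at the end, so shorter rows end in masked positions.
            input_ids.append(s + pads)
            attention_mask.append(ones + zeros)
    return input_ids, attention_mask


def looks_right_padded(attention_mask: List[List[int]]) -> bool:
    """Return True if any row ends in a masked (0) position."""
    return any(row[-1] == 0 for row in attention_mask)


if __name__ == "__main__":
    batch = [[5, 6, 7], [8]]
    for side in ("left", "right"):
        ids, mask = pad_batch(batch, side=side)
        print(side, ids, mask, looks_right_padded(mask))
```

Note that in the traceback above the failing forward pass is the one in `_get_per_token_logps`, not a tokenizer call, so changing the tokenizer setting alone may not be sufficient if the sequences passed there are padded at the end.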
