ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to  call `tokenizer.padding_side  = 'left'` before tokenizing the input.

I ran the following command:
accelerate launch --config_file recipes/zero3.yaml --num_processes=3 src/x_r1/grpo.py --config recipes/examples/mathcn_zero_3B_config.yaml

It fails with the following error:
Traceback (most recent call last):
  File "/home/xin.jiang3/X-R1/src/x_r1/grpo.py", line 275, in <module>
    main(script_args, training_args, model_args )
  File "/home/xin.jiang3/X-R1/src/x_r1/grpo.py", line 239, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/trainer.py", line 3692, in training_step
    inputs = self._prepare_inputs(inputs)
  File "/home/xin.jiang3/X-R1/src/x_r1/x_grpo_trainer.py", line 495, in _prepare_inputs
    ref_per_token_logps = self._get_per_token_logps(
  File "/home/xin.jiang3/X-R1/src/x_r1/x_grpo_trainer.py", line 392, in _get_per_token_logps
    logits = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1914, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
    return inner()
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
    result = forward_call(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 856, in forward
    outputs = self.model(
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
    return inner()
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
    result = forward_call(*args, **kwargs)
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 549, in forward
    causal_mask = self._update_causal_mask(
  File "/home/xin.jiang3/.conda/envs/poi/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 622, in _update_causal_mask
    raise ValueError(
ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to  call `tokenizer.padding_side  = 'left'` before tokenizing the input. 
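
For anyone hitting the same message, here is a minimal sketch of what left versus right padding does to a batch, and of the one-line change the error text asks for. It is plain Python with no model involved. `pad_batch`, `looks_right_padded`, and `force_left_padding` are hypothetical helpers written for this illustration, not functions from transformers or X-R1, and the check on the last mask column is only an assumption about what the library inspects. Whether flipping `padding_side` alone clears the trace above depends on how `x_grpo_trainer.py` assembles the prompt and completion tensors.

```python
from types import SimpleNamespace


def pad_batch(sequences, pad_id, padding_side):
    """Pad token-id lists to equal length; return (input_ids, attention_mask)."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pad_ids, pad_mask = [pad_id] * n_pad, [0] * n_pad
        real_mask = [1] * len(seq)
        if padding_side == "left":
            input_ids.append(pad_ids + list(seq))
            attention_mask.append(pad_mask + real_mask)
        else:
            input_ids.append(list(seq) + pad_ids)
            attention_mask.append(real_mask + pad_mask)
    return input_ids, attention_mask


def looks_right_padded(attention_mask):
    """True if any row ends on a padding position (mask value 0).

    Assumption: this mirrors the kind of check that raises the ValueError.
    """
    return any(row[-1] == 0 for row in attention_mask)


def force_left_padding(tokenizer):
    """Apply the setting named in the error message to a tokenizer-like object."""
    tokenizer.padding_side = "left"
    return tokenizer


batch = [[11, 12, 13], [21]]
right_ids, right_mask = pad_batch(batch, pad_id=0, padding_side="right")
left_ids, left_mask = pad_batch(batch, pad_id=0, padding_side="left")

# Stand-in for a real tokenizer, used only to show the attribute change.
fake_tokenizer = force_left_padding(SimpleNamespace(padding_side="right"))
```

With a real tokenizer the equivalent change would be `tokenizer.padding_side = "left"` before any batch is tokenized, as the error message itself says.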
