tch fix_bug in grpo_trainer.py #23

Hui-design · 2025-02-23T09:24:49Z

Hi, thanks for your amazing work!
I found that `get_per_token_logps` is called with different inputs for `model` and `ref_model`, which may be a critical bug. The bug appears in both R1-multimodal and Open-R1-Video. I documented my understanding in this blog: Hui-design/R1-Video-fixbug. Open-R1-Video has merged my pull request and saw performance gains. Could you please review it again to see whether this pull request is helpful here as well?
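To show why a mismatch like this matters, here is a minimal toy sketch. It is not the code in `grpo_trainer.py`: the `weights` list stands in for a model, `context` is a hypothetical extra input (for example visual features), and this `get_per_token_logps` is a simplified stand-in with an assumed signature. The KL term uses the `exp(ref - logp) - (ref - logp) - 1` estimator commonly used in GRPO implementations. With identical policy and reference weights the penalty should be zero, but it is not when the reference call drops an input.

```python
import math

def get_per_token_logps(weights, token_ids, context=0.0):
    """Toy stand-in: log-prob of each token under a softmax over `weights`.

    `context` is a hypothetical extra input that shifts the logits,
    mimicking additional model inputs beyond the token ids.
    """
    logits = [w + context * i for i, w in enumerate(weights)]
    log_z = math.log(sum(math.exp(l) for l in logits))
    return [logits[t] - log_z for t in token_ids]

def per_token_kl(ref_logps, logps):
    # k3-style estimator: exp(ref - logp) - (ref - logp) - 1, always >= 0
    return [math.exp(r - p) - (r - p) - 1 for r, p in zip(ref_logps, logps)]

weights = [0.1, 0.5, -0.3]   # policy and reference share these weights
token_ids = [0, 2, 1]
context = 0.7                # hypothetical extra input

policy_logps = get_per_token_logps(weights, token_ids, context=context)

# Mismatched call: the reference is evaluated without the extra input.
ref_mismatched = get_per_token_logps(weights, token_ids)
# Matched call: the reference sees exactly what the policy saw.
ref_matched = get_per_token_logps(weights, token_ids, context=context)

kl_mismatched = per_token_kl(ref_mismatched, policy_logps)
kl_matched = per_token_kl(ref_matched, policy_logps)

print("mismatched KL:", kl_mismatched)  # non-zero despite identical weights
print("matched KL:   ", kl_matched)     # all zeros
```

In this toy setup the spurious KL comes entirely from the input mismatch, so it would penalize the policy even before any training step has changed it.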

tch fix_bug

f2be85f
