There seem to be many bugs in the new version of the code, e.g., [issue63](https://github.com/dhcode-cpp/X-R1/issues/63#issue-2905922567) and this line in the [trainer](https://github.com/dhcode-cpp/X-R1/blob/208e0bf534c9af1db46f1306524d8f351694ed0c/src/x_r1/x_grpo_trainer.py#L207).