Unable to reproduce X-R1-3B accuracy on the Math500 dataset #58

@women1995

Description

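Hi, when you trained Qwen2.5-3B with full-parameter fine-tuning on the X-R1-750 data, how were per_device_train_batch_size and num_generations set? Were there any additional tricks?

My training parameters are per_device_train_batch_size=1, num_generations=4, and num_processes=4. Late in training, the loss or the KL divergence spikes, and on Math500 the results stay at only acc_0.218, format_0.414. By comparison, the open-source X-R1-3B model on Hugging Face reaches acc_0.346, format_0.886.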
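For reference, here is a rough sketch of the batch arithmetic these settings imply. It assumes the grouping convention used by some versions of TRL's GRPOTrainer, where the global batch (per_device_train_batch_size × num_processes) is split into groups of num_generations completions per prompt; whether X-R1 follows exactly this convention is an assumption on my part. The helper `grpo_batch_layout` and the `gradient_accumulation_steps` argument are hypothetical and only for illustration.

```python
def grpo_batch_layout(per_device_train_batch_size: int,
                      num_processes: int,
                      num_generations: int,
                      gradient_accumulation_steps: int = 1) -> dict:
    """Estimate how many unique prompts feed each optimizer step.

    Assumption (not confirmed for X-R1): the global batch of completions
    is divided into groups of `num_generations`, one group per prompt.
    """
    # Completions processed per forward/backward pass across all processes.
    global_batch = per_device_train_batch_size * num_processes
    if global_batch % num_generations != 0:
        raise ValueError(
            f"global batch ({global_batch}) must be divisible by "
            f"num_generations ({num_generations})"
        )
    prompts_per_pass = global_batch // num_generations
    return {
        "global_batch": global_batch,
        "prompts_per_pass": prompts_per_pass,
        "prompts_per_optimizer_step": prompts_per_pass * gradient_accumulation_steps,
    }


# Settings from this issue; gradient accumulation is not stated, so 1 is assumed.
layout = grpo_batch_layout(per_device_train_batch_size=1,
                           num_processes=4,
                           num_generations=4)
print(layout)
```

Under these assumptions, each pass covers a single unique prompt, so the group-relative advantage estimate may be noisy unless gradient accumulation is raised. That could be one factor behind the late-stage spikes, though I have not verified it.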

Image
