Minor bug in GRPO agent training (infer_requests only receives the final answer part) #5509

@XuJH5080

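While running GRPO training with the default toolbench reward function in orm.py, I found during debugging that the infer_requests parameter of the ReactORM class's __call__ method holds only the final answer part of the model's output. Here is the variable as captured in the debugger: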

infer_requests = ['Final Answer: There is 1 destroyed building out of 14 total buildings in the post-event image, which is a ratio of 0.071 or 7.1%.', 'Final Answer: There is 1 destroyed building out of a total of 14 buildings in the post-event image, which results in a ratio of 1:14.']

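This clearly defeats the original purpose of ReactORM, which was written so that tool calls could be checked. Looking forward to a fix in a future release.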
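
To make the problem concrete, here is a minimal sketch of why a tool-call check cannot succeed on these inputs. The helper `has_tool_call` and the `full_trajectory` string (including the tool name `count_buildings`) are hypothetical and written for illustration only; they are not taken from orm.py, and the field names assume the usual ReAct layout of "Action:" / "Action Input:" lines.

```python
import re

# Hypothetical helper (NOT part of orm.py): detects the ReAct tool-call
# fields, assuming each appears at the start of its own line.
ACTION_RE = re.compile(r"^Action:\s*\S", re.MULTILINE)
ACTION_INPUT_RE = re.compile(r"^Action Input:", re.MULTILINE)


def has_tool_call(text: str) -> bool:
    """Return True if the text contains both an Action and an Action Input line."""
    return bool(ACTION_RE.search(text)) and bool(ACTION_INPUT_RE.search(text))


# The values observed in the debugger (copied from above).
infer_requests = [
    "Final Answer: There is 1 destroyed building out of 14 total buildings "
    "in the post-event image, which is a ratio of 0.071 or 7.1%.",
    "Final Answer: There is 1 destroyed building out of a total of 14 "
    "buildings in the post-event image, which results in a ratio of 1:14.",
]

# An invented example of what a full ReAct completion might look like.
full_trajectory = (
    "Thought: I need to count the buildings first.\n"
    "Action: count_buildings\n"
    'Action Input: {"image": "post_event"}\n'
    "Final Answer: There is 1 destroyed building out of 14."
)

# Nothing to inspect when only the final answer is passed in.
print([has_tool_call(s) for s in infer_requests])
# The tool call is visible only when the whole completion is available.
print(has_tool_call(full_trajectory))
```

With only the "Final Answer: ..." text available, any reward that compares the action name or action input against a reference has nothing to compare, so the full completion would need to reach the reward function for the check to be meaningful.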

Best regards!
