Minor bug in GRPO agent training (infer_requests only receives the Final Answer part) #5509

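While running GRPO training with the default toolbench reward function in orm.py, I found during debugging that the infer_requests argument of the ReactORM class's `__call__` method contains only the Final Answer part of the model output. Here is the variable as seen in the debugger: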

infer_requests = ['Final Answer: There is 1 destroyed building out of 14 total buildings in the post-event image, which is a ratio of 0.071 or 7.1%.', 'Final Answer: There is 1 destroyed building out of a total of 14 buildings in the post-event image, which results in a ratio of 1:14.']

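This clearly defeats the purpose of ReactORM, which was written so that the tool calls could be checked. Looking forward to a fix in a future release.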
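To make the problem concrete, here is a minimal sketch of the kind of check a tool-aware reward would need. It is not the code in orm.py: `react_labels`, `has_tool_call`, the label list, and the `full_trace` example (including the tool name `count_buildings`) are all made up for illustration, and I am assuming the usual ReAct layout of `Action:` / `Action Input:` lines. With the values I observed, there is nothing for such a check to look at.

```python
import re

# Hypothetical helper for illustration; not part of orm.py.
# Matches ReAct-style field labels at the start of a line.
REACT_LABEL = re.compile(
    r"^(Thought|Action Input|Action|Observation|Final Answer):", re.MULTILINE
)


def react_labels(completion: str) -> set:
    """Return the set of ReAct field labels present in a completion."""
    return set(REACT_LABEL.findall(completion))


def has_tool_call(completion: str) -> bool:
    """True only if both the tool name and its input are present."""
    labels = react_labels(completion)
    return "Action" in labels and "Action Input" in labels


# The two values actually observed in infer_requests (copied from the debugger).
observed = [
    "Final Answer: There is 1 destroyed building out of 14 total buildings "
    "in the post-event image, which is a ratio of 0.071 or 7.1%.",
    "Final Answer: There is 1 destroyed building out of a total of 14 "
    "buildings in the post-event image, which results in a ratio of 1:14.",
]

# A made-up full trace, only to show what the check would need to see.
full_trace = (
    "Thought: I need to count the buildings.\n"
    "Action: count_buildings\n"
    'Action Input: {"image": "post_event"}\n'
    "Observation: 14 buildings, 1 destroyed\n"
    "Final Answer: 1 of 14 buildings is destroyed."
)

print([has_tool_call(c) for c in observed])  # [False, False]
print(has_tool_call(full_trace))             # True
```

Since the Action and Action Input segments never reach the reward function, any comparison of tool names or tool arguments has no input to work with, whatever the exact matching logic is.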

Best regards!
