Clarification on evaluation metric

Hi,
Thanks for open-sourcing this framework! I'm trying to reproduce the baseline results reported in the RoboHive paper, and wanted to ask which exact metric is averaged over 3 seeds in the Franka-expert data runs (here: https://github.com/facebookresearch/agenthive/tree/dev/scripts).
Is it (a) the maximum success rate of each run, averaged over the 3 seeds, (b) the maximum over checkpoints of the success rate averaged over the 3 seeds, or (c) something else?
The paper doesn't seem to say exactly how the success rate of a single run is determined when there are many checkpoints.
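To make the difference between the first two options concrete, here is a minimal sketch with made-up numbers. The `success` values and the function names are hypothetical and are not taken from the agenthive scripts or the paper.

```python
# Hypothetical success rates: one row per seed, one column per evaluation checkpoint.
success = [
    [0.2, 0.8, 0.5],  # seed 0
    [0.6, 0.4, 0.7],  # seed 1
    [0.9, 0.3, 0.6],  # seed 2
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def mean_of_max(runs):
    """Take each seed's best checkpoint, then average over seeds."""
    return mean(max(run) for run in runs)


def max_of_mean(runs):
    """Average over seeds at each checkpoint, then take the best checkpoint."""
    return max(mean(checkpoint) for checkpoint in zip(*runs))


print(round(mean_of_max(success), 3))  # 0.8
print(round(max_of_mean(success), 3))  # 0.6
```

The first number can never be lower than the second, since each seed gets to pick its own best checkpoint, so the choice matters when comparing against the reported results.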
Thanks!

Clarification on evaluation metric #18