
Issues with building new scenarios for testing models #136

@Xj1567589354

Description


Thank you very much for your excellent work. I currently have a question regarding our evaluation setup.

We have imported new object assets from outside LIBERO into the LIBERO simulation environment and constructed several new scenes using LIBERO's built-in scene templates. However, we did not fine-tune the evaluated models on trajectory data collected in these newly constructed scenes, and as a result the models achieve near-zero success rates on them.
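For concreteness, this is roughly how we load each new scene for rollout evaluation; a minimal sketch, assuming the new scenes are exported as standalone BDDL files. The file path `custom_bddl/new_scene_1.bddl`, the camera resolution, the rollout horizon, and the zero action (a stand-in for the VLA policy's output) are placeholders for our actual setup.

```python
import os
import numpy as np
from libero.libero.envs import OffScreenRenderEnv

# Path to one of our custom BDDL scene files (placeholder name).
bddl_file = os.path.join("custom_bddl", "new_scene_1.bddl")

# Off-screen environment built directly from the BDDL description.
env = OffScreenRenderEnv(
    bddl_file_name=bddl_file,
    camera_heights=256,  # placeholder resolution for the policy's image input
    camera_widths=256,
)
env.seed(0)
obs = env.reset()

success = False
for _ in range(500):          # placeholder horizon
    action = np.zeros(7)      # stand-in for the evaluated VLA policy's action
    obs, reward, done, info = env.step(action)
    if done:                  # goal predicate in the BDDL file satisfied
        success = True
        break

env.close()
print("success:", success)
```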

Based on this observation, we would like to ask the following questions:

  1. Does this imply that collecting trajectory data in our new scenes and fine-tuning the evaluated models is necessary to obtain reasonable performance? If so, approximately how many demonstrations per scene would typically be required for effective fine-tuning?

  2. Apart from fine-tuning, are there other possible solutions? We observed that when we take LIBERO's original BDDL scenes and simply replace the target objects with our newly introduced objects (a sketch of this check is given after this question), the model's success rate rises to around 80%. This strongly suggests that current VLA models largely rely on scene layouts and trajectory patterns memorized during training, rather than exhibiting strong generalization.

In this case, are there any alternative approaches or mitigation strategies that could improve performance without fine-tuning on our new scenes?
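The object-replacement check mentioned in question 2 was done roughly as follows: we take an original task's BDDL file from a LIBERO suite, copy it, and manually substitute only the object entries in the copy before evaluating. A minimal sketch; the suite name, task index, and the copied file path are placeholders for our actual setup, and the object substitution itself is a hand edit of the copied file rather than anything shown here.

```python
import os
import shutil
from libero.libero import benchmark, get_libero_path
from libero.libero.envs import OffScreenRenderEnv

# Pick an original LIBERO task (suite name and task index are placeholders).
task_suite = benchmark.get_benchmark_dict()["libero_object"]()
task = task_suite.get_task(0)
orig_bddl = os.path.join(
    get_libero_path("bddl_files"), task.problem_folder, task.bddl_file
)

# Copy the original scene definition; in the copy we only replace the target
# object entries (the (:objects ...) / (:obj_of_interest ...) lines) by hand,
# keeping the layout, regions, and goal untouched.
swapped_bddl = "swapped_object_task.bddl"   # placeholder path, edited manually
shutil.copy(orig_bddl, swapped_bddl)

# Evaluate on the layout-preserving variant exactly as on the original task.
env = OffScreenRenderEnv(bddl_file_name=swapped_bddl,
                         camera_heights=256, camera_widths=256)
env.seed(0)
obs = env.reset()
print("language instruction:", task.language)
env.close()
```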
