Description
Thank you very much for your excellent work. I have a question regarding our evaluation setup.
We imported new object assets from outside LIBERO into the LIBERO simulation environment and constructed several new scenes using LIBERO's built-in scene templates. However, we did not fine-tune the evaluated models on trajectory data collected in these newly constructed scenes, and as a result the models achieve near-zero success rates on them.
Based on this observation, we would like to ask the following questions:
- Does this imply that collecting trajectory data in our new scenes and fine-tuning the evaluated models is necessary to obtain reasonable performance? If so, approximately how many demonstrations per scene would typically be required for effective fine-tuning?
- Apart from fine-tuning, are there other possible solutions? We observed that when we take LIBERO's original BDDL scenes and simply replace the target objects with our newly introduced objects, the model's success rate rises to around 80%. This leads us to strongly suspect that current VLA models rely largely on scene layouts and trajectory patterns memorized during training, rather than exhibiting strong generalization. If that is the case, are there alternative approaches or mitigation strategies that could improve performance without fine-tuning on our new scenes?
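For concreteness, the object swap in the second experiment can be sketched as follows. Since BDDL scene descriptions are plain PDDL-style text, a whole-word textual substitution is enough to replace one object with another while keeping the original layout and goal structure intact (the object names and scene fragment below are placeholders, not actual LIBERO assets):

```python
import re


def swap_bddl_objects(bddl_text: str, mapping: dict[str, str]) -> str:
    """Replace object identifiers in a BDDL scene description.

    BDDL files are plain PDDL-style text, so swapping objects only
    requires a whole-word substitution of each identifier.
    """
    for old, new in mapping.items():
        # \b keeps the match to the full identifier, so "ketchup"
        # does not match inside "ketchup_1" (underscore is a word char).
        bddl_text = re.sub(rf"\b{re.escape(old)}\b", new, bddl_text)
    return bddl_text


# Hypothetical fragment of a LIBERO-style BDDL problem file.
scene = """
(:objects
    ketchup_1 - ketchup
    basket_1 - basket
)
(:goal (In ketchup_1 basket_1))
"""

# Swap both the instance name and the type name of the target object.
print(swap_bddl_objects(scene, {"ketchup_1": "my_new_bottle_1",
                                "ketchup": "my_new_bottle"}))
```

Running this on our actual scene files is how we obtained the ~80% result: the layout is untouched and only the manipulated object changes.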