Description
Hi,
Thanks for the impressive work! I’ve been exploring the repository and successfully ran the Quick Interactive M1 Demo from the README. The model correctly identified the apple's position, and my visualization confirmed the accuracy.
Output: ['{"label": "apple", "bbox_2d": [304, 226, 345, 267]}']
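For context, here is the small stdlib-only sketch I used to sanity-check the output before visualizing it (this assumes the demo returns a list of JSON strings and that `bbox_2d` is `[x1, y1, x2, y2]` in pixel coordinates; both are my reading of the output, not something documented):

```python
import json

# The demo prints a list of JSON strings; parse each one and sanity-check the bbox.
# Assumption: bbox_2d = [x1, y1, x2, y2] with (x1, y1) the top-left corner.
raw_output = ['{"label": "apple", "bbox_2d": [304, 226, 345, 267]}']

detections = [json.loads(s) for s in raw_output]
for det in detections:
    x1, y1, x2, y2 = det["bbox_2d"]
    assert x1 < x2 and y1 < y2, "bbox corners should be ordered"
    print(f'{det["label"]}: top-left=({x1}, {y1}), size={x2 - x1}x{y2 - y1}')
# prints: apple: top-left=(304, 226), size=41x41
```

Drawing that rectangle on the input frame is what confirmed the apple localization for me.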

Furthermore, when I fed it an image from the LIBERO evaluation environment, it could still correctly identify the position of the plate.
Output: ['{"label": "plate", "bbox_2d": [26, 147, 68, 179]}']

However, these results are all from the Pretrain-RT-1-Bridge checkpoint. When I use the LIBERO-related checkpoints instead, the model's output becomes garbled and nonsensical, unlike the clean results from the M1 demo.
Output: ['极为这般 ENCჭpunkt\n极为勍:\n桀chantment实效uri\n极力栽培差别澎湃\néra:\n粜acity!\n粜acity!\nerrer!\nerrer?\npragma:\nerrer?\n搛\n恪光阴;\n债权?\n\t \nrang!\n恪光阴;\n恪光阴!\n機能: []\nável!\n.pathname!\n機能:!\n恪光阴!\n.pathname!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n機能:!\n恪光阴!\n.pathname!\n恪光阴!\n']
My questions are:
- Does the fine-tuned LIBERO model sacrifice its general grounding/VQA capability to achieve higher task success rates?
- I noticed the `Pretrain-RT-1-Bridge` checkpoint uses `dinov2_vits14`, while the LIBERO-related checkpoints use `dinov2_vitl14`. Why is that? Also, what does "Pretrain" signify here: is it a general-purpose robotic foundation model that is then fine-tuned into the LIBERO checkpoints? What is the relationship between these versions?
- Since the `Pretrain-RT-1-Bridge` checkpoint already demonstrates strong zero-shot grounding capability, I am curious about its potential for direct LIBERO evaluation. Is that possible?
Please excuse any imprecise terminology, as I am still a newcomer to this field and am learning the ropes.