Hi, @RandallBalestriero
Thanks for the great work. I'm wondering whether there is a specific reason LeJEPA uses DINO-style transformations for representation consistency. I-JEPA and V-JEPA use masked images/videos as targets, but LeJEPA doesn't seem to, according to https://github.com/rbalestr-lab/lejepa/blob/main/MINIMAL.md. Did you experiment with both and find that DINO-style (global/local views) augmentation gives better performance?