Hi @RandallBalestriero ,
I trained LeJEPA (ViT-L/16) initialized from DINOv3 ViT-L on the Core-Five training dataset (https://huggingface.co/datasets/gajeshladhar/core-five) for 48 hours on A100.
LeJEPA outperforms DINOv3 by ~2% on global classification benchmarks, but underperforms by ~1.5–2% on segmentation benchmarks, even after adding patch-wise/local-loss variants of SIGReg.
I'd appreciate clarification on whether this behavior is expected, and whether there are recommended settings to improve local spatial representations.
Experiment Details
1. Training Setup
- Backbone: ViT-L/16
- Initialization: DINOv3 ViT-L weights
- Training Dataset: Core-Five (HighRes EO Images)
- Hardware: A100 GPU
- Training Duration: 48 hours
- Objective: LeJEPA loss as described in the paper and the minimal setup code
2. Global Classification (Linear Probe) Performance
LeJEPA outperformed DINOv3 by ~2% on:
- AID Scene Classification
- UC Merced
- RESISC-45
- RSSCN7
(Metric: Top-1 accuracy)
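For reference, this is a minimal sketch of the linear-probe protocol used for the Top-1 numbers above: a single logistic-regression classifier fit on frozen backbone features. The synthetic features here are stand-ins; in the actual runs they are CLS-token embeddings from the frozen ViT-L/16.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-ins for frozen backbone features (CLS-token embeddings in practice).
n_train, n_test, dim, n_classes = 512, 128, 64, 4
centers = rng.normal(size=(n_classes, dim))
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)
X_train = centers[y_train] + 0.5 * rng.normal(size=(n_train, dim))
X_test = centers[y_test] + 0.5 * rng.normal(size=(n_test, dim))

# Linear probe: one linear layer on frozen features, no backbone updates.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
top1 = accuracy_score(y_test, probe.predict(X_test))
```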
3. Segmentation / Dense Prediction Performance
LeJEPA underperformed DINOv3 by ~1.5–2% on:
- INRIA Aerial Image Labeling
- Massachusetts Roads Dataset
(Metric: IoU / F1)
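The dense evaluation assumed for the IoU/F1 numbers above is a linear probe on patch tokens: classify each frozen patch embedding, reshape to the patch grid, and bilinearly upsample the logits to pixel resolution. A minimal sketch (shapes for ViT-L/16 on 224x224 inputs; the tensors here are placeholders):

```python
import torch
import torch.nn.functional as F

# ViT-L/16 on a 224x224 image -> 14x14 grid of 196 patch tokens, dim 1024.
b, n, d, n_cls, grid = 2, 196, 1024, 2, 14
tokens = torch.randn(b, n, d)      # frozen patch embeddings (placeholder)
head = torch.nn.Linear(d, n_cls)   # per-patch linear classifier

# Classify each token, fold back into the spatial grid, upsample to pixels.
logits = head(tokens).transpose(1, 2).reshape(b, n_cls, grid, grid)
pixel_logits = F.interpolate(
    logits, size=(224, 224), mode="bilinear", align_corners=False
)
```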
4. Attempts to Improve Local Feature Accuracy
Tried the following, without improvement:
- Patch-wise local loss applied to patch tokens, similar in spirit to iBOT
In all cases, segmentation accuracy remained below DINOv3's.
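For concreteness, this is the shape of the patch-wise modification I tried: apply a 1-D Gaussian goodness-of-fit statistic to random projections of the *patch* tokens (pooled over batch and spatial positions) rather than only the global embedding. The statistic below is a simplified Epps–Pulley-style stand-in, not the exact SIGReg implementation, and the standardization of each projection is my own simplification:

```python
import torch

def epps_pulley_1d(z):
    # Compare the empirical characteristic function of z against the
    # standard normal's on a grid of frequencies (simplified stand-in
    # for SIGReg's univariate goodness-of-fit test).
    t = torch.linspace(-3.0, 3.0, 17)
    ecf_re = torch.cos(z[:, None] * t[None, :]).mean(0)
    ecf_im = torch.sin(z[:, None] * t[None, :]).mean(0)
    ncf = torch.exp(-0.5 * t ** 2)  # CF of N(0, 1) is real-valued
    return ((ecf_re - ncf) ** 2 + ecf_im ** 2).mean()

def patchwise_sigreg(tokens, n_proj=8):
    # tokens: (batch, n_patches, dim) patch embeddings from the ViT.
    # Pool tokens over batch *and* spatial positions, project onto random
    # unit directions, and test each 1-D marginal for Gaussianity.
    b, n, d = tokens.shape
    flat = tokens.reshape(b * n, d)
    proj = torch.randn(d, n_proj)
    proj = proj / proj.norm(dim=0, keepdim=True)
    z = flat @ proj                          # (b*n, n_proj) 1-D marginals
    z = (z - z.mean(0)) / (z.std(0) + 1e-6)  # simplification: standardize
    return torch.stack(
        [epps_pulley_1d(z[:, j]) for j in range(n_proj)]
    ).mean()
```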
Questions for Authors
- Is LeJEPA inherently optimized for global semantics over local spatial accuracy?
- Any recommended hyperparameters or architectural tweaks for dense prediction tasks?
- Were internal segmentation benchmarks run, and can details be shared?