Hello,
Thanks for making the code available, and thanks especially for this markdown file.
I noticed that the linear probe is evaluated on the embedding rather than on the projection head's output, which I understand is common practice. In Section 3 of the paper, however, the authors argue that the embedding should follow an isotropic Gaussian distribution, showing that this is optimal for both linear and nonlinear probes. Since the LeJEPA loss is computed on the projection head's output, that output is what is actually pushed toward an isotropic Gaussian. Given this, shouldn't the linear probe be applied to the projection head's output instead?
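To make the question concrete, here is a toy NumPy sketch of the two options I am comparing. The `backbone`, `projector`, ridge-regression probe, and `isotropy_gap` helper are hypothetical stand-ins I made up for illustration, not code from this repository. The weights are random and untrained, so the numbers say nothing about LeJEPA itself. The only point is which tensor the probe is fitted on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (not the repository's modules): a frozen "backbone"
# that produces the embedding, and a small MLP "projector" on top of it whose
# output is where the LeJEPA loss would be computed.
W_BACKBONE = rng.normal(size=(32, 64)) / np.sqrt(32)
W1 = rng.normal(size=(64, 128)) / np.sqrt(64)
W2 = rng.normal(size=(128, 16)) / np.sqrt(128)


def backbone(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W_BACKBONE)


def projector(h: np.ndarray) -> np.ndarray:
    return np.maximum(h @ W1, 0.0) @ W2


def add_bias(feats: np.ndarray) -> np.ndarray:
    # Append a constant column so the probe can learn an intercept.
    return np.hstack([feats, np.ones((feats.shape[0], 1))])


def fit_linear_probe(feats, labels, num_classes, ridge=1e-3):
    # Closed-form ridge regression onto one-hot targets; a cheap stand-in for
    # the logistic-regression probe usually used in practice.
    X = add_bias(feats)
    Y = np.eye(num_classes)[labels]
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def probe_accuracy(W, feats, labels):
    preds = np.argmax(add_bias(feats) @ W, axis=1)
    return float(np.mean(preds == labels))


def isotropy_gap(z):
    # Frobenius distance between the empirical covariance and the identity;
    # 0 would mean the features are isotropic with unit variance.
    return float(np.linalg.norm(np.cov(z, rowvar=False) - np.eye(z.shape[1])))


# Toy data: 4 classes defined by the signs of the first two input coordinates.
x = rng.normal(size=(1000, 32))
labels = (x[:, 0] > 0).astype(int) + 2 * (x[:, 1] > 0).astype(int)
train, test = slice(0, 800), slice(800, 1000)

embedding = backbone(x)             # what the linear probe currently sees
projection = projector(embedding)   # what the LeJEPA loss regularizes

results = {}
for name, feats in [("embedding", embedding), ("projection", projection)]:
    W = fit_linear_probe(feats[train], labels[train], num_classes=4)
    results[name] = {
        "acc": probe_accuracy(W, feats[test], labels[test]),
        "isotropy_gap": isotropy_gap(feats),
    }
    print(name, results[name])
```

My question is essentially whether, after training, the `"projection"` row is the one the Section 3 argument applies to, and if so, why the probe is reported on the `"embedding"` row.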
I suspect I am missing an important detail in the paper. I would be grateful if you could point out where my understanding goes wrong.