LeJEPA for time series / video #27

@chemeris

Description

Very inspiring work on SIGReg/LeJEPA!

I noticed that your demo GIFs are videos, but the code and the paper only mention images. We're training a time-series encoder using JEPA, which is more similar to video than to images due to its time-domain nature, and we're now trying to apply LeJEPA instead of the classical JEPA. I would appreciate your thoughts on how to apply LeJEPA when the time dimension is present.

  1. In your video demos, do you train the encoder on images and then apply it to the video frame by frame? I.e., is it a true video encoder, or just an image encoder applied to video?
  2. My thinking is that in the case of video or time series, we would need a predictor as in V-JEPA, with SIGReg replacing EMA+StopGrad. Perhaps the predictor can be made much simpler (an MLP instead of a transformer), but I doubt we can do without it entirely. The training objective would then be prediction, not invariance. Does this match your intuition? And will SIGReg work with a prediction objective instead of an invariance one?
  3. Our preliminary experiments show that when training a time-series JEPA with SIGReg, the embeddings collapse along the time dimension. So we applied SIGReg twice: across all time-step embeddings of each batch sample individually, and across time-aggregated embeddings between batch samples. This seems to prevent the collapse, but we haven't finished the downstream-task validations yet (downstream evaluation is a bit more involved for time series than in CV, unfortunately).
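For concreteness, here is a minimal sketch of the dual application described in point 3. It assumes embeddings of shape `(batch, time, dim)`, and `sigreg_1d` is a hypothetical moment-matching stand-in for the actual univariate test statistic SIGReg uses on random projections; the function names and the projection count are my own, not from the paper or repo.

```python
import numpy as np

def sigreg_1d(x):
    # Hypothetical stand-in for SIGReg's univariate goodness-of-fit
    # statistic: penalize deviation of a 1-D sample from N(0, 1)
    # via its first two moments.
    return x.mean() ** 2 + (x.var() - 1.0) ** 2

def sigreg(z, n_proj=16, seed=0):
    # z: (n, dim) embeddings. Project onto random unit directions and
    # average the 1-D penalty over projections (sliced surrogate for
    # testing isotropic Gaussianity of the joint distribution).
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_proj, z.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = z @ dirs.T  # (n, n_proj)
    return np.mean([sigreg_1d(proj[:, k]) for k in range(n_proj)])

def dual_sigreg(z):
    # z: (batch, time, dim) time-series embeddings.
    # 1) per-sample term: regularize each sample's time-step embeddings,
    #    which is what prevented the time-dimension collapse we saw.
    per_sample = np.mean([sigreg(z[b]) for b in range(z.shape[0])])
    # 2) cross-sample term: regularize time-aggregated embeddings
    #    across the batch, as in the standard image setting.
    cross_sample = sigreg(z.mean(axis=1))
    return per_sample + cross_sample
```

A fully collapsed sequence (all-zero embeddings) is penalized by both terms, while well-spread roughly Gaussian embeddings score close to zero under each.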

PS If anyone else is interested in time-series self-supervised JEPA / representation learning - I'd be very interested to chat.
