
What if the visual transformer does not have a class token? #25

Description

@Tgaaly

I see that the VITAttentionGradRollout code requires a class token. What if the model architecture does not have one?

If, for example, my attention layer is 196x196 (corresponding to a 14x14 spatial resolution), can one take the mean over all query patches with respect to each patch, as follows: mask = result[0].mean(0)? I tried this but didn't get very meaningful results. Is there another way to handle transformers without class tokens?
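For reference, here is a minimal sketch of plain (non-gradient) attention rollout adapted to a model without a class token. The function name rollout_without_cls, the discard_ratio parameter, and the final mean-over-queries readout are illustrative assumptions, not code from this repository; the head averaging, residual identity term, and renormalization follow the usual rollout recipe.

```python
import torch

def rollout_without_cls(attentions, discard_ratio=0.9):
    """Attention rollout for a ViT without a class token.

    attentions: list of per-layer attention tensors, each of shape
    (batch, heads, num_patches, num_patches), e.g. 196x196 for a
    14x14 patch grid. Returns a (grid, grid) per-patch importance mask.
    """
    result = torch.eye(attentions[0].size(-1))
    with torch.no_grad():
        for attention in attentions:
            # Fuse heads by averaging; take the first batch element.
            fused = attention.mean(dim=1)[0]
            # Optionally zero out the weakest attentions to reduce noise
            # (hypothetical knob, analogous to a discard ratio).
            flat = fused.view(-1)
            _, idx = flat.topk(int(flat.numel() * discard_ratio), largest=False)
            flat[idx] = 0
            # Add the identity to account for the residual connection,
            # then renormalize each row to sum to 1.
            eye = torch.eye(fused.size(-1))
            a = (fused + eye) / 2
            a = a / a.sum(dim=-1, keepdim=True)
            result = a @ result
    # With no CLS row to read off, average each patch's received
    # attention over all query patches to get a per-patch score.
    mask = result.mean(dim=0)
    grid = int(mask.numel() ** 0.5)
    return mask.reshape(grid, grid)
```

If the plain mean over queries stays uninformative, one variant worth trying is reading off the strongest incoming attention per patch instead, e.g. result.max(dim=0).values, though whether that helps will depend on the model.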
