Hi, thanks for your code, it has helped a lot!
However, I see a problem with the attention rollout code: you collect the attentions with a hook on the attn_dropout module, which is not the raw attention matrix softmax(QK^T)?
Could someone explain the reason for picking this layer?
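For context, this is roughly the hook pattern I mean (a minimal sketch; the attribute path `model.blocks[i].attn.attn_dropout` is my assumption of how the blocks are laid out, so it may differ from the actual code):

```python
import torch

attentions = []

def save_attention(module, inputs, output):
    # Whatever tensor flows through attn_dropout during the forward pass
    # is recorded here; the input to this module is the post-softmax
    # attention weights softmax(Q K^T / sqrt(d)) for one block.
    attentions.append(inputs[0].detach())

def register_hooks(model):
    # Assumed layout: each transformer block exposes attn.attn_dropout.
    return [
        block.attn.attn_dropout.register_forward_hook(save_attention)
        for block in model.blocks
    ]
```

My question is why this dropout layer is the right place to grab the attention weights for the rollout, rather than the attention matrix itself.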