[QUESTION] Does Megatron-LM support Flash Attention for BERT and T5 Pretraining? #979
Unanswered
Leo-T-Zang asked this question in Q&A
Replies: 2 comments 1 reply
- @shanmugamr1992 Please help answer the question. Thank you!
- Megatron-LM, when you use mcore models, will support Flash Attention in the next couple of weeks.
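  For context, here is a minimal sketch of what "Flash Attention support" usually means in practice. It uses PyTorch's fused scaled_dot_product_attention as a stand-in rather than Megatron-LM's own integration (which lives in its attention modules and is enabled through its training arguments); the tensor shapes and the non-causal setting are illustrative assumptions for a BERT/T5-style encoder, not code from the Megatron-LM repository.

  ```python
  # Sketch only: illustrates a fused (FlashAttention-style) attention call,
  # not Megatron-LM's internal implementation.
  import torch
  import torch.nn.functional as F

  batch, heads, seq_len, head_dim = 2, 12, 512, 64  # assumed example sizes
  device = "cuda" if torch.cuda.is_available() else "cpu"
  dtype = torch.float16 if device == "cuda" else torch.float32

  # Bidirectional (non-causal) self-attention, as used in BERT/T5 encoders.
  q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
  k = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
  v = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)

  # PyTorch dispatches to a FlashAttention-style fused kernel when the inputs
  # allow it (fp16/bf16 on CUDA, no incompatible mask); otherwise it falls
  # back to the standard math implementation.
  out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
  print(out.shape)  # torch.Size([2, 12, 512, 64])
  ```

  How (and whether) this is wired up for BERT and T5 inside Megatron-LM depends on the version you are running, so check the attention code in the release you use.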
- My question: Does Megatron-LM support Flash Attention for BERT and T5 pretraining? If so, where in the code is this feature supported?
  Thanks!