Support more models and flash attention 2 #3


Open

yuyijiong wants to merge 3 commits into VITA-Group:main from yuyijiong:main

yuyijiong commented May 7, 2024

Support more models, such as Mistral, Gemma, and Qwen2
Support flash attention
Truncate the sequence length to under 4k when calculating outliers, to avoid out-of-memory (OOM) errors
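
The truncation step in the last item could look roughly like the sketch below. This is only an illustration, not the code from this PR: the function name `truncate_for_outlier_pass`, the constant `MAX_OUTLIER_LEN`, the exact cutoff of 4096 tokens, and the choice to keep the leading tokens are all assumptions, and plain Python lists stand in for token tensors.

```python
from typing import List

# Assumed cutoff: the PR description only says "under 4k".
MAX_OUTLIER_LEN = 4096


def truncate_for_outlier_pass(
    batch: List[List[int]], max_len: int = MAX_OUTLIER_LEN
) -> List[List[int]]:
    """Keep at most max_len leading tokens of every sequence in the batch.

    Hypothetical helper: shorter inputs bound the size of the attention
    scores that an outlier-detection pass has to hold in memory.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [seq[:max_len] for seq in batch]


# One sequence longer than the cutoff, one shorter.
long_batch = [list(range(10_000)), list(range(100))]
short_batch = truncate_for_outlier_pass(long_batch)
print([len(seq) for seq in short_batch])
```

Sequences already shorter than the cutoff pass through unchanged, so the helper is safe to apply unconditionally before the outlier pass.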

yuyijiong added 3 commits

May 7, 2024 16:45


Update setup.py (34fb69b)
Create mspoe_models (0ec4736)
Rename mspoe_models to mspoe_models.py (cc12454)

ZackZikaiXiao commented Jul 10, 2024

Good job, the bugs in the causal mask shape are fixed.
