@yuyijiong

  1. Support more models, such as Mistral, Gemma, and Qwen2
  2. Support flash attention
  3. Truncate the sequence length to under 4k when calculating outliers, to avoid OOM
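
For illustration only, the truncation in the third item might look roughly like the sketch below. The function name `truncate_for_outliers`, the 4000-token cap, and the choice to keep the leading tokens are all assumptions; the PR description only says the length is kept under 4k.

```python
from typing import List, Sequence

# Assumed cap: the description says "under 4k", so any value below 4096 fits.
MAX_OUTLIER_TOKENS = 4000


def truncate_for_outliers(
    input_ids: Sequence[int], max_len: int = MAX_OUTLIER_TOKENS
) -> List[int]:
    """Return at most `max_len` leading tokens of `input_ids`.

    Hypothetical helper: bounding the sequence length before the outlier
    statistics are computed bounds the size of the activations that pass
    has to hold in memory.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Slicing is a no-op for inputs that are already short enough.
    return list(input_ids[:max_len])


if __name__ == "__main__":
    long_prompt = list(range(10_000))
    print(len(truncate_for_outliers(long_prompt)))
```

Whether the head or the tail of the sequence should be kept depends on how the outliers are used, so that choice is left as a parameter of the sketch rather than a claim about the merged code.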

@ZackZikaiXiao

Good job, the causal mask shape bugs are fixed.
