
 | MODEL FAMILY | MODEL NAME (Huggingface hub) | FP32 | BF16 | Static quantization INT8 | Weight only quantization INT8 | Weight only quantization INT4 |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-3B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | 🟩 | 🟩 | | 🟩 | |
-|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-7b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-11b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-30b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-1.3b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Bloom| bigscience/bloom-1b7 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm3-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm2-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|T5| google/flan-t5-xl | 🟩 | 🟩 | 🟩 | 🟩 | |
-|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mixtral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 | | 🟩 | 🟩 |
-|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen2-7B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLaVA| liuhaotian/llava-v1.5-7b | 🟩 | 🟩 | | 🟩 | 🟩 |
-|GIT| microsoft/git-base | 🟩 | 🟩 | | 🟩 | |
-|Yuan| IEITYuan/Yuan2-102B-hf | 🟩 | 🟩 | | 🟩 | |
-|Phi| microsoft/phi-2 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Whisper| openai/whisper-large-v2 | 🟩 | 🟩 | 🟩 | 🟩 | |
-|Maira| microsoft/maira-2 | 🟩 | 🟩 | | 🟩 | |
-|Jamba| ai21labs/Jamba-v0.1 | 🟩 | 🟩 | | 🟩 | |
+|LLAMA| meta-llama/Llama-2-7b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-13b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-70b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-8B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-70B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-3B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | ✅ | ✅ | | ✅ | ✅ |
+|GPT-J| EleutherAI/gpt-j-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|DOLLY| databricks/dolly-v2-12b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-11b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-40b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-30b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-1.3b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Bloom| bigscience/bloom-1b7 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|CodeGen| Salesforce/codegen-2B-multi | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm3-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm2-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPTBigCode| bigcode/starcoder | ✅ | ✅ | ✅ | ✅ | ✅ |
+|T5| google/flan-t5-xl | ✅ | ✅ | ✅ | ✅ | ✅ |
+|MPT| mosaicml/mpt-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mistral| mistralai/Mistral-7B-v0.1 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mixtral| mistralai/Mixtral-8x7B-v0.1 | ✅ | ✅ | | ✅ | ✅ |
+|Stablelm| stabilityai/stablelm-2-1_6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen2-7B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLaVA| liuhaotian/llava-v1.5-7b | ✅ | ✅ | | ✅ | ✅ |
+|GIT| microsoft/git-base | ✅ | ✅ | | ✅ | ✅ |
+|Yuan| IEITYuan/Yuan2-102B-hf | ✅ | ✅ | | ✅ | |
+|Phi| microsoft/phi-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Whisper| openai/whisper-large-v2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Maira| microsoft/maira-2 | ✅ | ✅ | | ✅ | ✅ |
+|Jamba| ai21labs/Jamba-v0.1 | ✅ | ✅ | | ✅ | ✅ |
+|DeepSeek| deepseek-ai/DeepSeek-V2.5-1210 | ✅ | ✅ | | ✅ | ✅ |

 ## 1.2 Verified for distributed inference mode via DeepSpeed

 | MODEL FAMILY | MODEL NAME (Huggingface hub) | BF16 | Weight only quantization INT8 |
 |:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-3B-Instruct | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | 🟩 | 🟩 |
-|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 |
-|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟩 |
-|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-11b | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 |
-|OPT| facebook/opt-30b | 🟩 | 🟩 |
-|OPT| facebook/opt-1.3b | 🟩 | 🟩 |
-|Bloom| bigscience/bloom-1b7 | 🟩 | 🟩 |
-|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟩 |
-|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 |
-|T5| google/flan-t5-xl | 🟩 | 🟩 |
-|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 |
-|Mistral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 |
-|MPT| mosaicml/mpt-7b | 🟩 | 🟩 |
-|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen2-7B | 🟩 | 🟩 |
-|GIT| microsoft/git-base | 🟩 | 🟩 |
-|Phi| microsoft/phi-2 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-4k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-128k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-4k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-128k-instruct | 🟩 | 🟩 |
-|Whisper| openai/whisper-large-v2 | 🟩 | 🟩 |
+|LLAMA| meta-llama/Llama-2-7b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-13b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-70b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-8B | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-70B | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-3B-Instruct | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | ✅ | ✅ |
+|GPT-J| EleutherAI/gpt-j-6b | ✅ | ✅ |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | ✅ | ✅ |
+|DOLLY| databricks/dolly-v2-12b | ✅ | ✅ |
+|FALCON| tiiuae/falcon-11b | ✅ | ✅ |
+|FALCON| tiiuae/falcon-40b | ✅ | ✅ |
+|OPT| facebook/opt-30b | ✅ | ✅ |
+|OPT| facebook/opt-1.3b | ✅ | ✅ |
+|Bloom| bigscience/bloom-1b7 | ✅ | ✅ |
+|CodeGen| Salesforce/codegen-2B-multi | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | ✅ | ✅ |
+|GPTBigCode| bigcode/starcoder | ✅ | ✅ |
+|T5| google/flan-t5-xl | ✅ | ✅ |
+|Mistral| mistralai/Mistral-7B-v0.1 | ✅ | ✅ |
+|Mistral| mistralai/Mixtral-8x7B-v0.1 | ✅ | ✅ |
+|MPT| mosaicml/mpt-7b | ✅ | ✅ |
+|Stablelm| stabilityai/stablelm-2-1_6b | ✅ | ✅ |
+|Qwen| Qwen/Qwen-7B-Chat | ✅ | ✅ |
+|Qwen| Qwen/Qwen2-7B | ✅ | ✅ |
+|GIT| microsoft/git-base | ✅ | ✅ |
+|Phi| microsoft/phi-2 | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-4k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-128k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-4k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-128k-instruct | ✅ | ✅ |
+|Whisper| openai/whisper-large-v2 | ✅ | ✅ |

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from LLAMA family)
 are well supported with all optimizations like indirect access KV cache, fused ROPE, and customized linear kernels.