update LLM support models (#3489)

ZailiWang · chunyuan-w · web-flow · commit 4784d0d10654 · 2025-02-07T16:14:30.000+08:00
* update LLM support models

* correct src build argument

* remove cpuid dependency

---------

Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
diff --git a/README.md b/README.md
@@ -18,46 +18,49 @@ In the current technological landscape, Generative AI (GenAI) workloads and mode
 
 | MODEL FAMILY | MODEL NAME (Huggingface hub) | FP32 | BF16 | Static quantization INT8 | Weight only quantization INT8 | Weight only quantization INT4 |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-3B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | 🟩 | 🟩 |   | 🟩 |   |
-|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-7b  | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-11b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-30b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-1.3b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Bloom| bigscience/bloom-1b7 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm3-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm2-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|T5| google/flan-t5-xl | 🟩 | 🟩 | 🟩 | 🟩 |   |
-|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mixtral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 |   | 🟩 | 🟩 |
-|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen2-7B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLaVA| liuhaotian/llava-v1.5-7b | 🟩 | 🟩 |   | 🟩 | 🟩 |
-|GIT| microsoft/git-base | 🟩 | 🟩 |   | 🟩 |   |
-|Yuan| IEITYuan/Yuan2-102B-hf | 🟩 | 🟩 |   | 🟩 |   |
-|Phi| microsoft/phi-2 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Whisper| openai/whisper-large-v2 | 🟩 | 🟩 | 🟩 | 🟩 |   |
+|LLAMA| meta-llama/Llama-2-7b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-13b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-70b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-8B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-70B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-3B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | ✅ | ✅ |   | ✅ | ✅ |
+|GPT-J| EleutherAI/gpt-j-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|DOLLY| databricks/dolly-v2-12b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-7b  | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-11b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-40b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-30b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-1.3b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Bloom| bigscience/bloom-1b7 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|CodeGen| Salesforce/codegen-2B-multi | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm3-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm2-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPTBigCode| bigcode/starcoder | ✅ | ✅ | ✅ | ✅ | ✅ |
+|T5| google/flan-t5-xl | ✅ | ✅ | ✅ | ✅ | ✅ |
+|MPT| mosaicml/mpt-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mistral| mistralai/Mistral-7B-v0.1 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mixtral| mistralai/Mixtral-8x7B-v0.1 | ✅ | ✅ |   | ✅ | ✅ |
+|Stablelm| stabilityai/stablelm-2-1_6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen2-7B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLaVA| liuhaotian/llava-v1.5-7b | ✅ | ✅ |   | ✅ | ✅ |
+|GIT| microsoft/git-base | ✅ | ✅ |   | ✅ | ✅ |
+|Yuan| IEITYuan/Yuan2-102B-hf | ✅ | ✅ |   | ✅ |   |
+|Phi| microsoft/phi-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Whisper| openai/whisper-large-v2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Maira| microsoft/maira-2 | ✅ | ✅ |   | ✅ | ✅ |
+|Jamba| ai21labs/Jamba-v0.1 | ✅ | ✅ |   | ✅ | ✅ |
+|DeepSeek| deepseek-ai/DeepSeek-V2.5-1210 | ✅ | ✅ |   | ✅ | ✅ |
 
 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and customized linear kernels.
 We are working in progress to better support the models in the tables with various data types. In addition, more models will be optimized in the future.
diff --git a/docs/tutorials/examples.md b/docs/tutorials/examples.md
@@ -314,6 +314,6 @@ $ ldd example-app
 
 ## Intel® AI Reference Models
 
-Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/intel/ai-reference-models/tree/pytorch-r2.5-models) (former Model Zoo).
-The lists of PyTorch use cases with links to sample codes are available in the [use case tables](https://github.com/intel/ai-reference-models/tree/pytorch-r2.5-models?tab=readme-ov-file#use-cases).
+Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/intel/ai-reference-models/tree/pytorch-r2.6-models) (former Model Zoo).
+The lists of PyTorch use cases with links to sample codes are available in the [use case tables](https://github.com/intel/ai-reference-models/tree/pytorch-r2.6-models?tab=readme-ov-file#use-cases).
 You can get performance benefits out-of-the-box by simply running scripts in the Intel® AI Reference Models.
diff --git a/examples/cpu/llm/README.md b/examples/cpu/llm/README.md
@@ -102,7 +102,7 @@ conda activate llm
 
 # Setup the environment with the provided script
 cd examples/cpu/llm
-bash ./tools/env_setup.sh 8
+bash ./tools/env_setup.sh 11
 
 # Activate environment variables
 # set bash script argument to "inference" or "fine-tuning" for different usages
diff --git a/examples/cpu/llm/inference/README.md b/examples/cpu/llm/inference/README.md
@@ -4,88 +4,89 @@
 
 | MODEL FAMILY | MODEL NAME (Huggingface hub) | FP32 | BF16 | Static quantization INT8 | Weight only quantization INT8 | Weight only quantization INT4 |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-3B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | 🟩 | 🟩 |   | 🟩 |   |
-|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-7b  | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-11b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-30b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-1.3b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Bloom| bigscience/bloom-1b7 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm3-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm2-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|T5| google/flan-t5-xl | 🟩 | 🟩 | 🟩 | 🟩 |   |
-|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mixtral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 |   | 🟩 | 🟩 |
-|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen2-7B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLaVA| liuhaotian/llava-v1.5-7b | 🟩 | 🟩 |   | 🟩 | 🟩 |
-|GIT| microsoft/git-base | 🟩 | 🟩 |   | 🟩 |   |
-|Yuan| IEITYuan/Yuan2-102B-hf | 🟩 | 🟩 |   | 🟩 |   |
-|Phi| microsoft/phi-2 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Whisper| openai/whisper-large-v2 | 🟩 | 🟩 | 🟩 | 🟩 |   |
-|Maira| microsoft/maira-2 | 🟩 | 🟩 |   | 🟩 |   |
-|Jamba| ai21labs/Jamba-v0.1 | 🟩 | 🟩 |   | 🟩 |   |
+|LLAMA| meta-llama/Llama-2-7b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-13b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-70b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-8B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-70B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-3B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | ✅ | ✅ |   | ✅ | ✅ |
+|GPT-J| EleutherAI/gpt-j-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|DOLLY| databricks/dolly-v2-12b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-7b  | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-11b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-40b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-30b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-1.3b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Bloom| bigscience/bloom-1b7 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|CodeGen| Salesforce/codegen-2B-multi | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm3-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm2-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPTBigCode| bigcode/starcoder | ✅ | ✅ | ✅ | ✅ | ✅ |
+|T5| google/flan-t5-xl | ✅ | ✅ | ✅ | ✅ | ✅ |
+|MPT| mosaicml/mpt-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mistral| mistralai/Mistral-7B-v0.1 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mixtral| mistralai/Mixtral-8x7B-v0.1 | ✅ | ✅ |   | ✅ | ✅ |
+|Stablelm| stabilityai/stablelm-2-1_6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen2-7B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLaVA| liuhaotian/llava-v1.5-7b | ✅ | ✅ |   | ✅ | ✅ |
+|GIT| microsoft/git-base | ✅ | ✅ |   | ✅ | ✅ |
+|Yuan| IEITYuan/Yuan2-102B-hf | ✅ | ✅ |   | ✅ |   |
+|Phi| microsoft/phi-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Whisper| openai/whisper-large-v2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Maira| microsoft/maira-2 | ✅ | ✅ |   | ✅ | ✅ |
+|Jamba| ai21labs/Jamba-v0.1 | ✅ | ✅ |   | ✅ | ✅ |
+|DeepSeek| deepseek-ai/DeepSeek-V2.5-1210 | ✅ | ✅ |   | ✅ | ✅ |
 
 ## 1.2 Verified for distributed inference mode via DeepSpeed
 
 | MODEL FAMILY | MODEL NAME (Huggingface hub) | BF16 | Weight only quantization INT8 |
 |:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-3B-Instruct | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | 🟩 | 🟩 |
-|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 |
-|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟩 |
-|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-11b | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 |
-|OPT| facebook/opt-30b | 🟩 | 🟩 |
-|OPT| facebook/opt-1.3b | 🟩 | 🟩 |
-|Bloom| bigscience/bloom-1b7 | 🟩 | 🟩 |
-|CodeGen| Salesforce/codegen-2B-multi |  🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟩 |
-|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 |
-|T5| google/flan-t5-xl | 🟩 | 🟩 |
-|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 |
-|Mistral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 |
-|MPT| mosaicml/mpt-7b | 🟩 | 🟩 |
-|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen2-7B | 🟩 | 🟩 |
-|GIT| microsoft/git-base | 🟩 | 🟩 |
-|Phi| microsoft/phi-2 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-4k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-128k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-4k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-128k-instruct | 🟩 | 🟩 |
-|Whisper| openai/whisper-large-v2 | 🟩 | 🟩 |
+|LLAMA| meta-llama/Llama-2-7b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-13b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-70b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-8B | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-70B | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-3B-Instruct | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | ✅ | ✅ |
+|GPT-J| EleutherAI/gpt-j-6b | ✅ | ✅ |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | ✅ | ✅ |
+|DOLLY| databricks/dolly-v2-12b | ✅ | ✅ |
+|FALCON| tiiuae/falcon-11b | ✅ | ✅ |
+|FALCON| tiiuae/falcon-40b | ✅ | ✅ |
+|OPT| facebook/opt-30b | ✅ | ✅ |
+|OPT| facebook/opt-1.3b | ✅ | ✅ |
+|Bloom| bigscience/bloom-1b7 | ✅ | ✅ |
+|CodeGen| Salesforce/codegen-2B-multi | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | ✅ | ✅ |
+|GPTBigCode| bigcode/starcoder | ✅ | ✅ |
+|T5| google/flan-t5-xl | ✅ | ✅ |
+|Mistral| mistralai/Mistral-7B-v0.1 | ✅ | ✅ |
+|Mistral| mistralai/Mixtral-8x7B-v0.1 | ✅ | ✅ |
+|MPT| mosaicml/mpt-7b | ✅ | ✅ |
+|Stablelm| stabilityai/stablelm-2-1_6b | ✅ | ✅ |
+|Qwen| Qwen/Qwen-7B-Chat | ✅ | ✅ |
+|Qwen| Qwen/Qwen2-7B | ✅ | ✅ |
+|GIT| microsoft/git-base | ✅ | ✅ |
+|Phi| microsoft/phi-2 | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-4k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-128k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-4k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-128k-instruct | ✅ | ✅ |
+|Whisper| openai/whisper-large-v2 | ✅ | ✅ |
 
 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from LLAMA family)
 are well supported with all optimizations like indirect access KV cache, fused ROPE, and customized linear kernels.
diff --git a/examples/cpu/llm/requirements.txt b/examples/cpu/llm/requirements.txt
@@ -1,4 +1,3 @@
-cpuid
 accelerate
 datasets==2.21.0
 sentencepiece
