Commit 6ea47af

Point out lambada_openai in doc (#2352)
* Update README.md
* Update README.md
* Update README.md
1 parent 998f2f7 commit 6ea47af

File tree

1 file changed: +19 −3 lines changed


examples/cpu/inference/python/llm/README.md

Lines changed: 19 additions & 3 deletions
@@ -375,14 +375,22 @@ Data type of scales can be any floating point types. Shape of scales should be [

We leverage [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for the accuracy test.

-By default we test "lambada_standard" task, for more choice, see {TASK_NAME} in this [link](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md),
+We verified and recommend the "lambada_openai" task; for more choices, see {TASK_NAME} in this [link](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md).

### Single Instance

```bash
cd ./single_instance
```
### FP32:

```bash
# general command:
OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run_accuracy.py --accuracy-only -m <MODEL_ID> --dtype float32 --ipex --jit --tasks {TASK_NAME}

# An example with the Llama 2 7B model:
OMP_NUM_THREADS=56 numactl -m 0 -C 0-55 python run_accuracy.py --accuracy-only -m meta-llama/Llama-2-7b-hf --dtype float32 --ipex --jit --tasks lambada_openai
```
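The `<physical cores num>` and `<physical cores list>` placeholders in the general command above go together: the core list should span exactly the cores being counted. A minimal sketch of composing the two values (the count of 56 simply mirrors the example above; on real hardware, derive the physical-core count of the target NUMA node, e.g. from `lscpu`):

```shell
# Compose matching OMP_NUM_THREADS / numactl arguments from one core count.
# 56 mirrors the example above; substitute the physical-core count of the
# NUMA node you pin to.
cores=56
core_list="0-$((cores - 1))"
echo "OMP_NUM_THREADS=${cores} numactl -m 0 -C ${core_list}"
```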
### BF16:

```bash
@@ -415,7 +423,15 @@ OMP_NUM_THREADS=56 numactl -m 0 -C 0-55 python run_accuracy.py -m meta-llama/Lla

cd ./distributed
unset KMP_AFFINITY
```
### FP32:

```bash
# general command:
deepspeed --num_gpus 2 --master_addr `hostname -I | sed -e 's/\s.*$//'` --bind_cores_to_rank run_accuracy_with_deepspeed.py --model <MODEL_ID> --dtype float32 --ipex --jit --tasks <TASK_NAME> --accuracy-only

# An example with the Llama 2 7B model:
deepspeed --num_gpus 2 --master_addr `hostname -I | sed -e 's/\s.*$//'` --bind_cores_to_rank run_accuracy_with_deepspeed.py --model meta-llama/Llama-2-7b-hf --dtype float32 --ipex --jit --tasks lambada_openai --accuracy-only
```
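The `--master_addr` value above is produced by `hostname -I | sed -e 's/\s.*$//'`, which keeps only the first of the space-separated addresses that `hostname -I` prints. A small illustration of that sed pipeline on a fixed string (the addresses are made up; GNU sed's `\s` is assumed):

```shell
# `hostname -I` prints every host address separated by spaces; the sed
# expression deletes everything from the first whitespace onward, leaving
# the first address for --master_addr.
addrs="192.168.1.10 10.0.0.5 172.17.0.1"   # stand-in for `hostname -I` output
master_addr=$(printf '%s' "$addrs" | sed -e 's/\s.*$//')
echo "$master_addr"
```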
### BF16:

```bash
@@ -434,7 +450,7 @@ deepspeed --num_gpus 2 --master_addr `hostname -I | sed -e 's/\s.*$//'` --bind_

# note: for GPT-NEOX, please remove "--int8-bf16-mixed" and add "--dtype float32" for accuracy concerns

# An example with the Llama 2 7B model:
-deepspeed --num_gpus 2 --master_addr `hostname -I | sed -e 's/\s.*$//'` --bind_cores_to_rank run_accuracy_with_deepspeed.py --model meta-llama/Llama-2-7b-hf --int8-bf16-mixed --ipex --jit --tasks <TASK_NAME> --accuracy-only --ipex-weight-only-quantization
+deepspeed --num_gpus 2 --master_addr `hostname -I | sed -e 's/\s.*$//'` --bind_cores_to_rank run_accuracy_with_deepspeed.py --model meta-llama/Llama-2-7b-hf --int8-bf16-mixed --ipex --jit --tasks lambada_openai --accuracy-only --ipex-weight-only-quantization
```
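The GPT-NEOX note above boils down to swapping one precision flag for another. A hedged sketch of making that choice scriptable (the case pattern and variable names are illustrative, not part of the repository):

```shell
# Per the note above: GPT-NeoX models run the accuracy test in float32,
# while other models can use the int8/bf16 mixed path.
model="EleutherAI/gpt-neox-20b"
case "$model" in
  *gpt-neox*) precision_flags="--dtype float32" ;;
  *)          precision_flags="--int8-bf16-mixed" ;;
esac
echo "$precision_flags"
```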

## How to Shard model for Distributed tests with DeepSpeed (autoTP)
@@ -457,4 +473,4 @@ python create_shard_model.py meta-llama/Llama-2-7b-hf --save-path ./local_llama2

(2) We can build up LLM services optimized by Intel® Extension for PyTorch\* with Triton Server. Please refer [here](../../../serving/triton/README.md) for best practice.

-(3) The LLM inference methods introduced in this page can be well applied for AWS. We can just follow the above instructions and enjoy the boosted performance of LLM with Intel® Extension for PyTorch\* optimizations on the AWS instances.
+(3) The LLM inference methods introduced in this page can be well applied for AWS. We can just follow the above instructions and enjoy the boosted performance of LLM with Intel® Extension for PyTorch\* optimizations on the AWS instances.
