
Commit e44da93

Format llama read me page
Differential Revision: D78847177
Pull Request resolved: #12782
1 parent: c64a7fd


examples/models/llama/README.md (9 additions, 8 deletions)
```diff
@@ -168,7 +168,7 @@ LLAMA_CHECKPOINT=path/to/consolidated.00.pth
 LLAMA_PARAMS=path/to/params.json
 
 python -m extension.llm.export.export_llm \
-  --config examples/models/llama/config/llama_bf16.yaml
+  --config examples/models/llama/config/llama_bf16.yaml \
   +base.model_class="llama3_2" \
   +base.checkpoint="${LLAMA_CHECKPOINT:?}" \
   +base.params="${LLAMA_PARAMS:?}" \
```
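Note: the only functional change in this hunk (and the two below) is the trailing backslash after the `--config` line. Without it the shell treats the command as complete, and each following `+base.*` override runs as its own failing command. A minimal shell-only illustration, unrelated to this repo:

```bash
# Without the continuation backslash, the second line is executed
# as a standalone command and the override is silently lost:
echo --config llama_bf16.yaml
+base.model_class="llama3_2"
# bash: +base.model_class=llama3_2: command not found

# With the backslash, both lines form a single command:
echo --config llama_bf16.yaml \
  +base.model_class="llama3_2"
# prints: --config llama_bf16.yaml +base.model_class=llama3_2
```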
```diff
@@ -186,7 +186,7 @@ LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/spinquant/params.json
 
 python -m extension.llm.export.export_llm \
-  --config examples/models/llama/config/llama_xnnpack_spinquant.yaml
+  --config examples/models/llama/config/llama_xnnpack_spinquant.yaml \
   +base.model_class="llama3_2" \
   +base.checkpoint="${LLAMA_QUANTIZED_CHECKPOINT:?}" \
   +base.params="${LLAMA_PARAMS:?}"
```
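The `"${LLAMA_QUANTIZED_CHECKPOINT:?}"` form used here is standard bash parameter expansion, not anything specific to export_llm: `${VAR:?}` expands to the value of `VAR` but aborts with an error when `VAR` is unset or empty, so a forgotten export fails fast rather than exporting with a blank path. For example:

```bash
# ${VAR:?} aborts when VAR is unset or empty:
unset LLAMA_QUANTIZED_CHECKPOINT
echo "${LLAMA_QUANTIZED_CHECKPOINT:?}"
# bash: LLAMA_QUANTIZED_CHECKPOINT: parameter null or not set

LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth.pth
echo "${LLAMA_QUANTIZED_CHECKPOINT:?}"   # prints the path
```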
```diff
@@ -203,7 +203,7 @@ LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/consolidated.00.pth.pth
 LLAMA_PARAMS=path/to/qlora/params.json
 
 python -m extension.llm.export.export_llm \
-  --config examples/models/llama/config/llama_xnnpack_qat.yaml
+  --config examples/models/llama/config/llama_xnnpack_qat.yaml \
   +base.model_class="llama3_2" \
   +base.checkpoint="${LLAMA_QUANTIZED_CHECKPOINT:?}" \
   +base.params="${LLAMA_PARAMS:?}" \
```
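Since the three Llama 3.2 exports above differ only in the config file, one wrapper can drive all of them. This is a convenience sketch, not part of the README or this commit; the script name is hypothetical, and it assumes all three configs take the same `+base.*` overrides shown in the hunks:

```bash
#!/usr/bin/env bash
# export_llama3_2.sh (hypothetical helper, not in the repo)
# Usage: export_llama3_2.sh <llama_bf16|llama_xnnpack_spinquant|llama_xnnpack_qat>
set -euo pipefail
CONFIG_NAME="${1:?usage: export_llama3_2.sh <config-name>}"
python -m extension.llm.export.export_llm \
  --config "examples/models/llama/config/${CONFIG_NAME}.yaml" \
  +base.model_class="llama3_2" \
  +base.checkpoint="${LLAMA_CHECKPOINT:?}" \
  +base.params="${LLAMA_PARAMS:?}"
```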
````diff
@@ -219,15 +219,16 @@ You can export and run the original Llama 3 8B instruct model.
 2. Export model and generate `.pte` file
 ```
 python -m extension.llm.export.export_llm \
-  --config examples/models/llama/config/llama_q8da4w.yaml
-  +base.model_clas="llama3"
+  --config examples/models/llama/config/llama_q8da4w.yaml \
+  +base.model_class="llama3" \
   +base.checkpoint=<consolidated.00.pth.pth> \
   +base.params=<params.json>
 ```
-Due to the larger vocabulary size of Llama 3, we recommend quantizing the embeddings with `quantization.embedding_quantize=\'4,32\'` as shown above to further reduce the model size.
 
+Due to the larger vocabulary size of Llama 3, we recommend quantizing the embeddings with `quantization.embedding_quantize=\'4,32\'` as shown above to further reduce the model size.
 
-If you're interested in deploying on non-CPU backends, [please refer the non-cpu-backend section](non_cpu_backends.md)
+
+If you're interested in deploying on non-CPU backends, [please refer the non-cpu-backend section](non_cpu_backends.md)
 
 ## Step 3: Run on your computer to validate
 
````
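The restored note recommends `quantization.embedding_quantize=\'4,32\'`. As a hedged sketch of how that override would be appended to the export command (placeholder paths; the exact semantics of `'4,32'`, commonly a bit width and group size, are defined by export_llm, not by this note):

```bash
# Sketch only: quantize embeddings alongside the q8da4w export.
# Paths are placeholders; '4,32' semantics per the export_llm docs.
python -m extension.llm.export.export_llm \
  --config examples/models/llama/config/llama_q8da4w.yaml \
  +base.model_class="llama3" \
  +base.checkpoint=path/to/consolidated.00.pth \
  +base.params=path/to/params.json \
  +quantization.embedding_quantize=\'4,32\'
```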
```diff
@@ -450,7 +451,7 @@ python -m examples.models.llama.eval_llama \
   -d <checkpoint dtype> \
   --tasks mmlu \
   --num_fewshot 5 \
-  --max_seq_len <max sequence length>
+  --max_seq_len <max sequence length> \
   --max_context_len <max context length>
 ```
 
```
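With the restored backslash, the eval command now chains through to `--max_context_len`. An illustrative rendering with concrete values substituted for the angle-bracket placeholders (the model and checkpoint flags earlier in the README's command fall outside this hunk and must still be supplied):

```bash
# Illustrative values only; flags selecting the model/checkpoint
# precede these in the full README command and are omitted here.
python -m examples.models.llama.eval_llama \
  -d bf16 \
  --tasks mmlu \
  --num_fewshot 5 \
  --max_seq_len 2048 \
  --max_context_len 2048
```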