 The purpose of this example is to demonstrate a minimal usage of llama.cpp for running models.
 
 ```bash
-llama-run granite3-moe
+llama-run -hf llama.cpp/example/run
 ```
 
 ```bash
-Description:
-  Runs a llm
+Usage: build/bin/llama-run [server-options]
 
-Usage:
-  llama-run [options] model [prompt]
+This tool starts a llama-server process and provides an interactive chat interface.
+All options except --port are passed through to llama-server.
 
-Options:
-  -c, --context-size <value>
-      Context size (default: 2048)
-  -n, -ngl, --ngl <value>
-      Number of GPU layers (default: 0)
-  --temp <value>
-      Temperature (default: 0.8)
-  -v, --verbose, --log-verbose
-      Set verbosity level to infinity (i.e. log all messages, useful for debugging)
-  -h, --help
-      Show help message
+Common options:
+  -h,  --help             Show this help
+  -m,  --model FNAME      model path (default: `models/$filename` with filename from `--hf-file`
+                          or `--model-url` if set, otherwise models/7B/ggml-model-f16.gguf)
+  -hf, -hfr, --hf-repo <user>/<model>[:quant]
+                          Hugging Face model repository; quant is optional, case-insensitive,
+                          default to Q4_K_M, or falls back to the first file in the repo if
+                          Q4_K_M doesn't exist.
+                          mmproj is also downloaded automatically if available. to disable, add
+                          --no-mmproj
+                          example: unsloth/phi-4-GGUF:q4_k_m
+                          (default: unused)
+  -c,  --ctx-size N       Context size
+  -n,  --predict N        Number of tokens to predict
+  -t,  --threads N        Number of threads
 
-Commands:
-  model
-    Model is a string with an optional prefix of
-    huggingface:// (hf://), ollama://, https:// or file://.
-    If no protocol is specified and a file exists in the specified
-    path, file:// is assumed, otherwise if a file does not exist in
-    the specified path, ollama:// is assumed. Models that are being
-    pulled are downloaded with .partial extension while being
-    downloaded and then renamed as the file without the .partial
-    extension when complete.
-
-Examples:
-  llama-run llama3
-  llama-run ollama://granite-code
-  llama-run ollama://smollm:135m
-  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
-  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
-  llama-run ms://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
-  llama-run modelscope://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
-  llama-run https://example.com/some-file1.gguf
-  llama-run some-file2.gguf
-  llama-run file://some-file3.gguf
-  llama-run --ngl 999 some-file4.gguf
-  llama-run --ngl 999 some-file5.gguf Hello World
+For all server options, run: llama-server --help
 ```
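
As a rough sketch of the behaviour the updated README describes, an invocation might look like the following; the repository and option values are borrowed from the help text's own example and are illustrative, not verified output:

```bash
# Sketch only: fetches the repo's Q4_K_M quant (per the help text's example),
# starts llama-server in the background, and opens the interactive chat.
# -c and -t are passed through to llama-server unchanged.
llama-run -hf unsloth/phi-4-GGUF:q4_k_m -c 4096 -t 8
```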