
Commit e629135

Rewrite llama-run to use llama-server
llama-run works fine, but falls well behind llama-server in functionality. Integrate llama-server with llama-run.

Signed-off-by: Eric Curtin <[email protected]>
1 parent a812838 commit e629135
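The message above describes the new shape of the tool: instead of reimplementing inference, llama-run launches a llama-server process and chats with it over HTTP. A minimal shell sketch of that pattern (a hypothetical illustration only, not the committed C++ implementation; the model path, port, and prompt are placeholders):

```bash
# Launch llama-server in the background, wait for it to come up,
# then send a chat turn over its HTTP API.
llama-server -m models/7B/ggml-model-f16.gguf --port 8080 &
server_pid=$!

# llama-server exposes a /health endpoint; poll until the model is loaded.
until curl -sf http://127.0.0.1:8080/health > /dev/null; do
    sleep 0.5
done

# One chat turn via the OpenAI-compatible /v1/chat/completions endpoint.
curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello"}]}'

kill "$server_pid"
```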

2 files changed: 346 additions & 1193 deletions

tools/run/README.md

Lines changed: 20 additions & 40 deletions
@@ -3,50 +3,30 @@
 The purpose of this example is to demonstrate a minimal usage of llama.cpp for running models.
 
 ```bash
-llama-run granite3-moe
+llama-run -hf llama.cpp/example/run
 ```
 
 ```bash
-Description:
-  Runs a llm
+Usage: build/bin/llama-run [server-options]
 
-Usage:
-  llama-run [options] model [prompt]
+This tool starts a llama-server process and provides an interactive chat interface.
+All options except --port are passed through to llama-server.
 
-Options:
-  -c, --context-size <value>
-        Context size (default: 2048)
-  -n, -ngl, --ngl <value>
-        Number of GPU layers (default: 0)
-  --temp <value>
-        Temperature (default: 0.8)
-  -v, --verbose, --log-verbose
-        Set verbosity level to infinity (i.e. log all messages, useful for debugging)
-  -h, --help
-        Show help message
+Common options:
+  -h,  --help              Show this help
+  -m,  --model FNAME       model path (default: `models/$filename` with filename from `--hf-file`
+                           or `--model-url` if set, otherwise models/7B/ggml-model-f16.gguf)
+  -hf, -hfr, --hf-repo <user>/<model>[:quant]
+                           Hugging Face model repository; quant is optional, case-insensitive,
+                           default to Q4_K_M, or falls back to the first file in the repo if
+                           Q4_K_M doesn't exist.
+                           mmproj is also downloaded automatically if available. to disable, add
+                           --no-mmproj
+                           example: unsloth/phi-4-GGUF:q4_k_m
+                           (default: unused)
+  -c,  --ctx-size N        Context size
+  -n,  --predict N         Number of tokens to predict
+  -t,  --threads N         Number of threads
 
-Commands:
-  model
-        Model is a string with an optional prefix of
-        huggingface:// (hf://), ollama://, https:// or file://.
-        If no protocol is specified and a file exists in the specified
-        path, file:// is assumed, otherwise if a file does not exist in
-        the specified path, ollama:// is assumed. Models that are being
-        pulled are downloaded with .partial extension while being
-        downloaded and then renamed as the file without the .partial
-        extension when complete.
-
-Examples:
-  llama-run llama3
-  llama-run ollama://granite-code
-  llama-run ollama://smollm:135m
-  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
-  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
-  llama-run ms://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
-  llama-run modelscope://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
-  llama-run https://example.com/some-file1.gguf
-  llama-run some-file2.gguf
-  llama-run file://some-file3.gguf
-  llama-run --ngl 999 some-file4.gguf
-  llama-run --ngl 999 some-file5.gguf Hello World
+For all server options, run: llama-server --help
 ```
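Because everything except --port is forwarded to llama-server, the flags in the new help text can be combined on one command line. For example (repo name taken from the help text's own example; the context size and thread count here are arbitrary):

```bash
# Fetch a quantized model from Hugging Face and chat interactively,
# passing context-size and thread options straight through to llama-server.
llama-run -hf unsloth/phi-4-GGUF:q4_k_m -c 4096 -t 8
```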

0 commit comments
