Running GGUF models in vLLM CPU (GNR Machine) #861

### Describe the issue

I built a Docker image with the following steps:

    git clone https://github.com/vllm-project/vllm.git
    cd vllm
    docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env --target vllm-openai .

I then downloaded [ggml-model-Q8_0.gguf](https://huggingface.co/ond-ai/od-agent-1.4-Qwen3-8B-gguf/blob/main/ggml-model-Q8_0.gguf).

The command below does not load the model. What is the correct way to run it?

    docker run -dit --name vllm_job --net=host -v /home/xyz/.cache/huggingface:/root/.cache/huggingface \
    --privileged=true --shm-size=16g \
    -e VLLM_CPU_KVCACHE_SPACE=64 \
    -e VLLM_CPU_OMP_THREADS_BIND="0-29|32-61|64-93|96-125" \
    -e VLLM_CPU_SGL_KERNEL=1 vllm-cpu-env \
    --model=models/ggml-model-Q8_0.gguf \
    --tokenizer=ond-ai/od-agent-1.4-Qwen3-8B-gguf \
    --dtype=bfloat16 -tp=4 --enable-chunked-prefill --enable-prefix-caching
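
One thing that may be worth ruling out first: the command mounts only the Hugging Face cache directory and passes a relative `--model` path, so the file might not be visible at that path from where the server starts. A minimal sketch of a sanity check follows. It only tests that a file exists and begins with the 4-byte `GGUF` magic. The helper name `is_gguf` is hypothetical and not part of vLLM, and passing this check does not by itself mean vLLM can load the model.

```shell
# Hypothetical helper (not part of vLLM): succeeds only if the given path
# is a regular file whose first four bytes are the ASCII magic "GGUF".
is_gguf() {
    # $1: path to the candidate model file
    [ -f "$1" ] || return 1
    # Read just the first four bytes and compare them to the magic string.
    [ "$(head -c 4 "$1")" = "GGUF" ]
}
```

Usage, run from the same filesystem view and working directory the server would see (the path here is the one from the command above): `is_gguf models/ggml-model-Q8_0.gguf && echo "found a GGUF file" || echo "missing or not GGUF"`.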

