Description
Describe the issue
I built a Docker container with the following steps:
git clone https://github.com/vllm-project/vllm.git
cd vllm
docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env --target vllm-openai .
Downloaded [ggml-model-Q8_0.gguf](https://huggingface.co/ond-ai/od-agent-1.4-Qwen3-8B-gguf/blob/main/ggml-model-Q8_0.gguf).
The command below does not load the model. What is the correct way to run it?
docker run -dit --name vllm_job --net=host -v /home/xyz/.cache/huggingface:/root/.cache/huggingface \
--privileged=true --shm-size=16g \
-e VLLM_CPU_KVCACHE_SPACE=64 \
-e VLLM_CPU_OMP_THREADS_BIND="0-29|32-61|64-93|96-125" \
-e VLLM_CPU_SGL_KERNEL=1 vllm-cpu-env \
--model=models/ggml-model-Q8_0.gguf \
--tokenizer=ond-ai/od-agent-1.4-Qwen3-8B-gguf \
--dtype=bfloat16 -tp=4 --enable-chunked-prefill --enable-prefix-caching
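
One thing that may be worth ruling out before looking at the vLLM flags is whether the downloaded file is a valid GGUF file at all. The link above is a Hugging Face `blob/main` page URL, and fetching that with a plain HTTP client can save an HTML page instead of the model. GGUF files start with the four ASCII bytes `GGUF`, so a quick check is possible. This is only a sketch: `is_gguf` is a hypothetical helper, not part of vLLM, and the `MODEL_FILE` default is an assumed host path that is not stated anywhere in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of vLLM): succeeds only if the file exists
# and begins with the 4-byte GGUF magic string.
is_gguf() {
    [ -f "$1" ] || return 1
    magic=$(head -c 4 "$1")
    [ "$magic" = "GGUF" ]
}

# MODEL_FILE is an assumed host path; adjust it to where the file was saved.
MODEL_FILE="${MODEL_FILE:-$HOME/models/ggml-model-Q8_0.gguf}"

if is_gguf "$MODEL_FILE"; then
    echo "OK: $MODEL_FILE looks like a GGUF file"
else
    echo "Missing or not GGUF: $MODEL_FILE"
fi
```

Note also that the `--model` path is resolved inside the container, while the `docker run` command only mounts `/home/xyz/.cache/huggingface`. If the file lives elsewhere on the host, it would likely not be visible to the server at `models/ggml-model-Q8_0.gguf`.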