Hi, and thank you for this integration of vLLM into RunPod. The fast autoscaling of your platform is super impressive.
I wanted to run Qwen 3.5 with this project, as people say it is better than GPT-OSS for agentic coding.
I tried creating an endpoint using worker-vllm with the model unsloth/Qwen3.5-35B-A3B-GGUF.
This failed with an error on vLLM startup like:
{"requestId": null, "message": "Worker startup failed: 1 validation error for ModelConfig\n Value error, Model architectures ['Qwen3_5MoeForConditionalGeneration'] are not supported for now.
This seems to be because v2.14.0 of this project ships vLLM 0.16.0, whereas full Qwen 3.5 support is available in vLLM release v0.17.0.