Description
Environment:
- vLLM version: 0.17+ (CUDA 130)
- Model: Qwen/Qwen3.5-35B-A3B-FP8
- GPU: RTX 5090D × 2
- Open WebUI version: 0.8.10
- Launch command:
python3 -m vllm.entrypoints.openai.api_server \
--model /home/ragnarokchan/models/Qwen3.5-35B-A3B-FP8 \
--served-model-name Qwen3.5-35B-A3B-FP8 \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 2 \
--enable-chunked-prefill \
--max-num-seqs 16 \
--max-model-len 65536 \
--tool-call-parser qwen3_coder \
--enable-auto-tool-choice \
--calculate-kv-scales \
--reasoning-parser qwen3

Bug Description:
When using Open WebUI to call vLLM for inference, the output suddenly terminates mid-generation. The logs look normal and the request returns 200 OK, but the client hangs and never receives the complete output.
The vLLM service itself does not crash. Re-sending the prompt (with priority) or opening a new chat lets inference continue, but the same issue recurs quickly.
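One way to confirm where the truncation happens is to capture the raw SSE events of a streamed /v1/chat/completions response and check whether a `finish_reason` chunk and the terminal `data: [DONE]` event ever arrive. A minimal sketch (the helper name `audit_sse_stream` is mine, not part of vLLM or Open WebUI):

```python
import json

def audit_sse_stream(lines):
    """Scan raw SSE lines from a streamed /v1/chat/completions response.

    Returns (text, finish_reason, saw_done). If finish_reason is None or
    saw_done is False, the stream was cut off before completing normally.
    """
    text_parts = []
    finish_reason = None
    saw_done = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore SSE comments and keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True
            continue
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(text_parts), finish_reason, saw_done
```

If the stream ends without either marker, the server (or a proxy in between) dropped the connection mid-generation rather than finishing the request.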
Steps to Reproduce:
- Start vLLM service (configuration as above)
- Send a chat request via Open WebUI
- Model starts generating output, but stops mid-way
- Client cannot get the complete response; the request appears successful but the content is truncated
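The same request can be replayed directly against vLLM, bypassing Open WebUI, to check whether the stream ends cleanly at the source. A sketch assuming the server from the launch command above is reachable on localhost:8000 (the endpoint path and served model name come from that command; the prompt and token limit are illustrative):

```python
import json
import urllib.request

def build_request(model, prompt, max_tokens=512):
    """Payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "max_tokens": max_tokens,
    }

if __name__ == "__main__":
    body = json.dumps(
        build_request("Qwen3.5-35B-A3B-FP8", "Write a long essay about networking.")
    ).encode()
    req = urllib.request.Request(
        "http://localhost:8000/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode().strip()
            if line:
                # a healthy stream ends with a finish_reason chunk, then 'data: [DONE]'
                print(line)
```

If the direct replay completes but the same prompt truncates through Open WebUI, the problem is likely in the client or a proxy timeout rather than in vLLM itself.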
Logs:
(APIServer pid=58580) INFO 03-11 11:13:18 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=58580) INFO 03-11 11:13:28 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 269.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.0%, Prefix cache hit rate: 0.0%
(APIServer pid=58580) INFO: 192.168.100.152:56056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=58580) INFO 03-11 11:13:38 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 182.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 0.0%
(APIServer pid=58580) INFO 03-11 11:13:48 [loggers.py:259] Engine 000: Avg prompt throughput: 425.2 tokens/s, Avg generation throughput: 147.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 0.0%
(APIServer pid=58580) INFO 03-11 11:13:58 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 148.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.1%, Prefix cache hit rate: 0.0%
(APIServer pid=58580) INFO 03-11 11:14:08 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 147.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.4%, Prefix cache hit rate: 0.0%
(APIServer pid=58580) INFO: 192.168.100.152:56267 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=58580) INFO 03-11 11:14:18 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 77.5 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=58580) INFO 03-11 11:14:28 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
Question:
Any suggestions for workarounds or fixes for this issue?