[Bug]: Prefix cache hit rate is 0.0% when using w4a16 for GLM5

### Your current environment

<details>
<summary>Prefix cache hit rate is 0.0% when using w4a16 for GLM5</summary>

```text
(APIServer pid=1) INFO 03-09 02:05:27 [v1/metrics/loggers.py:259] Engine 000: Avg prompt throughput: 2356.1 tokens/s, Avg generation throughput: 11.5 tokens/s, Running: 8 reqs, Waiting: 0 reqs, GPU KV cache usage: 45.7%, Prefix cache hit rate: 0.0%
```

</details>


### 🐛 Describe the bug

When serving GLM5 with w4a16 quantization, the prefix cache hit rate is reported as 0.0%. The engine metrics log line (quoted in full in the environment section above) shows `Prefix cache hit rate: 0.0%` while 8 requests are running and GPU KV cache usage is at 45.7%.
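To check whether the hit rate ever moves off 0.0% over a longer run, the field can be pulled out of each metrics log line. This is a minimal sketch that assumes the log format matches the line quoted in this report. The helper name `prefix_cache_hit_rate` is hypothetical and not part of vLLM.

```python
import re
from typing import Optional

# Matches the "Prefix cache hit rate: X%" field, assuming the log format
# seen in the metrics line quoted in this report.
HIT_RATE_RE = re.compile(r"Prefix cache hit rate: (\d+(?:\.\d+)?)%")


def prefix_cache_hit_rate(line: str) -> Optional[float]:
    """Return the hit rate in percent, or None if the field is absent."""
    match = HIT_RATE_RE.search(line)
    return float(match.group(1)) if match else None


# The log line from this report, copied verbatim.
sample = (
    "(APIServer pid=1) INFO 03-09 02:05:27 [v1/metrics/loggers.py:259] "
    "Engine 000: Avg prompt throughput: 2356.1 tokens/s, "
    "Avg generation throughput: 11.5 tokens/s, Running: 8 reqs, "
    "Waiting: 0 reqs, GPU KV cache usage: 45.7%, Prefix cache hit rate: 0.0%"
)

rate = prefix_cache_hit_rate(sample)
```

Applying this to every metrics line in the server log would show whether the rate is stuck at zero for the whole run or only at this timestamp.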

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Bug]: prefix cache bug happens when use w4a16 for GLM5 #36441
