[fix] prefill unsupport sliding window attention #2758
Conversation
Signed-off-by: nsdie <[email protected]>
Code Review

This pull request addresses a bug in the prefill phase for sliding window attention by removing the specialized code path that used `npu_fused_infer_attention_score`. While this correctly resolves the main issue, a related change to the `_repeat_kv` helper function introduces a subtle bug where it implements `torch.repeat` instead of the documented `torch.repeat_interleave`. I've provided a critical comment with a suggested fix for this function, and also noted that it appears to be unused after these changes and could potentially be removed.
```diff
         hidden_states = hidden_states[:, None, :, :].expand(
-            num_key_value_heads, n_rep, slen, head_dim)
-        return hidden_states.reshape(num_key_value_heads * n_rep, slen,
+            slen, n_rep, num_key_value_heads, head_dim)
+        return hidden_states.reshape(slen, num_key_value_heads * n_rep,
                                      head_dim)
```
The implementation of `_repeat_kv` does not match its docstring, which states it should be equivalent to `torch.repeat_interleave`. The current implementation performs a `torch.repeat` operation, not `torch.repeat_interleave`. This can lead to incorrect attention calculations in Grouped-Query Attention (GQA) scenarios where key and value states are expanded.

For a tensor with shape `(slen, num_kv_heads, head_dim)`, `repeat_interleave` on `dim=1` should result in each head being repeated `n_rep` times consecutively. The current implementation repeats the whole sequence of heads `n_rep` times.

Additionally, after this pull request's changes, this function appears to be unused and could potentially be removed.
Suggested change:

```diff
-        hidden_states = hidden_states[:, None, :, :].expand(
-            slen, n_rep, num_key_value_heads, head_dim)
+        hidden_states = hidden_states.unsqueeze(2).expand(
+            slen, num_key_value_heads, n_rep, head_dim)
         return hidden_states.reshape(slen, num_key_value_heads * n_rep,
                                      head_dim)
```
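A minimal standalone sketch (not part of the PR or this review thread) that demonstrates the difference the comment describes, assuming the `(slen, num_key_value_heads, head_dim)` layout:

```python
# Standalone illustration of the reviewer's point, assuming a
# (slen, num_key_value_heads, head_dim) layout for key/value states.
import torch

slen, num_key_value_heads, n_rep, head_dim = 4, 2, 3, 8
kv = torch.randn(slen, num_key_value_heads, head_dim)

# Suggested pattern: unsqueeze after the head axis, expand, then reshape.
# Each head is repeated n_rep times consecutively, matching
# torch.repeat_interleave on dim=1.
interleaved = kv.unsqueeze(2).expand(
    slen, num_key_value_heads, n_rep, head_dim).reshape(
        slen, num_key_value_heads * n_rep, head_dim)
assert torch.equal(interleaved, torch.repeat_interleave(kv, n_rep, dim=1))

# Pattern under review: the new axis is inserted before the head axis,
# so the whole block of heads is tiled n_rep times (torch.repeat semantics).
tiled = kv[:, None, :, :].expand(
    slen, n_rep, num_key_value_heads, head_dim).reshape(
        slen, num_key_value_heads * n_rep, head_dim)
assert not torch.equal(tiled, interleaved)
```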
Signed-off-by: nsdie <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Signed-off-by: nsdie <[email protected]>
```python
key = self._repeat_kv(key, self.num_heads // self.num_kv_heads)
value = self._repeat_kv(value, self.num_heads // self.num_kv_heads)

output, _ = torch_npu.npu_fused_infer_attention_score(
```
So it's confirmed that this op doesn't work for prefill? Can you paste a link or explain more about it?
The npu_fused_infer_attention_score operator does not support the prefill phase
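For context only, a hypothetical sketch (not the PR's replacement code; all names and shapes here are illustrative) of what a causal sliding-window mask in the prefill phase computes when applied with plain PyTorch attention:

```python
# Hypothetical illustration, not vllm-ascend code: a causal sliding-window
# mask for the prefill phase, applied via PyTorch's generic SDPA.
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Query position i may attend to key positions j with i - window < j <= i."""
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]                # no attention to the future
    in_window = (idx[:, None] - idx[None, :]) < window   # limit lookback distance
    return causal & in_window                            # True = keep, False = mask out

# Illustrative shapes: (num_heads, seq_len, head_dim) after KV-head repetition.
num_heads, seq_len, head_dim, window = 4, 16, 64, 8
q, k, v = (torch.randn(num_heads, seq_len, head_dim) for _ in range(3))

mask = sliding_window_causal_mask(seq_len, window)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```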
Signed-off-by: nsdie <[email protected]>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

|  | main | #2758 | +/- |
| --- | --- | --- | --- |
| Coverage | 72.99% | 72.90% | -0.09% |
| Files | 153 | 153 |  |
| Lines | 21338 | 21368 | +30 |
| Hits | 15575 | 15579 | +4 |
| Misses | 5763 | 5789 | +26 |
This is a fallback PR for #2528
What this PR does / why we need it?
Fix a prefill attention bug: the prefill path does not support sliding window attention. npu_fused_infer_attention_score only supports head_dim == 128, not other values.
Does this PR introduce any user-facing change?
The prefill-phase npu_fused_infer_attention_score path is removed.
How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@e599e2c
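Not part of this PR (which removes the prefill fused path outright), but a hedged sketch of the kind of capability check a backend could use given the constraints described above; the helper name and example values are illustrative only:

```python
# Hypothetical helper, not from vllm-ascend: decide whether a fused attention
# kernel limited to head_dim == 128 and full (non-sliding) attention can be
# used for a prefill request; otherwise fall back to the generic path.
from typing import Optional

FUSED_PREFILL_HEAD_DIM = 128  # assumed constraint from the PR description

def can_use_fused_prefill(head_dim: int, sliding_window: Optional[int]) -> bool:
    if sliding_window is not None:
        return False  # fused op lacks sliding-window support in prefill
    return head_dim == FUSED_PREFILL_HEAD_DIM

# Illustrative checks: a sliding-window model must take the generic path,
# while a head_dim == 128 model without sliding window could use the fused op.
assert not can_use_fused_prefill(head_dim=256, sliding_window=4096)
assert can_use_fused_prefill(head_dim=128, sliding_window=None)
```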