
[WIP] support sliding window of decoding flash attn#161

Open
mayuyuace wants to merge 1 commit into vllm-project:main from mayuyuace:qiming/flash_attn_window

Conversation

Collaborator

@mayuyuace mayuyuace commented Feb 27, 2026

Enable the sliding window for the decoding flash-attention kernel, and add a unit test for this parameter.
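For reviewers unfamiliar with the parameter: with a left sliding window of size `w`, the single decode-step query attends only to the last `w + 1` cached positions instead of the whole KV cache. A minimal NumPy reference sketch of that semantics (not the kernel itself; the function and argument names here are illustrative, though the `window_left` convention follows flash-attention's `(left, right)` window, where `-1` means unlimited):

```python
import numpy as np

def decode_attention(q, k, v, window_left=-1):
    """Reference decode attention with an optional left sliding window.

    q: (num_heads, head_dim) query for the token being decoded
    k, v: (seq_len, num_heads, head_dim) cached keys/values
    window_left: how many preceding tokens are visible (-1 = no limit)
    """
    seq_len = k.shape[0]
    scale = 1.0 / np.sqrt(q.shape[-1])
    # The decode query sits at position seq_len - 1; a left window of w
    # restricts it to positions [seq_len - 1 - w, seq_len - 1].
    start = 0 if window_left < 0 else max(0, seq_len - 1 - window_left)
    scores = np.einsum("hd,shd->hs", q, k[start:]) * scale
    # Numerically stable softmax over the visible positions.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.einsum("hs,shd->hd", probs, v[start:])
```

A unit test for the parameter can check the two boundary cases: a window at least as large as the cache must match full attention, and a window of zero must return exactly `v[-1]`.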

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

@baodii baodii left a comment


LGTM

@baodii baodii self-requested a review March 2, 2026 01:45

baodii commented Mar 2, 2026

Hi @mayuyuace, please delete the `is_local` check in https://github.com/vllm-project/vllm-xpu-kernels/blob/main/csrc/flash_attn/flash_api.cpp#L108 so that windowed requests can dispatch to the decode kernel.
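For context on why that check matters: if the dispatch logic treats `is_local` (sliding window enabled) as a condition that forces the general path, a windowed single-token request never reaches the decode kernel even once that kernel supports windows. A hypothetical sketch of the dispatch shape being discussed (the names and structure here are illustrative only; the real logic in flash_api.cpp differs):

```python
def select_kernel(num_query_tokens, is_local, decode_supports_local):
    """Hypothetical kernel dispatch for a single sequence.

    Before this PR: is_local routes away from the decode kernel.
    After: the decode kernel handles sliding windows, so the
    is_local guard can be dropped from the decode condition.
    """
    is_decode = num_query_tokens == 1
    if is_decode and (decode_supports_local or not is_local):
        return "decode"
    return "prefill"
```

With the guard in place (`decode_supports_local=False`), a windowed decode request falls through to the prefill path, which is the behavior baodii is asking to remove.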

@mayuyuace mayuyuace changed the title support sliding window of decoding flash attn [WIP] support sliding window of decoding flash attn Mar 2, 2026

3 participants