[WIP][V0.9.1] add support for flashcomm2 in qwen2 #1850

Open
wants to merge 3 commits into
base: v0.9.1-dev
from

@David9857 (Contributor) commented Jul 17, 2025

What this PR does / why we need it?

Same changes as #1726 but for qwen2.
Note: Enabling FlashComm during the decoding stage may increase latency, so we recommend using disaggregated prefilling and enabling this feature only in the prefill instance.
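
The recommendation above can be sketched as a role-based gate. This is a minimal illustration only and is not part of this PR: `InstanceRole` and `should_enable_flashcomm` are hypothetical names, and the actual switch that turns FlashComm on is not shown here.

```python
from enum import Enum


class InstanceRole(Enum):
    """Role of a serving instance in a disaggregated deployment (hypothetical)."""

    PREFILL = "prefill"
    DECODE = "decode"
    MIXED = "mixed"  # one instance handling both prefill and decode


def should_enable_flashcomm(role: InstanceRole) -> bool:
    """Return True only for dedicated prefill instances.

    Decode and mixed instances keep the feature off, because enabling it
    during decoding may increase latency.
    """
    return role is InstanceRole.PREFILL


# Map every role to its on/off decision for a quick overview.
flags = {role.value: should_enable_flashcomm(role) for role in InstanceRole}
```

Treating a mixed instance as "off" is a conservative assumption in this sketch, since such an instance also runs decoding.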

Does this PR introduce any user-facing change?

How was this patch tested?

@David9857 changed the title from "[V0.9.1] add support for flashcomm2 in qwen2" to "[WIP][V0.9.1] add support for flashcomm2 in qwen2" on Jul 18, 2025
This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: David9857 <[email protected]>