
[main] Fuse GroupedMatmul, Swiglu and DynamicQuant in W8A8_DYNAMIC quantized MoE layers #2275


Open
zhoux77899 wants to merge 43 commits into base: main

Conversation

zhoux77899 (Contributor) commented Aug 8, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Tested on W8A8 quantized Qwen3-235B-A22B model with bs=16

  1. tp=8, dp=1, moe_tp=8, moe_ep=1: TPOP increased 21.54%, Output Token Throughput increased 27.35% (benchmark screenshot attached in the PR)
  2. tp=8, dp=1, moe_tp=1, moe_ep=8: TPOP increased 17.38%, Output Token Throughput increased 6.86% (benchmark screenshot attached in the PR)
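
For context on what "fuse GroupedMatmul, Swiglu and DynamicQuant" means computationally, below is a minimal pure-PyTorch sketch of the three steps the fused NPU operator replaces in the W8A8_DYNAMIC MoE path: a per-expert grouped matmul over the gate/up projection, a SwiGLU activation, and per-token dynamic int8 quantization of the activation. This is a reference for the dataflow only, not the kernel used by the PR; the function name and weight layout are illustrative assumptions, and weights are kept in float here for clarity even though W8A8 quantizes them as well.

```python
import torch
import torch.nn.functional as F

def grouped_gate_up_swiglu_quant_reference(hidden_states: torch.Tensor,
                                           w1: torch.Tensor,
                                           group_list: torch.Tensor):
    """Illustrative unfused reference (assumed shapes, not the PR's kernel).

    hidden_states: [num_tokens, hidden], tokens already sorted by expert
    w1:            [num_experts, hidden, 2 * intermediate], gate and up projections
    group_list:    [num_experts], cumulative token offsets (group_list_type == 0)
    """
    quantized, scales = [], []
    start = 0
    for e in range(w1.shape[0]):
        end = int(group_list[e])
        # 1) GroupedMatmul: each expert multiplies only the tokens routed to it.
        gate_up = hidden_states[start:end] @ w1[e]
        # 2) Swiglu: silu(gate) * up.
        gate, up = gate_up.chunk(2, dim=-1)
        act = F.silu(gate) * up
        # 3) DynamicQuant: per-token int8 scale from the row-wise max-abs value.
        scale = act.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
        quantized.append(torch.clamp(torch.round(act / scale), -128, 127).to(torch.int8))
        scales.append(scale)
        start = end
    return torch.cat(quantized, dim=0), torch.cat(scales, dim=0)
```

The fused operator performs these three steps in a single call, so the full-precision gate_up and act intermediates never have to be written back to memory between kernels; that saved memory traffic and kernel-launch overhead is presumably where the reported TPOP and throughput gains come from.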


github-actions bot commented Aug 8, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Signed-off-by: zhoux77899 <[email protected]>

codecov bot commented Aug 8, 2025

Codecov Report

❌ Patch coverage is 83.73494% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.92%. Comparing base (1de16ea) to head (90ea998).

Files with missing lines                      Patch %    Lines
vllm_ascend/quantization/w8a8_dynamic.py      43.33%     17 Missing ⚠️
vllm_ascend/quantization/w4a8_dynamic.py      70.58%     10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2275      +/-   ##
==========================================
+ Coverage   77.37%   77.92%   +0.54%     
==========================================
  Files         128      128              
  Lines       16455    16608     +153     
==========================================
+ Hits        12732    12941     +209     
+ Misses       3723     3667      -56     
Flag         Coverage Δ
unittests    77.92% <83.73%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown.


Signed-off-by: zhoux77899 <[email protected]>
Signed-off-by: zhoux77899 <[email protected]>

This pull request has conflicts, please resolve those before we can evaluate the pull request.

x=hidden_states,
weight=w1,
group_list=group_list if group_list_type == 0 else group_list.cumsum(


Need to modify fused_experts_with_mc2(): pass expert_token_nums_type=1 to npu_moe_distribute_dispatch() and pass group_list_type=0 to apply_mlp_decode().
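
For readers following this review thread: the guard in the quoted snippet distinguishes two encodings of group_list. When group_list_type == 0 the tensor is passed through unchanged; otherwise it is converted with a cumulative sum, which turns per-expert token counts into cumulative offsets. A small illustrative sketch of that conversion (the values, dtype, and surrounding variables are assumptions for illustration, not the PR's exact code):

```python
import torch

# Hypothetical routing result: 4 experts receive 3, 0, 5 and 2 tokens.
token_counts = torch.tensor([3, 0, 5, 2], dtype=torch.int64)

# Cumulative form: expert e owns rows [cumulative[e-1], cumulative[e]) of the
# expert-sorted token buffer.
cumulative = token_counts.cumsum(dim=0)        # tensor([ 3,  3,  8, 10])

# Mirroring the quoted guard: pass group_list straight through when it is
# already cumulative (group_list_type == 0), otherwise convert it.
group_list_type = 1
group_list = token_counts
group_list = group_list if group_list_type == 0 else group_list.cumsum(dim=0)
```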

Signed-off-by: zhoux77899 <[email protected]>
@zhoux77899 zhoux77899 changed the title [main] Support GroupedMatmulSwigluQuant in W8A8_DYNAMIC quantized MoE layers [main] Fuse GroupedMatmul, Swiglu and DynamicQuant in W8A8_DYNAMIC quantized MoE layers Aug 16, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: zhoux77899 <[email protected]>
Signed-off-by: zhoux77899 <[email protected]>