[Refactor] Remove moe_align_block_size_triton
#21335
Conversation
Signed-off-by: yewentao256 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request removes the Triton implementation of moe_align_block_size as the vLLM custom operation is now faster. The changes remove the Triton kernels and update the benchmark script. The code removal is clean and the rationale is supported by the provided performance data.
-    line_vals=["vllm", "triton"],  # "triton"
-    line_names=["VLLM", "Triton"],  # "Triton"
+    line_vals=["vllm"],
+    line_names=["vLLM"],
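For context, after this change the benchmark's `triton.testing.perf_report` configuration would keep only the vLLM line, roughly like the sketch below. The shape values mirror the results table in the PR description; everything else (variable names, `ylabel`, and so on) is an assumption rather than a copy of the actual script.

```python
from itertools import product

import triton

# Sketch of the remaining perf_report configuration (assumed, not copied
# from the actual benchmark script; shapes mirror the results table below).
configs = [
    triton.testing.Benchmark(
        x_names=["num_tokens", "num_experts", "topk"],
        x_vals=list(product([1, 16, 256, 4096],
                            [16, 64, 224, 256, 280, 512],
                            [1, 2, 8])),
        line_arg="provider",
        line_vals=["vllm"],      # only the CUDA custom op is benchmarked now
        line_names=["vLLM"],
        plot_name="moe-align-block-size-performance",
        args={},
        ylabel="us",             # assumed unit label
    )
]
```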
Should we move the kernel into the benchmark script? It doesn't seem that useful to have a benchmark with just one kernel.
I left it here because I think someone else may reuse this script for comparisons, e.g. new vLLM vs. old vLLM.
I don't think moving the kernel into the benchmark helps, because we already beat the Triton kernel for every shape, so we don't need to maintain it. In the future, when a new kernel comes along, comparing it against the current version should be enough. What are your thoughts?
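For what it's worth, if a new implementation does show up later, the comparison could be done by keeping a second provider branch in the benchmark function, along these lines. This is only a sketch: `moe_align_block_size_new` is a hypothetical future candidate, and the import path for the current op is an assumption that may need adjusting.

```python
import torch
from triton.testing import do_bench

# Assumed import path for the current op; adjust to the actual vLLM module.
from vllm.model_executor.layers.fused_moe.moe_align_block_size import (
    moe_align_block_size)


def bench(provider: str, num_tokens: int, num_experts: int, topk: int,
          block_size: int = 128) -> float:
    """Return the mean latency (ms) of one provider for a given shape."""
    topk_ids = torch.randint(0, num_experts, (num_tokens, topk),
                             dtype=torch.int32, device="cuda")
    if provider == "vllm":
        fn = lambda: moe_align_block_size(topk_ids, block_size, num_experts)
    elif provider == "vllm_new":
        # Hypothetical future kernel to compare against the current op.
        fn = lambda: moe_align_block_size_new(topk_ids, block_size, num_experts)
    else:
        raise ValueError(f"unknown provider: {provider}")
    return do_bench(fn, warmup=25, rep=100)
```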
CC @mgoin
Okay, that is fair enough. As long as we have a naive implementation to unit test against, it should be okay.
Yes, someone else developed a torch version for unit testing. But it is quite slow, so I chose not to include it in the benchmark.
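For readers landing here later, a naive torch reference for unit testing could look roughly like this. It is a minimal sketch written from the op's documented contract (group token-expert pairs by expert, pad each group to a multiple of `block_size`), not a copy of the helper used in vLLM's tests; names and the exact output layout are assumptions.

```python
import torch


def moe_align_block_size_ref(topk_ids: torch.Tensor, block_size: int,
                             num_experts: int):
    """Naive reference: group token-expert pairs by expert and pad each
    expert's group to a multiple of block_size (slow, for tests only)."""
    flat = topk_ids.flatten()
    pad_value = flat.numel()  # out-of-range sentinel marking padding slots

    sorted_chunks, expert_blocks = [], []
    for expert in range(num_experts):
        # positions in the flattened topk_ids that route to this expert
        idx = (flat == expert).nonzero(as_tuple=False).flatten()
        padded = -(-idx.numel() // block_size) * block_size  # round up
        pad = idx.new_full((padded - idx.numel(),), pad_value)
        sorted_chunks.append(torch.cat([idx, pad]))
        expert_blocks += [expert] * (padded // block_size)

    sorted_token_ids = torch.cat(sorted_chunks).to(torch.int32)
    expert_ids = torch.tensor(expert_blocks, dtype=torch.int32,
                              device=topk_ids.device)
    num_tokens_post_padded = torch.tensor([sorted_token_ids.numel()],
                                          dtype=torch.int32,
                                          device=topk_ids.device)
    return sorted_token_ids, expert_ids, num_tokens_post_padded
```

A unit test could then compare these three outputs against the custom op for random `topk_ids`, treating the order of tokens within a single expert's group as unspecified.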
Purpose
`moe_align_block_size_triton` was taken from SGL and was used to benchmark the `moe_align_block_size` kernel. Now that the vLLM custom op beats the Triton version for every shape, we can safely remove it; there are no other references to it in the code.
moe-align-block-size-performance:
    num_tokens  num_experts  topk       VLLM      Triton
 0         1.0         16.0   1.0  23.584001   52.127998
 1         1.0         16.0   2.0  23.584001   50.655998
 2         1.0         16.0   8.0  23.584001   51.936001
 3         1.0         64.0   1.0  31.776000   52.096002
 4         1.0         64.0   2.0  31.776000   51.376000
 5         1.0         64.0   8.0  31.776000   50.880000
 6         1.0        224.0   1.0  27.680000   64.544000
 7         1.0        224.0   2.0  27.664000   64.544000
 8         1.0        224.0   8.0  27.680000   64.544000
 9         1.0        256.0   1.0  29.727999   66.592000
10         1.0        256.0   2.0  29.727999   66.592000
11         1.0        256.0   8.0  29.727999   66.592000
12         1.0        280.0   1.0  29.696001   68.672001
13         1.0        280.0   2.0  29.727999   68.672001
14         1.0        280.0   8.0  29.727999   68.640001
15         1.0        512.0   1.0  33.792000   97.344004
16         1.0        512.0   2.0  33.792000   97.344004
17         1.0        512.0   8.0  33.792000   97.344004
18        16.0         16.0   1.0  23.584001   52.448001
19        16.0         16.0   2.0  23.584001   51.199999
20        16.0         16.0   8.0  25.632000   52.207999
21        16.0         64.0   1.0  31.776000   52.064002
22        16.0         64.0   2.0  31.776000   52.032001
23        16.0         64.0   8.0  31.776000   51.456001
24        16.0        224.0   1.0  27.680000   61.632000
25        16.0        224.0   2.0  27.680000   61.439998
26        16.0        224.0   8.0  27.744001   61.407998
27        16.0        256.0   1.0  29.696001   66.592000
28        16.0        256.0   2.0  29.696001   66.592000
29        16.0        256.0   8.0  29.696001   66.624001
30        16.0        280.0   1.0  29.696001   70.720002
31        16.0        280.0   2.0  29.696001   70.688002
32        16.0        280.0   8.0  29.696001   71.584001
33        16.0        512.0   1.0  33.792000   97.375996
34        16.0        512.0   2.0  33.792000   97.375996
35        16.0        512.0   8.0  33.792000   95.296003
36       256.0         16.0   1.0  27.712001   52.832000
37       256.0         16.0   2.0  33.824001   54.912001
38       256.0         16.0   8.0  25.599999   64.479999
39       256.0         64.0   1.0  33.824001   52.800000
40       256.0         64.0   2.0  35.872001   52.320000
41       256.0         64.0   8.0  27.280000   56.352001
42       256.0        224.0   1.0  29.600000   62.527999
43       256.0        224.0   2.0  28.640000   62.527999
44       256.0        224.0   8.0  29.727999   74.784003
45       256.0        256.0   1.0  29.727999   68.640001
46       256.0        256.0   2.0  29.696001   64.576000
47       256.0        256.0   8.0  29.696001   70.720002
48       256.0        280.0   1.0  29.727999   68.736002
49       256.0        280.0   2.0  29.727999   70.752002
50       256.0        280.0   8.0  29.696001   74.752003
51       256.0        512.0   1.0  33.792000   93.216002
52       256.0        512.0   2.0  33.792000   92.032000
53       256.0        512.0   8.0  33.792000   91.168001
54      4096.0         16.0   1.0  27.648000   93.216002
55      4096.0         16.0   2.0  29.727999  136.224002
56      4096.0         16.0   8.0  46.080001  392.255992
57      4096.0         64.0   1.0  27.648000   64.544000
58      4096.0         64.0   2.0  31.744000   72.800003
59      4096.0         64.0   8.0  50.175998  140.351996
60      4096.0        224.0   1.0  29.727999   76.863997
61      4096.0        224.0   2.0  33.792000   80.959998
62      4096.0        224.0   8.0  52.352000  105.535999
63      4096.0        256.0   1.0  31.679999   78.879997
64      4096.0        256.0   2.0  35.808001   87.072000
65      4096.0        256.0   8.0  52.255999  109.792002
66      4096.0        280.0   1.0  31.744000   80.895998
67      4096.0        280.0   2.0  35.840001   88.992000
68      4096.0        280.0   8.0  50.175998  109.664001
69      4096.0        512.0   1.0  35.872001   95.200002
70      4096.0        512.0   2.0  37.888002  101.471998
71      4096.0        512.0   8.0  54.272000  128.064007