yewentao256 (Collaborator) commented Jul 21, 2025

Purpose

moe_align_block_size_triton was taken from SGL and was used to benchmark the moe_align_block_size kernel.

But the vLLM kernel now beats the Triton version for every shape, so we can safely remove it; there are no other references to it in the code either.

moe-align-block-size-performance (runtime, lower is better):
    num_tokens  num_experts  topk       VLLM      Triton
0          1.0         16.0   1.0  23.584001   52.127998
1          1.0         16.0   2.0  23.584001   50.655998
2          1.0         16.0   8.0  23.584001   51.936001
3          1.0         64.0   1.0  31.776000   52.096002
4          1.0         64.0   2.0  31.776000   51.376000
5          1.0         64.0   8.0  31.776000   50.880000
6          1.0        224.0   1.0  27.680000   64.544000
7          1.0        224.0   2.0  27.664000   64.544000
8          1.0        224.0   8.0  27.680000   64.544000
9          1.0        256.0   1.0  29.727999   66.592000
10         1.0        256.0   2.0  29.727999   66.592000
11         1.0        256.0   8.0  29.727999   66.592000
12         1.0        280.0   1.0  29.696001   68.672001
13         1.0        280.0   2.0  29.727999   68.672001
14         1.0        280.0   8.0  29.727999   68.640001
15         1.0        512.0   1.0  33.792000   97.344004
16         1.0        512.0   2.0  33.792000   97.344004
17         1.0        512.0   8.0  33.792000   97.344004
18        16.0         16.0   1.0  23.584001   52.448001
19        16.0         16.0   2.0  23.584001   51.199999
20        16.0         16.0   8.0  25.632000   52.207999
21        16.0         64.0   1.0  31.776000   52.064002
22        16.0         64.0   2.0  31.776000   52.032001
23        16.0         64.0   8.0  31.776000   51.456001
24        16.0        224.0   1.0  27.680000   61.632000
25        16.0        224.0   2.0  27.680000   61.439998
26        16.0        224.0   8.0  27.744001   61.407998
27        16.0        256.0   1.0  29.696001   66.592000
28        16.0        256.0   2.0  29.696001   66.592000
29        16.0        256.0   8.0  29.696001   66.624001
30        16.0        280.0   1.0  29.696001   70.720002
31        16.0        280.0   2.0  29.696001   70.688002
32        16.0        280.0   8.0  29.696001   71.584001
33        16.0        512.0   1.0  33.792000   97.375996
34        16.0        512.0   2.0  33.792000   97.375996
35        16.0        512.0   8.0  33.792000   95.296003
36       256.0         16.0   1.0  27.712001   52.832000
37       256.0         16.0   2.0  33.824001   54.912001
38       256.0         16.0   8.0  25.599999   64.479999
39       256.0         64.0   1.0  33.824001   52.800000
40       256.0         64.0   2.0  35.872001   52.320000
41       256.0         64.0   8.0  27.280000   56.352001
42       256.0        224.0   1.0  29.600000   62.527999
43       256.0        224.0   2.0  28.640000   62.527999
44       256.0        224.0   8.0  29.727999   74.784003
45       256.0        256.0   1.0  29.727999   68.640001
46       256.0        256.0   2.0  29.696001   64.576000
47       256.0        256.0   8.0  29.696001   70.720002
48       256.0        280.0   1.0  29.727999   68.736002
49       256.0        280.0   2.0  29.727999   70.752002
50       256.0        280.0   8.0  29.696001   74.752003
51       256.0        512.0   1.0  33.792000   93.216002
52       256.0        512.0   2.0  33.792000   92.032000
53       256.0        512.0   8.0  33.792000   91.168001
54      4096.0         16.0   1.0  27.648000   93.216002
55      4096.0         16.0   2.0  29.727999  136.224002
56      4096.0         16.0   8.0  46.080001  392.255992
57      4096.0         64.0   1.0  27.648000   64.544000
58      4096.0         64.0   2.0  31.744000   72.800003
59      4096.0         64.0   8.0  50.175998  140.351996
60      4096.0        224.0   1.0  29.727999   76.863997
61      4096.0        224.0   2.0  33.792000   80.959998
62      4096.0        224.0   8.0  52.352000  105.535999
63      4096.0        256.0   1.0  31.679999   78.879997
64      4096.0        256.0   2.0  35.808001   87.072000
65      4096.0        256.0   8.0  52.255999  109.792002
66      4096.0        280.0   1.0  31.744000   80.895998
67      4096.0        280.0   2.0  35.840001   88.992000
68      4096.0        280.0   8.0  50.175998  109.664001
69      4096.0        512.0   1.0  35.872001   95.200002
70      4096.0        512.0   2.0  37.888002  101.471998
71      4096.0        512.0   8.0  54.272000  128.064007
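
For reference, a single case from the table can be re-timed with something like the sketch below (a rough sketch only: the align_fn call signature and the microsecond scale are assumptions based on how the benchmark script drives the kernels):

```python
# Minimal sketch for re-timing one (num_tokens, num_experts, topk) case.
# `align_fn` stands in for whichever implementation is being measured
# (e.g. vLLM's moe_align_block_size); the (topk_ids, block_size, num_experts)
# call signature is an assumption.
import torch
import triton.testing


def bench_case(align_fn, num_tokens, num_experts, topk, block_size=128):
    topk_ids = torch.randint(
        0, num_experts, (num_tokens, topk), dtype=torch.int32, device="cuda"
    )
    # do_bench handles warmup and synchronization and returns the runtime in ms.
    ms = triton.testing.do_bench(lambda: align_fn(topk_ids, block_size, num_experts))
    return ms * 1000  # report in microseconds (assumed to match the table's scale)
```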


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mergify bot added the performance label Jul 21, 2025
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request removes the Triton implementation of moe_align_block_size as the vLLM custom operation is now faster. The changes remove the Triton kernels and update the benchmark script. The code removal is clean and the rationale is supported by the provided performance data.

Comment on lines -90 to +36
-        line_vals=["vllm", "triton"],  # "triton"
-        line_names=["VLLM", "Triton"],  # "Triton"
+        line_vals=["vllm"],
+        line_names=["vLLM"],
Member

Should we move the kernel into the benchmark script? It doesn't seem that useful to have a benchmark with just one kernel.

Collaborator Author

I'm leaving the script here because someone else may reuse it for comparisons, e.g. new vLLM vs. old vLLM.

Collaborator Author

I think moving the kernel into the benchmark script doesn't help: we already beat the Triton kernel for every shape, so we don't need to maintain it. In the future, when a new kernel comes along, comparing it against the current version would be enough. What are your thoughts?
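
(To illustrate, a hedged sketch of how such a future comparison could be wired back into the benchmark's triton.testing.perf_report config; the provider names and the current_impl / new_impl callables below are placeholders, not real vLLM APIs:)

```python
import torch
import triton
import triton.testing


@triton.testing.perf_report(
    triton.testing.Benchmark(
        x_names=["num_tokens"],
        x_vals=[1, 16, 256, 4096],
        line_arg="provider",
        line_vals=["current", "new"],                  # placeholder providers
        line_names=["vLLM (current)", "vLLM (new)"],
        ylabel="Duration (us)",
        plot_name="moe-align-block-size-performance",
        args={"num_experts": 64, "topk": 8, "block_size": 128},
    )
)
def benchmark(num_tokens, num_experts, topk, block_size, provider):
    topk_ids = torch.randint(
        0, num_experts, (num_tokens, topk), dtype=torch.int32, device="cuda"
    )
    # current_impl / new_impl are hypothetical callables for the two kernels.
    fn = current_impl if provider == "current" else new_impl
    ms = triton.testing.do_bench(lambda: fn(topk_ids, block_size, num_experts))
    return ms * 1000  # microseconds
```

Calling benchmark.run(print_data=True) would then print a table in the same shape as the one in the PR description.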

Collaborator Author

CC @mgoin

Member

Okay, that is fair enough. As long as we have a naive implementation to unit test against, it should be okay.

Collaborator Author

Yes, someone else developed a torch version for unit tests. But it is quite slow, so I chose not to use it in the benchmark comparison.
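
(For context, a rough sketch of what such a naive torch reference might look like; the function name, padding semantics, and return layout here are assumptions, not the actual test helper in the repo:)

```python
import torch


def moe_align_block_size_ref(topk_ids: torch.Tensor, block_size: int,
                             num_experts: int):
    """Naive reference: group flattened token indices by expert and pad each
    expert's group up to a multiple of block_size (assumed semantics)."""
    flat = topk_ids.flatten()
    pad_value = flat.numel()  # sentinel index used to fill padded slots
    sorted_ids, expert_ids = [], []
    for e in range(num_experts):
        idx = (flat == e).nonzero(as_tuple=True)[0].tolist()
        if not idx:
            continue
        n_blocks = (len(idx) + block_size - 1) // block_size
        sorted_ids += idx + [pad_value] * (n_blocks * block_size - len(idx))
        expert_ids += [e] * n_blocks
    return (
        torch.tensor(sorted_ids, dtype=torch.int32),
        torch.tensor(expert_ids, dtype=torch.int32),
        torch.tensor([len(sorted_ids)], dtype=torch.int32),  # tokens post-pad
    )
```

A unit test would then simply assert that the fast CUDA op returns the same sorted ids, expert ids, and padded token count as this reference.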

mgoin added the ready label Jul 25, 2025
mgoin enabled auto-merge (squash) July 25, 2025 22:48
vllm-bot merged commit 56e544f into vllm-project:main Jul 26, 2025
70 of 74 checks passed