Conversation
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Pull request overview
This PR adds EPLB (Expert Parallel Load Balancing) support by introducing two new SYCL kernels (init_expert_map and remap_hidden_states) that replace the existing fused_moe_prologue operation. The new ops support expert_map for expert parallel deployment and offer better performance in both prefill and decode stages.
Changes:
- Added `init_expert_map` kernel to initialize expert-to-rank mapping and `remap_hidden_states` kernel to permute hidden states according to expert assignments.
- Replaced the `fused_moe_prologue` workspace-based approach in `xpu_fused_moe` with the new ops, adding `expert_map` parameter support.
- Updated `moe_gather` to use the new row-major `unpermuted_row_to_permuted_row` layout `[num_rows, TOPK]` instead of `[TOPK, num_rows]`.
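The exact semantics of `init_expert_map` are not shown in this summary. The pure-Python sketch below illustrates one plausible form of the expert-to-rank mapping it could compute, assuming experts are partitioned evenly across EP ranks and that experts hosted on other ranks are marked with `-1` (both conventions are assumptions, not taken from the PR).

```python
def init_expert_map(num_global_experts, ep_rank, ep_size):
    """Sketch of an expert map for expert parallelism (semantics assumed).

    Experts are split evenly across ep_size ranks; the map sends a
    global expert id to its local id on this rank, or -1 if the
    expert lives on another rank.
    """
    assert num_global_experts % ep_size == 0
    num_local = num_global_experts // ep_size
    start = ep_rank * num_local
    return [
        (e - start) if start <= e < start + num_local else -1
        for e in range(num_global_experts)
    ]

# Example: 8 experts across 2 ranks; rank 1 hosts global experts 4..7.
print(init_expert_map(8, 1, 2))  # [-1, -1, -1, -1, 0, 1, 2, 3]
```

The real kernel presumably writes this table on-device once per EP configuration, so routing can translate global expert ids to local ones without host round-trips.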
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `vllm_xpu_kernels/fused_moe_interface.py` | Replaces `fused_moe_prologue` with `init_expert_map` and `remap_hidden_states` calls; adds `expert_map` parameter |
| `csrc/moe/init_expert_map.cpp` | New SYCL kernel to initialize the expert map based on EP rank/size |
| `csrc/moe/remap_hidden_states.cpp` | New SYCL kernel to remap hidden states, compute expert offsets, and build permutation maps |
| `csrc/moe/moe_ops.h` | Declares the two new C++ functions |
| `csrc/moe/torch_bindings.cpp` | Registers the new ops with PyTorch |
| `csrc/moe/moe_gather.cpp` | Updates indexing to match the new `[num_rows, TOPK]` layout of `unpermuted_row_to_permuted_row` |
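The `moe_gather` change swaps `unpermuted_row_to_permuted_row` from a `[TOPK, num_rows]` to a row-major `[num_rows, TOPK]` layout, so entry `(row, k)` is addressed as `row * TOPK + k` instead of `k * num_rows + row`. The sketch below shows how such a map could be built from per-row top-k expert ids; the stable sort-by-expert and the `-1` handling of the real kernel are assumptions for illustration.

```python
def build_row_map(topk_ids):
    """Sketch: build the row-major unpermuted_row_to_permuted_row map
    (layout [num_rows, TOPK], flattened), semantics assumed.

    topk_ids: per-row expert ids, shape [num_rows][TOPK].
    Each (row, k) pair is stably sorted by expert id to get its
    destination slot in the permuted hidden-state buffer.
    """
    num_rows, topk = len(topk_ids), len(topk_ids[0])
    pairs = [(topk_ids[r][k], r, k)
             for r in range(num_rows) for k in range(topk)]
    pairs.sort(key=lambda p: p[0])  # stable sort groups rows by expert
    row_map = [0] * (num_rows * topk)
    for dest, (_, r, k) in enumerate(pairs):
        row_map[r * topk + k] = dest  # row-major: index = r * TOPK + k
    return row_map

# Two rows, TOPK=2: row 0 -> experts (1, 0), row 1 -> experts (0, 1).
print(build_row_map([[1, 0], [0, 1]]))  # [2, 0, 1, 3]
```

With the old `[TOPK, num_rows]` layout the same entry would sit at `k * num_rows + r`; the row-major form keeps a row's TOPK destinations contiguous, which is typically friendlier to coalesced gathers.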
Force-pushed from 1bd971a to 601cd23.
Please also modify test_moe_prologue.py to validate offset/remap_input on all dtypes.
@Liangliang-Ma |
Yeah, I mean add tests for your new kernels. You can reuse/modify test_moe_prologue or add new files.
Got it. I will add it later.
@Liangliang-Ma |
Add new SYCL ops init_expert_map and remap_hidden_states for the EPLB feature.
https://docs.vllm.ai/en/latest/serving/expert_parallel_deployment/?h=
vllm-xpu integration PR: https://github.com/intel-innersource/applications.ai.gpu.vllm-xpu/pull/157
Replace the op fused_moe_prologue, which cannot support expert_map, with the new ops.
And the new ops have better performance:
- Prefill stage: (benchmark chart not captured)
- Decode stage: (benchmark chart not captured)
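Once an expert map exists, routing output has to be translated from global to local expert ids before dispatch. This is a hypothetical sketch of that translation step (the function name `apply_expert_map` and the `-1` convention for remote experts are assumptions, not the PR's actual API):

```python
def apply_expert_map(topk_ids, expert_map):
    """Sketch: translate global expert ids to local ids via expert_map.

    Entries mapped to -1 denote experts hosted on another EP rank
    (assumed convention); those tokens would be skipped or routed
    elsewhere by the dispatch logic.
    """
    return [[expert_map[e] for e in row] for row in topk_ids]

# Rank 1 of 2 with 4 global experts hosts experts 2 and 3 locally.
expert_map = [-1, -1, 0, 1]
print(apply_expert_map([[0, 2], [3, 1]], expert_map))  # [[-1, 0], [1, -1]]
```

This is the capability `fused_moe_prologue` lacked: with no `expert_map` input it could only assume every expert was local, which breaks under expert-parallel deployment.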