
add eplb enabling kernels #182

Open
mayuyuace wants to merge 10 commits into vllm-project:main from mayuyuace:qiming/enale_eplb

Conversation

@mayuyuace
Collaborator

@mayuyuace mayuyuace commented Mar 6, 2026

Add new SYCL ops init_expert_map and remap_hidden_states for the EPLB feature.
https://docs.vllm.ai/en/latest/serving/expert_parallel_deployment/?h=

vllm-xpu integration PR: https://github.com/intel-innersource/applications.ai.gpu.vllm-xpu/pull/157

The new ops replace fused_moe_prologue, which cannot support expert_map. They also deliver better performance:
Prefill stage: (benchmark screenshot)
Decode stage: (benchmark screenshot)
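As a rough illustration of what an init_expert_map-style op computes, here is a pure-Python sketch. The helper name, the contiguous-slice partitioning, and the even-divisibility assumption are mine for illustration, not taken from the kernel source:

```python
def init_expert_map(global_num_experts: int, ep_rank: int, ep_size: int) -> list:
    """Hypothetical reference: map global expert id -> local expert id.

    Assumes each EP rank hosts an equal, contiguous slice of the experts;
    experts hosted on other ranks are marked with -1.
    """
    local_num = global_num_experts // ep_size
    start = ep_rank * local_num
    return [i - start if start <= i < start + local_num else -1
            for i in range(global_num_experts)]

# With 8 experts over 4 EP ranks, rank 1 hosts global experts 2 and 3:
print(init_expert_map(8, 1, 4))  # → [-1, -1, 0, 1, -1, -1, -1, -1]
```

The -1 sentinel is what lets downstream MoE kernels skip tokens routed to experts that live on another rank.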

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
@mayuyuace mayuyuace requested review from Liangliang-Ma, Copilot and jikunshang and removed request for Copilot and jikunshang March 6, 2026 09:19
Copilot AI review requested due to automatic review settings March 6, 2026 09:23
Contributor

Copilot AI left a comment


Pull request overview

This PR adds EPLB (Expert Parallel Load Balancing) support by introducing two new SYCL kernels (init_expert_map and remap_hidden_states) that replace the existing fused_moe_prologue operation. The new ops support expert_map for expert parallel deployment and offer better performance in both prefill and decode stages.

Changes:

  • Added init_expert_map kernel to initialize expert-to-rank mapping and remap_hidden_states kernel to permute hidden states according to expert assignments.
  • Replaced fused_moe_prologue workspace-based approach in xpu_fused_moe with the new ops, adding expert_map parameter support.
  • Updated moe_gather to use the new row-major unpermuted_row_to_permuted_row layout [num_rows, TOPK] instead of [TOPK, num_rows].
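For readers unfamiliar with the permutation step, here is a hedged pure-Python sketch of what a remap_hidden_states-style op plausibly does (the function name and exact return layout are assumptions; the real SYCL kernel may differ): replicate each token row once per selected expert, group the copies by expert id, and record each copy's destination in a row-major [num_rows, TOPK] map, matching the layout moe_gather now expects.

```python
def remap_hidden_states(hidden, topk_ids):
    """Hypothetical reference: stably sort token copies by expert id and
    build a row-major [num_rows, TOPK] unpermuted -> permuted index map."""
    num_rows, topk = len(topk_ids), len(topk_ids[0])
    # One (expert, source_row, k) entry per token copy, in row-major order.
    flat = [(topk_ids[r][k], r, k) for r in range(num_rows) for k in range(topk)]
    order = sorted(range(len(flat)), key=lambda i: flat[i][0])  # stable sort
    permuted = [hidden[flat[i][1]] for i in order]              # gathered rows
    row_to_permuted = [[0] * topk for _ in range(num_rows)]
    for dest, src in enumerate(order):
        _, r, k = flat[src]
        row_to_permuted[r][k] = dest
    return permuted, row_to_permuted

hidden = [[10.0], [20.0], [30.0]]
topk_ids = [[1, 0], [0, 1], [1, 0]]          # TOPK = 2
permuted, idx_map = remap_hidden_states(hidden, topk_ids)
print(idx_map)  # → [[3, 0], [1, 4], [5, 2]]
```

With this layout, the entry for token r and its k-th routed expert sits at idx_map[r][k], i.e. flat position r * TOPK + k, rather than k * num_rows + r as in the old [TOPK, num_rows] layout.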

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Summary per file:

  • vllm_xpu_kernels/fused_moe_interface.py: Replaces fused_moe_prologue with init_expert_map and remap_hidden_states calls; adds expert_map parameter
  • csrc/moe/init_expert_map.cpp: New SYCL kernel to initialize the expert map based on EP rank/size
  • csrc/moe/remap_hidden_states.cpp: New SYCL kernel to remap hidden states, compute expert offsets, and build permutation maps
  • csrc/moe/moe_ops.h: Declares the two new C++ functions
  • csrc/moe/torch_bindings.cpp: Registers the new ops with PyTorch
  • csrc/moe/moe_gather.cpp: Updates indexing to match the new [num_rows, TOPK] layout of unpermuted_row_to_permuted_row
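The "expert offsets" mentioned for remap_hidden_states are typically a histogram over the routing assignments followed by an exclusive prefix sum; a minimal sketch under that assumption (the function name and signature are illustrative, not the kernel's actual API):

```python
def expert_offsets(topk_ids, num_experts):
    """Hypothetical sketch: count token copies per expert, then take an
    exclusive prefix sum so offsets[e] is where expert e's contiguous
    block starts in the permuted hidden-states buffer."""
    counts = [0] * num_experts
    for row in topk_ids:
        for e in row:
            counts[e] += 1
    offsets = [0] * (num_experts + 1)
    for e in range(num_experts):
        offsets[e + 1] = offsets[e] + counts[e]
    return offsets

# 3 tokens, TOPK=2, 4 experts; only experts 0 and 1 receive tokens:
print(expert_offsets([[1, 0], [0, 1], [1, 0]], 4))  # → [0, 3, 6, 6, 6]
```

These offsets let each expert's GEMM read its tokens as one contiguous slice, offsets[e]:offsets[e+1], of the permuted buffer.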


@mayuyuace mayuyuace force-pushed the qiming/enale_eplb branch from 1bd971a to 601cd23 Compare March 9, 2026 01:58
@Liangliang-Ma
Collaborator

Please also modify test_moe_prologue.py to validate offset/remap_input on all supported dtypes.

@mayuyuace
Collaborator Author

@Liangliang-Ma
I didn't change the moe_prologue kernel.

@Liangliang-Ma
Collaborator

@Liangliang-Ma I don't change the moe_prologue kernel.

Yeah, I mean add tests for your new kernels. You can reuse/modify test_moe_prologue or add new files.

@mayuyuace
Collaborator Author

@Liangliang-Ma I don't change the moe_prologue kernel.

yeah I mean add your new kernel's tests. You can reuse/modify test_moe_prologue or add new files.

Got it. I will add it later.

@mayuyuace
Collaborator Author

@Liangliang-Ma
Unit tests have been added. Please check.


3 participants