remove xpu_fused_moe weights handling #163
mayuyuace wants to merge 1 commit into vllm-project:main from
Conversation
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Pull request overview
This PR removes implicit weight-layout mutation/caching from xpu_fused_moe and updates tests to prepare weights in the expected layout externally, while also loosening activation-name matching for the SWIGLUOAI path.
Changes:
- Removed in-function transpose/caching for non-int4/non-mxfp4 weights and adjusted `inter_size` derivation based on the expected layout.
- Updated fused MoE tests to transpose `w13`/`w2` before calling `xpu_fused_moe`.
- Expanded SWIGLUOAI activation matching logic.
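The test-side change described above can be sketched as follows. This is a minimal illustration, not the actual test code; the expert count and sizes are made up, and the documented "original" layout (`[E, N, K]`-style, with the transposed `[E, K, N]` layout now expected by `xpu_fused_moe`) is an assumption based on the review discussion below.

```python
import torch

# Illustrative sizes (assumptions, not from the PR):
E, K, N = 4, 64, 128  # experts, hidden size, intermediate size

# Weights created in the previously documented layout...
w13 = torch.randn(E, 2 * N, K)  # gate+up projection, fused
w2 = torch.randn(E, K, N)       # down projection

# ...must now be transposed externally before calling xpu_fused_moe,
# since the kernel no longer transposes/caches weights internally.
w13 = w13.transpose(-1, -2).contiguous()
w2 = w2.transpose(-1, -2).contiguous()

# inter_size is now derived from the last dim of w13 (gate+up fused, so // 2).
inter_size = w13.shape[-1] // 2
```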
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_xpu_kernels/fused_moe_interface.py | Stops mutating weights for layout, changes inter_size inference, and adjusts SWIGLUOAI activation detection. |
| tests/fused_moe/test_fused_moe.py | Pre-transposes weights in tests to match the new expected layout. |
```python
    inter_size = list(w13.shape)[-1] // 2
else:
    inter_size = list(w13.shape)[-2] // 2

assert w13.is_contiguous() and w2.is_contiguous()
```
xpu_fused_moe now derives inter_size from w13.shape[-1] for non-int4/non-mxfp4, which implicitly requires w13/w2 to already be in the [E, K, N] layout (transposed vs the shapes documented above and previously handled in this function). To avoid silent wrong results/crashes when callers pass the old layout, please (1) update the parameter shape docs to match the new expectation and (2) add explicit shape/layout validation (e.g., check the dimension that should equal hidden_size, and that w13.shape[-1] is divisible by 2) before calling the GEMM op.
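The validation the reviewer asks for could look like the following. This is a hypothetical helper (name and exact checks are assumptions, not code from the PR); it covers the two checks named in the comment: that the dimension expected to equal `hidden_size` actually does, and that `w13.shape[-1]` is divisible by 2.

```python
import torch

def validate_moe_weight_layout(w13: torch.Tensor, w2: torch.Tensor,
                               hidden_size: int) -> int:
    """Hypothetical pre-GEMM layout checks, assuming the new [E, K, N] layout."""
    assert w13.is_contiguous() and w2.is_contiguous(), \
        "w13/w2 must be contiguous"
    assert w13.shape[-2] == hidden_size, (
        f"w13.shape[-2]={w13.shape[-2]} != hidden_size={hidden_size}; "
        "caller may have passed weights in the old (untransposed) layout")
    assert w13.shape[-1] % 2 == 0, \
        "w13 last dim must be divisible by 2 (fused gate+up)"
    return w13.shape[-1] // 2  # inter_size
```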
```diff
 elif activation == "gelu":
     torch.ops._C.gelu_and_mul(act_output, gemm1_output)
-elif activation == "swigluoai":
+elif activation == "swigluoai" or ("SWIGLUOAI" in str(activation)):
```
The condition ("SWIGLUOAI" in str(activation)) is overly broad: it will also match unrelated values like "NOT_SWIGLUOAI" and route them to the swiglu kernel. Safer is to normalize once (e.g., str(activation).lower()) and compare against an explicit allowlist of supported names (including any enum/string variants you intend to support).
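A sketch of the normalize-plus-allowlist approach suggested above. The helper name and the set of accepted variants are assumptions for illustration; the point is that substring matching is replaced with an exact comparison after normalization.

```python
# Assumed set of accepted spellings; extend with any enum/string
# variants the kernel actually intends to support.
_SWIGLUOAI_NAMES = {"swigluoai"}

def is_swigluoai(activation) -> bool:
    # str() also covers enum values like "ActivationType.SWIGLUOAI":
    # keep only the last dotted component, then lowercase once.
    name = str(activation).rsplit(".", 1)[-1].lower()
    return name in _SWIGLUOAI_NAMES
```

Unlike the substring check, this rejects unrelated values such as `"NOT_SWIGLUOAI"` while still accepting case and enum variants.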
```python
w13.data = w13.transpose(-1, -2).contiguous()
w2.data = w2.transpose(-1, -2).contiguous()
```
Avoid assigning through .data here. It bypasses autograd and can lead to hard-to-debug issues if this test is adapted/reused; since these are local variables, prefer rebinding w13/w2 to the transposed contiguous tensors (or otherwise copying safely) instead of mutating .data.
Suggested change:

```diff
-w13.data = w13.transpose(-1, -2).contiguous()
-w2.data = w2.transpose(-1, -2).contiguous()
+w13 = w13.transpose(-1, -2).contiguous()
+w2 = w2.transpose(-1, -2).contiguous()
```
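A minimal illustration of why rebinding is safer than mutating `.data` (a generic PyTorch sketch, not code from this PR): a rebound tensor stays on the autograd graph, whereas `.data` assignment bypasses autograd tracking entirely.

```python
import torch

w = torch.randn(2, 3, requires_grad=True)

# Rebinding creates a new tensor that autograd still tracks;
# transpose(...).contiguous() is a differentiable op chain.
w_t = w.transpose(-1, -2).contiguous()  # shape [3, 2]

# Gradients flow back to the original w as expected.
w_t.sum().backward()
```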
```python
w13.data = w13.transpose(-1, -2).contiguous()
w2.data = w2.transpose(-1, -2).contiguous()
```

Avoid assigning through .data here. It bypasses autograd and can lead to hard-to-debug issues if this test is adapted/reused; since these are local variables, prefer rebinding w13/w2 to the transposed contiguous tensors (or otherwise copying safely) instead of mutating .data.

Suggested change:

```diff
-w13.data = w13.transpose(-1, -2).contiguous()
-w2.data = w2.transpose(-1, -2).contiguous()
+w13 = w13.transpose(-1, -2).contiguous()
+w2 = w2.transpose(-1, -2).contiguous()
```
We should probably add a contiguous/stride check here to ensure correctness.
jikunshang left a comment
LGTM, though I'd like to put this on hold at least for this week so it doesn't block the weekly test.
Sure, I can add the check later.
Align with https://github.com/intel-innersource/applications.ai.gpu.vllm-xpu/pull/145