
remove xpu_fused_moe weights handling #163

Open

mayuyuace wants to merge 1 commit into vllm-project:main from mayuyuace:qiming/remove_weights_handling

Conversation

@mayuyuace (Collaborator) commented Feb 28, 2026

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Copilot AI review requested due to automatic review settings February 28, 2026 03:08
Copilot AI left a comment (Contributor)

Pull request overview

This PR removes implicit weight-layout mutation/caching from xpu_fused_moe and updates tests to prepare weights in the expected layout externally, while also loosening activation-name matching for the SWIGLUOAI path.

Changes:

  • Removed in-function transpose/caching for non-int4/non-mxfp4 weights and adjusted inter_size derivation based on expected layout.
  • Updated fused MoE tests to transpose w13/w2 before calling xpu_fused_moe.
  • Expanded SWIGLUOAI activation matching logic.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File / Description
vllm_xpu_kernels/fused_moe_interface.py: Stops mutating weights for layout, changes inter_size inference, and adjusts SWIGLUOAI activation detection.
tests/fused_moe/test_fused_moe.py: Pre-transposes weights in tests to match the new expected layout.


Comment on lines +160 to +164
    inter_size = list(w13.shape)[-1] // 2
else:
    inter_size = list(w13.shape)[-2] // 2

assert w13.is_contiguous() and w2.is_contiguous()

Copilot AI Feb 28, 2026


xpu_fused_moe now derives inter_size from w13.shape[-1] for non-int4/non-mxfp4, which implicitly requires w13/w2 to already be in the [E, K, N] layout (transposed vs the shapes documented above and previously handled in this function). To avoid silent wrong results/crashes when callers pass the old layout, please (1) update the parameter shape docs to match the new expectation and (2) add explicit shape/layout validation (e.g., check the dimension that should equal hidden_size, and that w13.shape[-1] is divisible by 2) before calling the GEMM op.

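The validation this comment asks for can be sketched in pure Python over the shape tuples. The helper name, the assumed w13 layout of [E, hidden_size, 2 * inter_size], and the w2 shape check are all illustrative, not taken from the PR:

```python
def derive_inter_size(w13_shape, w2_shape, hidden_size):
    """Derive inter_size assuming w13 is [E, hidden_size, 2 * inter_size]
    and w2 is [E, inter_size, hidden_size] (illustrative layout).

    Raises instead of silently producing wrong results when callers
    still pass weights in the old, untransposed layout.
    """
    e, k, n = w13_shape
    if k != hidden_size:
        raise ValueError(
            f"w13 dim -2 is {k}, expected hidden_size={hidden_size}; "
            "weights may still be in the old layout")
    if n % 2 != 0:
        raise ValueError(
            "w13 last dim holds the gate and up projections, "
            f"so it must be divisible by 2 (got {n})")
    inter_size = n // 2
    if tuple(w2_shape) != (e, inter_size, hidden_size):
        raise ValueError(
            f"w2 shape {tuple(w2_shape)} does not match the expected "
            f"({e}, {inter_size}, {hidden_size})")
    return inter_size
```

For example, with 2 experts, hidden_size 8 and inter_size 3, `derive_inter_size((2, 8, 6), (2, 3, 8), 8)` returns 3, while passing an old-layout w13 such as `(2, 6, 8)` raises instead of crashing later in the GEMM.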
  elif activation == "gelu":
      torch.ops._C.gelu_and_mul(act_output, gemm1_output)
- elif activation == "swigluoai":
+ elif activation == "swigluoai" or ("SWIGLUOAI" in str(activation)):

Copilot AI Feb 28, 2026


The condition ("SWIGLUOAI" in str(activation)) is overly broad: it will also match unrelated values like "NOT_SWIGLUOAI" and route them to the swiglu kernel. Safer is to normalize once (e.g., str(activation).lower()) and compare against an explicit allowlist of supported names (including any enum/string variants you intend to support).

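The allowlist approach the comment suggests might look like the following sketch; the helper name and the exact set of accepted spellings are assumptions, not part of the PR:

```python
# Explicit allowlist of activation names that should route to the
# swigluoai kernel; extend this set deliberately instead of relying
# on substring matching.
_SWIGLUOAI_NAMES = {"swigluoai", "swiglu_oai"}


def is_swigluoai(activation) -> bool:
    """Normalize once, then compare against the allowlist.

    str() covers enum-like values such as SomeEnum.SWIGLUOAI (only the
    component after the last dot is compared), while a value like
    "NOT_SWIGLUOAI" no longer matches.
    """
    name = str(activation).strip().lower().rsplit(".", 1)[-1]
    return name in _SWIGLUOAI_NAMES
```

With this, `is_swigluoai("SWIGLUOAI")` and `is_swigluoai("swigluoai")` are true, but `is_swigluoai("NOT_SWIGLUOAI")` is false, avoiding the misrouting the substring check allows.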
Comment on lines +232 to +233
w13.data = w13.transpose(-1, -2).contiguous()
w2.data = w2.transpose(-1, -2).contiguous()

Copilot AI Feb 28, 2026


Avoid assigning through .data here. It bypasses autograd and can lead to hard-to-debug issues if this test is adapted/reused; since these are local variables, prefer rebinding w13/w2 to the transposed contiguous tensors (or otherwise copying safely) instead of mutating .data.

Suggested change
- w13.data = w13.transpose(-1, -2).contiguous()
- w2.data = w2.transpose(-1, -2).contiguous()
+ w13 = w13.transpose(-1, -2).contiguous()
+ w2 = w2.transpose(-1, -2).contiguous()

Comment on lines +565 to +566
w13.data = w13.transpose(-1, -2).contiguous()
w2.data = w2.transpose(-1, -2).contiguous()

Copilot AI Feb 28, 2026


Avoid assigning through .data here. It bypasses autograd and can lead to hard-to-debug issues if this test is adapted/reused; since these are local variables, prefer rebinding w13/w2 to the transposed contiguous tensors (or otherwise copying safely) instead of mutating .data.

Suggested change
- w13.data = w13.transpose(-1, -2).contiguous()
- w2.data = w2.transpose(-1, -2).contiguous()
+ w13 = w13.transpose(-1, -2).contiguous()
+ w2 = w2.transpose(-1, -2).contiguous()

@Liangliang-Ma (Collaborator)

we probably should add some contiguous/stride check here to ensure correctness.
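A contiguity check amounts to comparing a tensor's strides against the row-major strides implied by its shape, which is what `Tensor.is_contiguous()` verifies. A minimal pure-Python sketch of that check (helper names are illustrative):

```python
def row_major_strides(shape):
    """Strides (in elements) of a C-contiguous tensor with this shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)


def check_contiguous(name, shape, strides):
    """Raise if strides do not match the contiguous layout, e.g. for a
    transpose view that was not followed by .contiguous()."""
    expected = row_major_strides(shape)
    if tuple(strides) != expected:
        raise ValueError(
            f"{name} is not contiguous: strides {tuple(strides)} != "
            f"{expected}; call .contiguous() after transposing")
```

A `(2, 3, 4)` tensor has contiguous strides `(12, 4, 1)`; transposing its last two dims gives a `(2, 4, 3)` view with strides `(12, 1, 4)`, which fails the check until `.contiguous()` is applied.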

@jikunshang (Collaborator) left a comment

LGTM, though I want to put this on hold at least for this week, so as not to block the weekly test.

@mayuyuace (Collaborator, Author)

> we probably should add some contiguous/stride check here to ensure correctness.

Sure, I can add the check later.
