Commit 42fca6f

kaixih authored and diegocastanibm committed
[NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels (vllm-project#21411)
Signed-off-by: kaixih <[email protected]>
Signed-off-by: Diego-Castan <[email protected]>
1 parent ecde5f4 commit 42fca6f

1 file changed, +1 -0 lines changed

vllm/model_executor/layers/fused_moe/fused_moe.py

Lines changed: 1 addition & 0 deletions
@@ -1127,6 +1127,7 @@ def flashinfer_fused_moe_blockscale_fp8(
         tile_tokens_dim=_get_tile_tokens_dim(x.shape[0], top_k,
                                              global_num_experts),
         routing_method_type=2,  # DeepSeek-styled routing method
+        use_shuffled_weight=False,
     )
 
 

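The one-line change above pins `use_shuffled_weight=False` at the call site instead of relying on whatever default the kernel wrapper provides. This commit does not say whether that default has changed or will change in flashinfer, so the following is only a minimal sketch of the general pattern. `kernel_v1`, `kernel_v2`, `call_implicit`, and `call_explicit` are hypothetical stand-ins for illustration and are not part of flashinfer or vLLM.

```python
def kernel_v1(weights, use_shuffled_weight=False):
    """Hypothetical external kernel wrapper whose default is unshuffled weights."""
    return "shuffled" if use_shuffled_weight else "plain"


def kernel_v2(weights, use_shuffled_weight=True):
    """Same hypothetical wrapper, assuming a later release flips the default."""
    return "shuffled" if use_shuffled_weight else "plain"


def call_implicit(kernel, weights):
    # Relies on the callee's default, so behavior follows the library version.
    return kernel(weights)


def call_explicit(kernel, weights):
    # Pins the layout at the call site, as the diff does with
    # use_shuffled_weight=False.
    return kernel(weights, use_shuffled_weight=False)


if __name__ == "__main__":
    weights = []  # placeholder; real weights would be FP8 tensors
    for kernel in (kernel_v1, kernel_v2):
        print(kernel.__name__,
              call_implicit(kernel, weights),
              call_explicit(kernel, weights))
```

Under these assumptions, the implicit call changes behavior between the two hypothetical versions, while the explicit call always takes the unshuffled path, which matches weights that have not been pre-shuffled.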