
Commit 8428e0c

[Bugfix] Fix MTP support for lmhead_tensor_parallel_size (vllm-project#3915)
### What this PR does / why we need it?

Fixes a hang during inference when MTP is enabled together with `lmhead_tensor_parallel_size=16`.

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: wyh145 <[email protected]>
Signed-off-by: luolun <[email protected]>
1 parent c497b9e commit 8428e0c

File tree

2 files changed: +3 −2 lines changed


vllm_ascend/ops/vocab_parallel_embedding.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -51,7 +51,7 @@ def __init__(self,
                  prefix: str = ""):
         nn.Module.__init__(self)
 
-        if lmhead_tp_enable() and prefix.find("lm_head") != -1:
+        if lmhead_tp_enable() and prefix.find("head") != -1:
             self.comm_group = get_lmhead_tp_group()
         else:
             self.comm_group = get_tp_group()
```
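The effect of broadening the substring check can be illustrated with a small sketch. The drafter-head prefix used below (`mtp.head`) is a hypothetical example, not a name taken from the repo:

```python
# Minimal sketch of the broadened prefix check. Assumes the MTP drafter's
# head module is constructed with a prefix that contains "head" but not
# "lm_head" (the exact prefix "mtp.head" is illustrative only).
def uses_lmhead_tp_group(prefix: str, lmhead_tp_enabled: bool = True) -> bool:
    # New check from this commit: any prefix containing "head" qualifies.
    return lmhead_tp_enabled and prefix.find("head") != -1

# Under the old check (prefix.find("lm_head") != -1), a drafter head would
# fall through to the default TP group while the main lm_head used the
# lm-head TP group -- a group mismatch that can stall collective ops.
assert uses_lmhead_tp_group("model.lm_head")  # matched by old and new check
assert uses_lmhead_tp_group("mtp.head")       # matched only by the new check
assert "mtp.head".find("lm_head") == -1       # the old check misses it
```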

vllm_ascend/worker/model_runner_v1.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -2913,7 +2913,8 @@ def dummy_compute_logits(hidden_states):
                 aclgraph_runtime_mode=aclgraph_runtime_mode,
                 batch_descriptor=batch_descriptor)
             if need_dummy_logits:
-                dummy_compute_logits(hidden_states)
+                self.drafter.model.compute_logits(
+                    hidden_states[dummy_indices])
             if self.in_profile_run and self.dynamic_eplb:
                 self.model.clear_all_moe_loads()
             if not self.in_profile_run and self.dynamic_eplb:
```
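The intent of this hunk, running the drafter's own logits computation over the selected dummy rows, can be sketched without torch. All shapes and the toy projection below are assumptions for illustration; only the names `dummy_indices` and `compute_logits` come from the diff:

```python
# Dependency-free sketch of the fixed dummy-logits call. Sizes are made up.
hidden_size, vocab_size = 4, 3
hidden_states = [[0.5] * hidden_size for _ in range(6)]
dummy_indices = [0, 1]

def compute_logits(rows):
    # Placeholder for self.drafter.model.compute_logits: a plain projection
    # onto a tiny vocabulary (all-ones weights for simplicity).
    return [[sum(row) for _ in range(vocab_size)] for row in rows]

# Mirrors the new lines in the diff: compute logits on the dummy rows via
# the drafter's head, so the dummy pass issues the same lm-head collectives
# as a real decode step and no rank is left waiting on its peers.
logits = compute_logits([hidden_states[i] for i in dummy_indices])
assert len(logits) == len(dummy_indices)
assert all(len(row) == vocab_size for row in logits)
```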
