Commit 1f14355

huachenheli, ywang96
authored and committed
[Core] Asynchronous h2d in merge_multimodal_embeddings via pinned memory. (vllm-project#23686)
Signed-off-by: Chenheli Hua <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
1 parent 837d9c9 commit 1f14355

1 file changed, +3 -1 lines changed

vllm/model_executor/models/utils.py

Lines changed: 3 additions & 1 deletion
@@ -508,7 +508,9 @@ def merge_multimodal_embeddings(
     """
     if isinstance(placeholder_token_id, list):
         placeholder_token_id = torch.tensor(placeholder_token_id,
-                                            device=input_ids.device)
+                                            pin_memory=True).to(
+                                                device=input_ids.device,
+                                                non_blocking=True)
     return _merge_multimodal_embeddings(
         inputs_embeds,
         torch.isin(input_ids, placeholder_token_id),
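
The pattern in this change can be tried in isolation with the rough sketch below. It is not code from vLLM: the helper name `placeholder_ids_to_device` and the sample token IDs are hypothetical, and the `torch.cuda.is_available()` guard is an addition so the snippet also runs on CPU-only machines, where requesting pinned memory would fail. The idea is that the ID list is first materialized in pinned (page-locked) host memory, which allows the subsequent `.to(..., non_blocking=True)` copy to be enqueued without blocking the host thread.

```python
import torch


def placeholder_ids_to_device(ids: list[int], device: torch.device) -> torch.Tensor:
    """Hypothetical helper: build an ID tensor on the host, then copy it to `device`."""
    # Assumption for portability: only request pinned memory when CUDA is present.
    use_pinned = torch.cuda.is_available()
    host_ids = torch.tensor(ids, pin_memory=use_pinned)
    # With a pinned source tensor, non_blocking=True allows the host-to-device
    # copy to be issued asynchronously. On CPU this call is a no-op.
    return host_ids.to(device=device, non_blocking=True)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
input_ids = torch.tensor([1, 32000, 5, 32001, 7], device=device)  # made-up token IDs
placeholder_ids = placeholder_ids_to_device([32000, 32001], device)

# Same membership test the diff feeds into _merge_multimodal_embeddings.
mask = torch.isin(input_ids, placeholder_ids)
print(mask.tolist())
```

Because `torch.isin` runs on the same CUDA stream as the copy, it should observe the transferred values without an explicit synchronization; only host-side reads of the result (such as `.tolist()`) wait for the device.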
