Commit de4d673

Use correct number of encoder tokens in attention metadata
Signed-off-by: Russell Bryant <[email protected]>
1 parent 940d345 · commit de4d673

File tree

1 file changed (+1, -1)


vllm/v1/worker/gpu_model_runner.py

Lines changed: 1 addition & 1 deletion
@@ -911,7 +911,7 @@ def _dummy_blk_table_and_slot_mapping():
                 dtype=torch.int32,
                 device="cpu")
             # NOTE - using max_encoder_len is whisper specific
-            total_num_scheduled_tokens_arg = self.max_encoder_len
+            total_num_scheduled_tokens_arg = num_encoder_tokens
             max_num_scheduled_tokens_arg = self.max_encoder_len
             max_seq_len_arg = self.max_encoder_len
         elif isinstance(kv_cache_group_spec.kv_cache_spec,
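
For context, a minimal sketch of the idea behind the fix. The helper name and dict layout below are hypothetical; only the three argument names come from the diff, and the surrounding vLLM model-runner internals are assumed rather than reproduced. The point is that the total scheduled token count should reflect the tokens actually present in the dummy batch, while the per-request and sequence-length caps may still use the Whisper-specific maximum encoder length:

    # Hypothetical, simplified illustration of the fix; not vLLM's actual API.
    def dummy_encoder_attn_args(num_encoder_tokens: int,
                                max_encoder_len: int) -> dict:
        # The dummy batch contains exactly num_encoder_tokens tokens, so the
        # total must match that count (previously it used max_encoder_len,
        # overstating the batch size); the caps can remain at the maximum.
        return {
            "total_num_scheduled_tokens": num_encoder_tokens,  # was max_encoder_len
            "max_num_scheduled_tokens": max_encoder_len,
            "max_seq_len": max_encoder_len,
        }

    # Example (illustrative numbers): a batch where only 1000 encoder tokens
    # are scheduled against a 1500-token maximum encoder length.
    args = dummy_encoder_attn_args(num_encoder_tokens=1000, max_encoder_len=1500)
    assert args["total_num_scheduled_tokens"] == 1000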
