
Conversation

@lio1226 (Contributor) commented Oct 24, 2025

What this PR does / why we need it?

We optimized the sample_recovered_tokens_pytorch method in the rejection sampler and improved the performance of eagle-3.

Does this PR introduce any user-facing change?

How was this patch tested?

None

Co-authored-by: QilaiZhang ([email protected])

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

The pull request optimizes the sample_recovered_tokens_pytorch method in rejection_sampler.py to improve the performance of eagle-3. The optimization replaces the nested Python loops with vectorized torch operations, which should reduce execution time. I have identified a potential issue related to the indexing of q_values.


recovered_id = torch.argmax(prob / q_values).item()
output_token_ids[token_idx] = recovered_id
q_values[:vocab_size] = q_value_new[token_positions, :vocab_size]

Severity: critical

The indexing q_values[:vocab_size] might lead to incorrect behavior. Both q_values and q_value_new are initialized with the shape (num_tokens, vocab_size). Therefore, assigning q_value_new[token_positions, :vocab_size] to q_values[:vocab_size] will update only the first vocab_size rows of q_values, while the remaining rows stay at -inf. This is likely not the intended behavior, as it will skew the probability distribution for tokens beyond the first vocab_size positions. Consider assigning q_value_new to q_values directly.

To fix this, you should assign the entire q_value_new to q_values without slicing. This ensures that all token positions have the correct q-values for the subsequent argmax operation.


Suggested change
q_values[:vocab_size] = q_value_new[token_positions, :vocab_size]
q_values = q_value_new
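For context, the loop-to-vectorized rewrite discussed above can be sketched as follows. This is a minimal illustration with hypothetical tensor names, not the PR's actual implementation: picking each recovered token as argmax(prob / q) across all token positions with one batched torch.argmax, instead of a Python loop over rows.

```python
import torch

def sample_recovered_tokens_vectorized(probs: torch.Tensor,
                                       q_values: torch.Tensor) -> torch.Tensor:
    """Pick the recovered token per position as argmax(prob / q).

    probs, q_values: (num_tokens, vocab_size) tensors. A single
    batched argmax over the last dimension replaces a per-token
    Python loop, which is where the speedup comes from.
    """
    return torch.argmax(probs / q_values, dim=-1)
```

Because the row-wise version is a single elementwise divide plus one reduction, it matches the per-token loop exactly while letting the backend fuse the work across all positions.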

@lio1226 lio1226 force-pushed the rejection_sample_optimize_v1 branch from c3edfda to f128fd5 Compare October 24, 2025 12:04
