Skip to content

Conversation

yiz-liu
Copy link
Collaborator

@yiz-liu yiz-liu commented Sep 5, 2025

What this PR does / why we need it?

Removes the condition that skips metadata synchronization when enforce_eager is enabled.

This change is necessary to correctly sync the with_prefill and enable_dbo flags across all data parallel ranks, which is not required in the base implementation. Forcing the sync operation prevents potential inconsistencies, albeit with a minor performance impact.

Does this PR introduce any user-facing change?

None.

How was this patch tested?

Add a E2E online test case?

Removes the condition that skips metadata synchronization when `enforce_eager` is enabled.

This change is necessary to correctly sync the `with_prefill` and `enable_dbo` flags across all data parallel ranks, which is not required in the base implementation. Forcing the sync operation prevents potential inconsistencies, albeit with a minor performance impact.

Signed-off-by: Yizhou Liu <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly addresses a synchronization issue in data parallel eager mode by ensuring critical metadata flags are synced across ranks. The change is a necessary fix for correctness on Ascend hardware. I've suggested a minor refinement to the code comment to improve clarity and precision for future maintainability. Additionally, as noted in the PR description, adding an E2E test case to validate this behavior would be highly beneficial.

Comment on lines +584 to +589
# TODO: In vLLM, the only thing that needs to be synced is num_tokens, but in
# our case, we still need to sync the other two flags as well. So we need to
# include them in the all_reduce operation, and more over, we CANNOT skip it
# even if we are running in eager mode, which harms performance.
# FIXME: Restore the `or self.vllm_config.model_config.enforce_eager` here
# immediately once the other two flags are no longer needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current comment is slightly imprecise and could be confusing. In upstream vLLM, metadata sync is skipped entirely in eager mode, whereas this change correctly enables it for Ascend to sync necessary flags. Suggest rephrasing for clarity and to more accurately reflect the context, which is important for future maintenance of this temporary fix.

Suggested change
# TODO: In vLLM, the only thing that needs to be synced is num_tokens, but in
# our case, we still need to sync the other two flags as well. So we need to
# include them in the all_reduce operation, and more over, we CANNOT skip it
# even if we are running in eager mode, which harms performance.
# FIXME: Restore the `or self.vllm_config.model_config.enforce_eager` here
# immediately once the other two flags are no longer needed.
# TODO: Unlike upstream vLLM which skips metadata sync in eager mode,
# we must sync `with_prefill` and `enable_dbo` flags across DP ranks
# for correctness on Ascend. This has a minor performance impact.
# FIXME: Restore the `or self.vllm_config.model_config.enforce_eager`
# check once these flags no longer need to be synced.

Copy link

github-actions bot commented Sep 5, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Copy link

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 72.90%. Comparing base (4c90fa7) to head (fadf917).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
vllm_ascend/worker/model_runner_v1.py 0.00% 1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2766      +/-   ##
==========================================
- Coverage   72.99%   72.90%   -0.10%     
==========================================
  Files         153      153              
  Lines       21331    21398      +67     
==========================================
+ Hits        15571    15600      +29     
- Misses       5760     5798      +38     
Flag Coverage Δ
unittests 72.90% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@wangxiyuan wangxiyuan merged commit c735bb0 into vllm-project:main Sep 8, 2025
30 of 33 checks passed
@yiz-liu yiz-liu deleted the fix-eager-dp branch September 8, 2025 01:57
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Sep 10, 2025
…t#2766)

### What this PR does / why we need it?
Removes the condition that skips metadata synchronization when
`enforce_eager` is enabled.

This change is necessary to correctly sync the `with_prefill` and
`enable_dbo` flags across all data parallel ranks, which is not required
in the base implementation. Forcing the sync operation prevents
potential inconsistencies, albeit with a minor performance impact.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Add a E2E online test case?

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@e599e2c

Signed-off-by: Yizhou Liu <[email protected]>
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
…t#2766)

### What this PR does / why we need it?
Removes the condition that skips metadata synchronization when
`enforce_eager` is enabled.

This change is necessary to correctly sync the `with_prefill` and
`enable_dbo` flags across all data parallel ranks, which is not required
in the base implementation. Forcing the sync operation prevents
potential inconsistencies, albeit with a minor performance impact.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Add a E2E online test case?

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@e599e2c

Signed-off-by: Yizhou Liu <[email protected]>
Signed-off-by: offline0806 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants