
Conversation


@Liccol Liccol commented Sep 4, 2025

What this PR does / why we need it?

For offline scenarios, adjust the scheduling process to prioritize the prefill phase of all requests, then process the decode phase of all requests.

Does this PR introduce any user-facing change?

How was this patch tested?

```
max_num_seqs=24,
additional_config={
    "ascend_scheduler_config": {
        "enabled": True,
        "enable_pd_transfer": True,
        "decode_max_num_seqs": 24,
        "enable_chunked_prefill": False
    }
},
```
| input | output | num prompts | max_num_seqs | dp | tp | scheduler | tps |
| --- | --- | --- | --- | --- | --- | --- | --- |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | v1 | 234.06 |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | pd transfer | 239.59 (+2.4%) |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | v1 | 222.85 |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | pd transfer | 225.81 (+1.3%) |

Signed-off-by: CaranLic <[email protected]>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a 'pd transfer' feature for the Ascend scheduler, aiming to optimize performance by separating prefill and decode phases. The changes involve adding an enable_pd_transfer configuration, implementing a two-phase state machine ('prefill', 'decode') in the AscendScheduler, and adding unit tests to validate the new functionality. My review has identified a few areas for improvement. The phase transition logic currently only allows a one-way switch from 'prefill' to 'decode', which may be suboptimal for dynamic workloads. Additionally, the new unit test could be made more robust by avoiding manual manipulation of the scheduler's internal state. Finally, there's a minor point on configuration access that could be improved to make the code more robust against configuration errors.
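To make the two-phase idea concrete, here is a minimal, self-contained sketch of a prefill-then-decode state machine. This is toy code with hypothetical names (`TwoPhaseScheduler`; `decode_max_num_seqs` mirrors the config knob), not the actual AscendScheduler implementation:

```python
from collections import deque


class TwoPhaseScheduler:
    """Toy model of the 'pd transfer' idea: prefill every waiting
    request first, then batch the decode phase for all of them."""

    def __init__(self, decode_max_num_seqs: int):
        self.phase = "prefill"
        self.decode_max_num_seqs = decode_max_num_seqs
        self.waiting = deque()    # requests not yet prefilled
        self.prefilled = deque()  # requests whose prefill is done
        self.running = []         # requests currently decoding

    def add_request(self, req):
        self.waiting.append(req)

    def schedule(self):
        """Return (phase, batch) for this scheduling step."""
        if self.phase == "prefill":
            if self.waiting:
                # Prefill one waiting request per step (a real
                # scheduler would batch these up to a token budget).
                req = self.waiting.popleft()
                self.prefilled.append(req)
                return ("prefill", [req])
            # All prefills done: switch to decode (one-way, as in the PR).
            self.phase = "decode"
        # Decode phase: admit prefilled requests up to decode_max_num_seqs.
        while self.prefilled and len(self.running) < self.decode_max_num_seqs:
            self.running.append(self.prefilled.popleft())
        return ("decode", list(self.running))
```

Stepping this by hand shows the behavior the review describes: all prefill steps run first, then decoding proceeds in batches capped by `decode_max_num_seqs`, and the phase never flips back.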


```python
def test_scheduler_with_pd_transfer(self):
    scheduler = self.create_scheduler()
    scheduler.phase = "prefill"
```

high

Manually setting the internal state scheduler.phase makes this test brittle and less representative of real usage. If the initialization logic in AscendScheduler.__init__ changes, this test would not catch the regression. A better approach is to initialize the scheduler with enable_pd_transfer=True in its configuration, which would correctly set the initial phase.

To achieve this, you could modify the create_scheduler helper method to accept configuration overrides. For example:

```python
def create_scheduler(self, mock_compute_encoder_budget, scheduler_config_override: Optional[Dict[str, Any]] = None):
    # ... existing setup ...
    scheduler_config = SchedulerConfig(
        # ...
    )
    if scheduler_config_override:
        for key, value in scheduler_config_override.items():
            setattr(scheduler_config, key, value)
    # ... rest of the function ...
```

Then, the test can be updated to:

```python
scheduler = self.create_scheduler(scheduler_config_override={"enable_pd_transfer": True})
self.assertEqual(scheduler.phase, "prefill")
```

This change would make the test more robust and also serve to verify the scheduler's initialization logic.

Comment on lines 62 to 64
```python
enable_pd_transfer = getattr(self.scheduler_config,
                             'enable_pd_transfer',
                             False)
```

high

Using getattr with a default value for enable_pd_transfer can mask configuration issues. The AscendSchedulerConfig is expected to always have the enable_pd_transfer attribute. If it's missing for some reason (e.g., a typo in the attribute name in the config class), getattr will silently fall back to False, disabling the feature without any warning. This can make debugging difficult.

It's more robust to access the attribute directly. This will ensure that any configuration problem results in a clear AttributeError, which is a fail-fast approach.

```python
enable_pd_transfer = self.scheduler_config.enable_pd_transfer
```
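The fail-fast argument can be demonstrated in isolation. In this illustrative snippet the class names are hypothetical stand-ins, not the real config classes:

```python
# Illustrates why getattr with a default can hide config bugs: a
# misspelled attribute silently disables the feature, while direct
# attribute access raises a clear AttributeError.
class GoodConfig:
    enable_pd_transfer = True


class TypoConfig:
    enable_pd_transfar = True  # note the typo in the attribute name


assert getattr(GoodConfig(), "enable_pd_transfer", False) is True
assert getattr(TypoConfig(), "enable_pd_transfer", False) is False  # bug masked

try:
    _ = TypoConfig().enable_pd_transfer  # direct access surfaces the typo
    surfaced = False
except AttributeError:
    surfaced = True
assert surfaced
```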

Comment on lines +104 to +105
```python
if not self.waiting and not self.running:
    self.phase = "decode"
```

high

The phase transition from 'prefill' to 'decode' is currently one-way. Once the scheduler enters the 'decode' phase, it never returns to 'prefill'. If new requests arrive while the system is in the 'decode' phase, they will be prefilled and then immediately start decoding, which might not be the most efficient approach for the Ascend hardware this feature is targeting, as it breaks the strict batching of prefill operations.

To improve performance for dynamic workloads, consider adding logic to allow the scheduler to switch back to the 'prefill' phase. For instance, you could add a check at the beginning of the schedule method:

```python
if self.phase == "decode" and not self.running and self.waiting:
    self.phase = "prefill"
```

This would ensure that if the decoding queue is empty and new requests are waiting, the scheduler can switch back to the more efficient batch prefill mode.
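The suggested check is easy to exercise in isolation. A minimal sketch using a stand-in scheduler object (the helper name `maybe_switch_to_prefill` is hypothetical, chosen for illustration):

```python
from types import SimpleNamespace


def maybe_switch_to_prefill(scheduler) -> bool:
    """Return to batched prefill mode once decoding has drained and new
    requests are waiting; returns True if the phase changed."""
    if scheduler.phase == "decode" and not scheduler.running and scheduler.waiting:
        scheduler.phase = "prefill"
        return True
    return False


# Drained decode queue with new arrivals: switch back to prefill.
s = SimpleNamespace(phase="decode", running=[], waiting=["r4"])
assert maybe_switch_to_prefill(s) and s.phase == "prefill"

# Still decoding: stay in the decode phase.
s = SimpleNamespace(phase="decode", running=["r1"], waiting=["r4"])
assert not maybe_switch_to_prefill(s)
```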

Signed-off-by: CaranLic <[email protected]>

github-actions bot commented Sep 4, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description, to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@Liccol Liccol changed the title add pd transfer for ascend scheduler [main] add pd transfer for ascend scheduler Sep 5, 2025
Collaborator

zzzzwwjj commented Sep 5, 2025

Enabling this feature will result in a difference between our definition of max_num_seqs and the community's. Fortunately, disabling this feature has no impact. The best solution is to get the community to accept our plan.
@wangxiyuan cc

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 8, 2025

codecov bot commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 97.50000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.90%. Comparing base (1bbb20e) to head (03949b1).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| vllm_ascend/worker/model_runner_v1.py | 0.00% | 2 Missing ⚠️ |
| vllm_ascend/core/scheduler.py | 95.00% | 1 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #2753      +/-   ##
==========================================
+ Coverage   74.76%   74.90%   +0.13%
==========================================
  Files         150      152       +2
  Lines       20891    21008     +117
==========================================
+ Hits        15620    15736     +116
- Misses       5271     5272       +1
```
| Flag | Coverage Δ |
| --- | --- |
| unittests | 74.90% <97.50%> (+0.13%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.


github-actions bot commented Sep 8, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

```python
self.min_p = self.min_p_device[:0]


MinPLogitsProcessor.__init__ = min_p_logits_processor_init_func
```
Collaborator

Do not patch vLLM; you should create an Ascend build_logitsprocs in npu_input_batch instead.

Contributor Author

Changed from patching the MinPLogitsProcessor init function to redefining build_logitsprocs; code in 30571f1.

@wangxiyuan wangxiyuan added ready read for review ready-for-test start test by label for PR labels Sep 9, 2025
@wangxiyuan wangxiyuan merged commit 168ad60 into vllm-project:main Sep 10, 2025
41 of 43 checks passed
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
### What this PR does / why we need it?
For offline scenarios, adjust the scheduling process to prioritize the
prefill phase of all requests, then process the decode phase of all
requests.

### How was this patch tested?

```
max_num_seqs=24,
additional_config={
    "ascend_scheduler_config":{
        "enabled": True,
        "enable_pd_transfer": True,
        "decode_max_num_seqs": 24,
        "enable_chunked_prefill": False
    }
},
```
| input | output | num prompts | max_num_seqs | dp | tp | scheduler | tps |
| --- | --- | --- | --- | --- | --- | --- | --- |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | v1 | 234.06 |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | pd transfer | 239.59 (+2.4%) |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | v1 | 222.85 |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | pd transfer | 225.81 (+1.3%) |

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@6fb2788

---------

Signed-off-by: CaranLic <[email protected]>
Signed-off-by: offline0806 <[email protected]>