[DOC] Qwen3 PD disaggregation user guide #2751

paulyu12 · 2025-09-04T08:36:53Z

What this PR does / why we need it?

This PR adds a deployment guide for prefiller-decoder (PD) disaggregation.

The scenario of the guide is:

- Use 3 nodes in total, with 2 NPUs on each node
- Qwen3-30B-A3B
- 1P2D
- Expert Parallel

This deployment can be used to verify the PD disaggregation and Expert Parallel features with somewhat fewer resources.
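As a rough illustration of the scenario above, the sketch below lays out 1 prefiller and 2 decoders across the 3 nodes and totals the NPUs. It assumes each instance occupies one whole node, which this description does not state; `Instance` and `plan_layout` are hypothetical names for illustration only, not part of vLLM or vLLM Ascend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    role: str  # "prefiller" or "decoder"
    node: int  # hypothetical node index, starting at 0
    npus: int  # NPUs available to this instance


def plan_layout(num_nodes=3, npus_per_node=2, prefillers=1, decoders=2):
    """Assign one instance per node (an assumption, not taken from the guide)."""
    if prefillers + decoders != num_nodes:
        raise ValueError("this sketch assumes exactly one instance per node")
    roles = ["prefiller"] * prefillers + ["decoder"] * decoders
    return [Instance(role, node, npus_per_node) for node, role in enumerate(roles)]


layout = plan_layout()
total_npus = sum(inst.npus for inst in layout)

for inst in layout:
    print(f"node {inst.node}: {inst.role} with {inst.npus} NPUs")
print(f"total NPUs: {total_npus}")
```

Under that assumption, the 1P2D setup uses 6 NPUs in total: 2 for the prefiller and 4 across the two decoders.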

Does this PR introduce any user-facing change?

No.

How was this patch tested?

No.

vLLM version: v0.10.1.1
vLLM main: vllm-project/vllm@e599e2c

Signed-off-by: paulyu12 <[email protected]>

github-actions · 2025-09-04T09:18:57Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- A PR should do only one thing; smaller PRs enable faster reviews.
- Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
- Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

docs/source/tutorials/multi_node_pd_disaggregation.md

wangxiyuan · 2025-09-05T01:09:32Z

@Potabk please follow this guide to test locally and review it. Thanks.

Potabk · 2025-09-05T01:40:58Z

@paulyu12 Have you tried deploying all instances on the same node?

paulyu12 · 2025-09-05T01:50:26Z

> @paulyu12 Have you tried deploying all instances on the same node?

Not yet, but I can try this scenario soon if needed.

wangxiyuan · 2025-09-07T02:35:32Z

Thanks for the contribution

[DOC] Qwen3 PD disaggregation user guide

c626394

Signed-off-by: paulyu12 <[email protected]>

github-actions bot added the documentation label Sep 4, 2025

paulyu12 marked this pull request as ready for review September 4, 2025 14:03

wangxiyuan reviewed Sep 5, 2025
