[Doc]Add developer guide of eplb. #3759

Open

offline893 wants to merge 3 commits into vllm-project:main from offline893:main_1024

+51 −0

Contributor

offline893 commented Oct 25, 2025

edited by github-actions bot

What this PR does / why we need it?

Add a developer guide for EPLB.

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.11.0rc3
vLLM main: vllm-project/vllm@c9461e0

offline0806 added 2 commits

October 24, 2025 17:11


          [Doc]Add eplb feature guide to developer guide.

d286c56

Signed-off-by: offline0806 <[email protected]>


          [Doc]Add developer guide of eplb.

feebf7e

Signed-off-by: offline0806 <[email protected]>

github-actions bot commented Oct 25, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- A PR should do only one thing; smaller PRs enable faster reviews.
- Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
- Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

github-actions bot added the documentation label

gemini-code-assist bot reviewed

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request introduces a new developer guide for the Expert Parallelism Load Balancer (EPLB). While adding documentation is a valuable contribution, the current version of the guide contains several significant errors that could mislead developers. I have identified issues such as references to non-existent parameters, incorrect file paths, incorrect language-specific terminology (using try-catch for Python), and a typo in a critical environment variable. Correcting these is important for the documentation to be accurate and useful. My review provides specific suggestions for each of these points.

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md Outdated

+              In other cases, we use the global load balancing policy, which replicates experts globally regardless of expert groups, and packs the replicated experts onto individual GPUs. This policy can be adopted in the decoding stage with a larger expert-parallel size.
+              ### Add a New MoE Model
+              When adding a new model, inherit or modify `VllmEplbAdaptor`. Add the processing logic for `num_dense_layers`, `global_expert_num`, and `num_roe_layers`, and synchronize the relevant logic within the `model_register` function.
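The global load balancing policy described in this hunk can be illustrated as two greedy steps: first hand out redundant slots to the experts with the highest per-replica load, then pack the replicas onto devices so that device loads stay close. This is a minimal sketch under assumptions (known per-expert loads, an equal number of slots per device); `replicate_experts` and `pack_replicas` are hypothetical names, not the code in `vllm_ascend/eplb/core/policy`.

```python
from typing import List, Tuple


def replicate_experts(loads: List[float], num_slots: int) -> List[int]:
    """Return how many replicas each logical expert gets.

    Every expert starts with one replica; each extra slot goes to the expert
    whose per-replica load (load / replicas) is currently the highest.
    """
    num_experts = len(loads)
    if num_slots < num_experts:
        raise ValueError("need at least one slot per expert")
    replicas = [1] * num_experts
    for _ in range(num_slots - num_experts):
        hottest = max(range(num_experts), key=lambda e: loads[e] / replicas[e])
        replicas[hottest] += 1
    return replicas


def pack_replicas(loads: List[float], replicas: List[int], num_devices: int) -> List[List[int]]:
    """Greedily pack replicas onto devices that each hold the same number of slots."""
    total = sum(replicas)
    if total % num_devices:
        raise ValueError("slots must divide evenly across devices")
    per_device = total // num_devices
    # Each replica carries an equal share of its expert's load.
    items: List[Tuple[float, int]] = [
        (loads[e] / replicas[e], e) for e in range(len(loads)) for _ in range(replicas[e])
    ]
    items.sort(reverse=True)  # place the heaviest replicas first
    device_load = [0.0] * num_devices
    placement: List[List[int]] = [[] for _ in range(num_devices)]
    for load, expert in items:
        # Choose the least-loaded device that still has a free slot.
        candidates = [d for d in range(num_devices) if len(placement[d]) < per_device]
        target = min(candidates, key=lambda d: device_load[d])
        placement[target].append(expert)
        device_load[target] += load
    return placement
```

For example, `replicate_experts([90, 10, 30, 20], 6)` gives `[3, 1, 1, 1]`, and packing those six replicas onto two devices yields `[[2, 0, 3], [0, 0, 1]]` with device loads 80 and 70. The sketch allows two replicas of one expert on the same device; a production policy may forbid that.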

Contributor

gemini-code-assist bot Oct 25, 2025

The documentation mentions num_roe_layers as a parameter to handle when adding a new model. However, this parameter does not appear to be used in the related implementation files (vllm_ascend/eplb/adaptor/vllm_adaptor.py, vllm_ascend/eplb/utils.py). This is misleading for developers. Please remove it or clarify its purpose.

Suggested change

- When adding a new model, inherit or modify `VllmEplbAdaptor`. Add the processing logic for `num_dense_layers`, `global_expert_num`, and `num_roe_layers`, and synchronize the relevant logic within the `model_register` function.
+ When adding a new model, inherit or modify `VllmEplbAdaptor`. Add the processing logic for `num_dense_layers` and `global_expert_num`, and synchronize the relevant logic within the `model_register` function.
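As an illustration of why an adaptor needs `num_dense_layers` and `global_expert_num`: together they locate the MoE layers and size the per-layer expert map. A minimal sketch, assuming the dense layers come first in the stack (as in DeepSeek-style models) and a contiguous default placement across EP ranks; `moe_layer_ids` and `initial_expert_maps` are hypothetical helpers, not methods of `VllmEplbAdaptor`.

```python
from typing import Dict, List


def moe_layer_ids(num_hidden_layers: int, num_dense_layers: int) -> List[int]:
    """MoE layers are assumed to follow the leading dense layers."""
    if not 0 <= num_dense_layers <= num_hidden_layers:
        raise ValueError("num_dense_layers out of range")
    return list(range(num_dense_layers, num_hidden_layers))


def initial_expert_maps(
    num_hidden_layers: int,
    num_dense_layers: int,
    global_expert_num: int,
    ep_size: int,
) -> Dict[int, List[List[int]]]:
    """Contiguous default placement: rank r owns experts [r*k, (r+1)*k) in every MoE layer."""
    if global_expert_num % ep_size:
        raise ValueError("global_expert_num must be divisible by ep_size")
    per_rank = global_expert_num // ep_size
    default = [list(range(r * per_rank, (r + 1) * per_rank)) for r in range(ep_size)]
    # Copy the lists so each layer can later be rebalanced independently.
    return {
        layer: [list(experts) for experts in default]
        for layer in moe_layer_ids(num_hidden_layers, num_dense_layers)
    }
```

For a 4-layer model with 1 dense layer, 8 experts, and 2 EP ranks, `initial_expert_maps(4, 1, 8, 2)` has keys 1, 2, 3, and each layer maps to `[[0, 1, 2, 3], [4, 5, 6, 7]]`.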

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md Outdated

+              ### Add a New MoE Model
+              When adding a new model, inherit or modify `VllmEplbAdaptor`. Add the processing logic for `num_dense_layers`, `global_expert_num`, and `num_roe_layers`, and synchronize the relevant logic within the `model_register` function.
+              If you want to add MoE-related processing to the model, add corresponding methods to `VLLM/EPLB/utils` and add patch logic in the `model_register` function.

Contributor

gemini-code-assist bot Oct 25, 2025

The file path VLLM/EPLB/utils is incorrect. Based on the project structure, the correct path appears to be vllm_ascend/eplb/utils.py. Providing an incorrect path can mislead developers trying to extend the functionality.

Suggested change

- If you want to add MoE-related processing to the model, add corresponding methods to `VLLM/EPLB/utils` and add patch logic in the `model_register` function.
+ If you want to add MoE-related processing to the model, add corresponding methods to `vllm_ascend/eplb/utils.py` and add patch logic in the `model_register` function.
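The patch step mentioned here amounts to attaching helper functions to a model class at registration time. A minimal sketch of that pattern, assuming the helper only needs attributes already present on the model; `get_moe_layer_count` and `register_eplb_helpers` are hypothetical names, not the actual contents of `vllm_ascend/eplb/utils.py` or the signature of `model_register`.

```python
from typing import Any


def get_moe_layer_count(self: Any) -> int:
    """Hypothetical helper that would live in vllm_ascend/eplb/utils.py."""
    return self.num_hidden_layers - self.num_dense_layers


def register_eplb_helpers(model: Any) -> Any:
    """Hypothetical stand-in for the patch step inside model_register.

    Attaches the helper to the model's class once, so every instance gains
    the method without editing the model source.
    """
    cls = type(model)
    if not hasattr(cls, "get_moe_layer_count"):
        setattr(cls, "get_moe_layer_count", get_moe_layer_count)
    return model
```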

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md

+              All EPLB parameters must be initialized by default during initialization, with specified parameter types and default values for proper handling.
+              #### General Functions
+              All method arguments must specify parameter types and default values, and functions must include default return value handling for default arguments. It is recommended to use `try-catch` blocks to handle the function body, specifying the type of exception captured and the failure handling (e.g., logging exceptions or returning a failure status).

Contributor

gemini-code-assist bot Oct 25, 2025

The documentation recommends using try-catch blocks for exception handling. However, in Python, the correct syntax is try-except. This should be corrected to avoid confusion for Python developers.

Suggested change

- All method arguments must specify parameter types and default values, and functions must include default return value handling for default arguments. It is recommended to use `try-catch` blocks to handle the function body, specifying the type of exception captured and the failure handling (e.g., logging exceptions or returning a failure status).
+ All method arguments must specify parameter types and default values, and functions must include default return value handling for default arguments. It is recommended to use `try-except` blocks to handle the function body, specifying the type of exception captured and the failure handling (e.g., logging exceptions or returning a failure status).
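A minimal sketch of the convention in the suggested text: typed parameters with defaults, an explicit return for the default case, and a `try-except` that names the exception it expects and logs the failure. `parse_expert_count` is a hypothetical example function, not part of EPLB.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_expert_count(raw: Optional[str] = None, default: int = 0) -> int:
    """Parse an expert count, falling back to `default` on missing or bad input."""
    if raw is None:
        # Default argument: return the documented default instead of failing.
        return default
    try:
        value = int(raw)
    except ValueError:
        # Catch only the expected exception type and record the failure.
        logger.warning("invalid expert count %r, using %d", raw, default)
        return default
    return value
```

For instance, `parse_expert_count("x", default=4)` logs a warning and returns 4 rather than raising.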

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md

+              ## Limitation
+              Before using EPLB, start the script and add `export DYNAMIC_EPLB="true"`.
+              Before performing load data collection (or performance data collection), start the script and add `export EXPORT_MAP_RECORD="true"`.

Contributor

gemini-code-assist bot Oct 25, 2025

There is a typo in the environment variable name. The documentation specifies EXPORT_MAP_RECORD, but the code in vllm_ascend/eplb/core/eplb_utils.py checks for EXPERT_MAP_RECORD. A developer following the documentation would encounter an error. This should be corrected.

Suggested change

- Before performing load data collection (or performance data collection), start the script and add `export EXPORT_MAP_RECORD="true"`.
+ Before performing load data collection (or performance data collection), start the script and add `export EXPERT_MAP_RECORD="true"`.
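Because a misspelled variable name fails silently, it can help to confirm from Python which flags the process actually sees. A minimal sketch; treating only the string `"true"` (case-insensitive) as enabled is an assumption and may not match how `vllm_ascend/eplb/core/eplb_utils.py` parses these variables. `env_flag` is a hypothetical helper.

```python
import os


def env_flag(name: str, default: str = "false") -> bool:
    """Treat the variable as enabled only when it is set to "true" (case-insensitive)."""
    return os.environ.get(name, default).strip().lower() == "true"


if __name__ == "__main__":
    # Print what the current process sees for the two EPLB-related variables.
    for var in ("DYNAMIC_EPLB", "EXPERT_MAP_RECORD"):
        print(var, env_flag(var))
```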


          [Doc]Add eplb developer guide to index.

7e0c937

Signed-off-by: offline0806 <[email protected]>

wangxiyuan reviewed

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md

    
              # Expert Parallelism Load Balancer (EPLB)

              ## Why We Need EPLB?

              When using Expert Parallelism (EP), different experts are assigned to different GPUs/NPUs. Given that the load of various experts may vary depending on the current workload, it is crucial to maintain balanced loads across different GPUs/NPUs. We adopt a redundant experts strategy by duplicating heavily-loaded experts. Then, we heuristically pack these duplicated experts onto GPUs to ensure load balancing across them. Moreover, thanks to the group-limited expert routing used in MoE models, we also attempt to place experts of the same group on the same node to reduce inter-node data traffic, whenever possible.
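The node-level part of this strategy (keeping experts of the same group on one node) can be illustrated by treating each expert group as an indivisible unit and greedily balancing summed group loads across nodes. A minimal sketch, assuming per-group loads are already known and every node hosts the same number of groups; `pack_groups_to_nodes` is a hypothetical name, not the policy shipped in `vllm_ascend/eplb/core/policy`.

```python
from typing import List


def pack_groups_to_nodes(group_loads: List[float], num_nodes: int) -> List[List[int]]:
    """Assign whole expert groups to nodes so node loads stay balanced.

    Every node receives the same number of groups, and a group is never split,
    so tokens routed to one group stay on a single node.
    """
    num_groups = len(group_loads)
    if num_groups % num_nodes:
        raise ValueError("number of groups must be divisible by num_nodes")
    per_node = num_groups // num_nodes
    node_load = [0.0] * num_nodes
    nodes: List[List[int]] = [[] for _ in range(num_nodes)]
    # Heaviest groups first, each onto the least-loaded node with room left.
    for g in sorted(range(num_groups), key=lambda g: group_loads[g], reverse=True):
        candidates = [n for n in range(num_nodes) if len(nodes[n]) < per_node]
        target = min(candidates, key=lambda n: node_load[n])
        nodes[target].append(g)
        node_load[target] += group_loads[g]
    return nodes
```

For example, `pack_groups_to_nodes([40, 10, 30, 20], 2)` returns `[[0, 1], [2, 3]]`, giving each node a load of 50.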

Collaborator

wangxiyuan Oct 25, 2025

Change all occurrences of GPU to NPU.

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md

    
              ## Why We Need EPLB?

              When using Expert Parallelism (EP), different experts are assigned to different GPUs/NPUs. Given that the load of various experts may vary depending on the current workload, it is crucial to maintain balanced loads across different GPUs/NPUs. We adopt a redundant experts strategy by duplicating heavily-loaded experts. Then, we heuristically pack these duplicated experts onto GPUs to ensure load balancing across them. Moreover, thanks to the group-limited expert routing used in MoE models, we also attempt to place experts of the same group on the same node to reduce inter-node data traffic, whenever possible.

              To facilitate reproduction and deployment, we open-source our deployed EP load balancing algorithm in `vllm_ascend/eplb/core/policy`. The algorithm computes a balanced expert replication and placement plan based on the estimated expert loads. Note that the exact method for predicting expert loads is outside the scope of this repository. A common method is to use a moving average of historical statistics.
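The moving-average approach mentioned above can be sketched as an exponential moving average over per-expert token counts collected at each step. The class name, the `alpha` smoothing factor, and seeding with the first observation are illustrative assumptions; as the text says, load prediction itself is outside the scope of the repository.

```python
from typing import List, Optional


class ExpertLoadEstimator:
    """Exponential moving average of per-expert token counts (an assumed estimator)."""

    def __init__(self, num_experts: int, alpha: float = 0.2) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.num_experts = num_experts
        self.alpha = alpha
        self.estimate: Optional[List[float]] = None

    def update(self, counts: List[int]) -> List[float]:
        if len(counts) != self.num_experts:
            raise ValueError("one count per expert expected")
        if self.estimate is None:
            # The first observation seeds the average.
            self.estimate = [float(c) for c in counts]
        else:
            # Blend the new counts with the running estimate.
            self.estimate = [
                self.alpha * c + (1.0 - self.alpha) * prev
                for c, prev in zip(counts, self.estimate)
            ]
        return self.estimate
```

With `alpha=0.5`, updating with `[10, 0]` and then `[0, 10]` gives `[5.0, 5.0]`; a smaller `alpha` reacts more slowly to load shifts but filters out short spikes.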

Collaborator

wangxiyuan Oct 25, 2025

No need to mention open-source. It can be something like: vLLM Ascend supports xxx

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md

    
              Please refer to the EPLB section of the user guide for detailed information: [How to Use EPLB](../../user_guide/feature_guide/eplb_swift_balancer.md)

              ## How It Works?

Collaborator

wangxiyuan Oct 25, 2025

Please add more about the module design. For example, what are EplbUpdator, EplbWorker, etc., and how do they work?

docs/source/developer_guide/feature_guide/eplb_swift_balancer.md

    
              ## How It Works?

              ### Default Algorithm

              #### Hierarchical Load Balancing

Collaborator

wangxiyuan Oct 25, 2025

Please add a section describing how developers can register a new algorithm.
