[Fixbug] Fix soc_version for 310p #2676

zhangxinyuehfad · 2025-09-01T08:59:40Z

What this PR does / why we need it?

Fix soc_version detection for 310P.
Refactor _build_info and add ascend_soc_version (A2, A3, 310P) to it.
Set a default SOC_VERSION for each ascend_soc_version: ASCEND910B1 (A2), Ascend910_9392 (A3), ASCEND310P3 (310P).
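
The relationship between the SOC_VERSION strings and the three families listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the enum name `SocFamily`, the function `soc_family_from_name`, and the prefix-matching rules are assumptions made for the example. Only the three family names and the three default SOC_VERSION strings come from the description.

```python
from enum import Enum


class SocFamily(Enum):
    """Hypothetical stand-in for the ascend_soc_version values named in the PR."""
    A2 = "A2"
    A3 = "A3"
    P310 = "310P"
    UNDEFINED = "UNDEFINED"


# Default SOC_VERSION strings listed in the PR description, keyed by family.
DEFAULT_SOC_VERSION = {
    SocFamily.A2: "ASCEND910B1",
    SocFamily.A3: "Ascend910_9392",
    SocFamily.P310: "ASCEND310P3",
}


def soc_family_from_name(soc_version: str) -> SocFamily:
    """Classify a SOC_VERSION string by prefix (assumed rule, case-insensitive)."""
    name = soc_version.strip().lower()
    if name.startswith("ascend310p"):
        return SocFamily.P310
    if name.startswith("ascend910_93"):
        return SocFamily.A3
    if name.startswith("ascend910b"):
        return SocFamily.A2
    # Anything unrecognized stays UNDEFINED so callers can fail loudly.
    return SocFamily.UNDEFINED
```

Returning an explicit `UNDEFINED` member instead of guessing keeps an unrecognized chip visible to later code, which is the situation the traceback in this description shows.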

Without this fix, engine startup on 310P fails with the following error:
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700] EngineCore failed to start.
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700] Traceback (most recent call last):
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-empty/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-empty/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-empty/vllm/v1/engine/core.py", line 89, in __init__
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     self._initialize_kv_caches(vllm_config)
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-empty/vllm/v1/engine/core.py", line 179, in _initialize_kv_caches
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     self.model_executor.determine_available_memory())
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-empty/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     output = self.collective_rpc("determine_available_memory")
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-empty/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-empty/vllm/utils/__init__.py", line 3007, in run_method
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     return func(*args, **kwargs)
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 161, in determine_available_memory
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     self.model_runner.profile_run()
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2163, in profile_run
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     hidden_states = self._dummy_run(self.max_num_tokens,
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     return func(*args, **kwargs)
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2015, in _dummy_run
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     moe_comm_method = self._select_moe_comm_method(num_tokens)
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]   File "/__w/vllm-benchmarks/vllm-benchmarks/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in _select_moe_comm_method
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700]     raise ValueError(f"Unsupported soc_version: {soc_version}")
(EngineCore_0 pid=7454) ERROR 09-01 08:01:59 [core.py:700] ValueError: Unsupported soc_version: AscendSocVersion.UNDEFINED
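
The last frame of the traceback shows a selector that rejects any soc_version it does not recognize. A minimal sketch of that pattern is given below, under stated assumptions: only the enum name `AscendSocVersion`, its `UNDEFINED` member, and the error message format appear in the log. The other member names, the table `_COMM_METHOD_BY_SOC`, and the placeholder method names are hypothetical and do not reflect the actual vllm-ascend implementation.

```python
from enum import Enum


class AscendSocVersion(Enum):
    # Only UNDEFINED is visible in the log; the other members are assumed.
    A2 = 0
    A3 = 1
    P310 = 2
    UNDEFINED = 3


# Placeholder strategy names; the real method names are not shown in this thread.
_COMM_METHOD_BY_SOC = {
    AscendSocVersion.A2: "comm_method_for_a2",
    AscendSocVersion.A3: "comm_method_for_a3",
    AscendSocVersion.P310: "comm_method_for_310p",
}


def select_moe_comm_method(soc_version: AscendSocVersion) -> str:
    """Return the strategy for a known SoC, or raise for anything else."""
    try:
        return _COMM_METHOD_BY_SOC[soc_version]
    except KeyError:
        raise ValueError(f"Unsupported soc_version: {soc_version}") from None
```

In this sketch, a device whose version was never classified reaches the selector as `UNDEFINED` and triggers the same kind of `ValueError` as in the log, while a device classified as 310P gets an entry of its own.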

Does this PR introduce any user-facing change?

Users can use 310P normally.

How was this patch tested?

vLLM version: v0.10.2
vLLM main: vllm-project/vllm@1a0a04d

gemini-code-assist

Code Review

This pull request correctly fixes a bug for Ascend 310p devices by adding support for its soc_version. The changes properly identify the new soc_version and configure the appropriate MoE communication method. I have one high-severity suggestion to improve maintainability by replacing a magic number with a named constant.

vllm_ascend/utils.py

github-actions · 2025-09-01T10:00:19Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

wangxiyuan · 2025-09-01T10:26:36Z

I don't like this approach. How about refactoring it to use build_info entirely?

codecov · 2025-09-01T10:50:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.84%. Comparing base (2693196) to head (5fc0f77).
⚠️ Report is 24 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2676      +/-   ##
==========================================
+ Coverage   72.61%   73.84%   +1.22%     
==========================================
  Files         154      155       +1     
  Lines       21319    21338      +19     
==========================================
+ Hits        15480    15756     +276     
+ Misses       5839     5582     -257

Flag	Coverage Δ
unittests	`73.84% <100.00%> (+1.22%)`	⬆️


github-actions · 2025-09-05T01:27:21Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

AlphaINF · 2025-09-07T17:44:43Z

Same issue! I'd like this branch to be merged.

zhangxinyuehfad · 2025-09-08T06:50:10Z

> I don't like this approach. How about refactoring it to use build_info entirely?

It's impossible to import torch_npu and call get_soc_version() in the isolated environment, so we need to get ascend_soc_version in the run phase, as before.
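
The constraint described in this reply, that the device query cannot run where torch_npu is unavailable, can be illustrated with a deferred lookup. This is only a sketch under stated assumptions: `make_soc_version_getter`, `probe`, and `build_default` are hypothetical names, and falling back to a build-time default is an assumption of the example, not something this thread confirms. The probe is passed in as a callable so the sketch makes no claim about the torch_npu API.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_soc_version_getter(
    probe: Callable[[], str],
    build_default: Optional[str] = None,
) -> Callable[[], str]:
    """Build a getter that queries the device on first use, not at import time.

    `probe` is expected to do the runtime-only work (for example importing the
    device library and asking it for the SoC version). If that import fails and
    a build-time default was supplied, the default is returned instead.
    """

    @lru_cache(maxsize=None)
    def get_soc_version() -> str:
        try:
            return probe()
        except ImportError:
            if build_default is None:
                raise
            return build_default

    return get_soc_version
```

Because nothing is probed until the getter is first called, a module that uses it can still be imported in an environment where the device library is missing.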

github-actions · 2025-09-11T03:32:31Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2025-09-19T03:08:58Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2025-09-20T09:40:08Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

gemini-code-assist bot reviewed Sep 1, 2025

vllm-ascend-ci added the ready-for-test and e2e-310p-test labels Sep 1, 2025

zhangxinyuehfad force-pushed the zxy_fix branch from 1cfe08d to 534cce2 Compare September 1, 2025 09:09

vllm-ascend-ci added e2e-310p-test and removed e2e-310p-test labels Sep 1, 2025

github-actions bot added the module:core label Sep 1, 2025

github-actions bot added the merge-conflicts label Sep 5, 2025

Yikun mentioned this pull request Sep 8, 2025

[Bug]: Can't using 300I duo on v0.10.1rc1-310p due to ValueError: Unsupported soc_version: AscendSocVersion.UNDEFINED #2795

Open

zhangxinyuehfad force-pushed the zxy_fix branch from 534cce2 to f8c9848 Compare September 8, 2025 06:23

vllm-ascend-ci added e2e-310p-test and removed e2e-310p-test labels Sep 8, 2025

github-actions bot removed the merge-conflicts label Sep 8, 2025

zhangxinyuehfad force-pushed the zxy_fix branch 2 times, most recently from d145a4d to a4662dc Compare September 8, 2025 08:51

github-actions bot added the module:tests label Sep 8, 2025

zhangxinyuehfad force-pushed the zxy_fix branch from a4662dc to 5fc0f77 Compare September 8, 2025 12:45

github-actions bot added the merge-conflicts label Sep 11, 2025

zhangxinyuehfad force-pushed the zxy_fix branch from 5fc0f77 to 554427a Compare September 15, 2025 07:00

github-actions bot removed the merge-conflicts label Sep 15, 2025

wangxiyuan added and removed the ready-for-test label Sep 15, 2025

github-actions bot added the merge-conflicts label Sep 16, 2025

github-actions bot removed the merge-conflicts label Sep 16, 2025

zhangxinyuehfad force-pushed the zxy_fix branch 16 times, most recently from bb4559a to c002951 Compare September 19, 2025 02:57

github-actions bot added the merge-conflicts label Sep 19, 2025

[Fixbug] Fix soc_version for 310p

7e818e9

Signed-off-by: hfadzxy <[email protected]>

zhangxinyuehfad force-pushed the zxy_fix branch from c002951 to 577ac6b Compare September 19, 2025 03:38

github-actions bot removed the merge-conflicts label Sep 19, 2025

[Fixbug] Fix soc_version for 310p

244e252

Signed-off-by: hfadzxy <[email protected]>

zhangxinyuehfad force-pushed the zxy_fix branch from 577ac6b to 244e252 Compare September 19, 2025 03:48

vllm-ascend-ci added the e2e-310p-test and ready labels and removed the e2e-310p-test label Sep 19, 2025

github-actions bot added the merge-conflicts label and removed the ready label Sep 20, 2025
