Contents
Native support for verl on Ascend NPU has attracted the attention of many developers. This roadmap tracks the progress of native support; everyone is welcome to join the discussion.
Q1 RoadMap
The verl-NPU workflow depends on the vLLM-ascend version, so it has only been rebased onto vLLM-ascend tag v0.7.3rc1. We will continue to rebase as vLLM-ascend is updated.
Quick Start
document: ascend.rst
Plan
Dependencies (Q1 done)
- transformers
- ray
- FSDP worker
- vLLM-ascend v0.7.1
- vLLM-ascend v0.7.3 (some features have been temporarily circumvented and are marked in the Q2 Plan)
Q2 Plan
- megatron/mindspeed worker (for NPU, megatron ≈ mindspeed)
- torchtitan/FSDP2 worker
- --use_remove_padding
Release Accuracy Comparison Results
We modify the default config as little as possible to preserve accuracy.
|    | ALGO | Model                     | Result (Mean Absolute Error) | CANN                   |
| -- | ---- | ------------------------- | ---------------------------- | ---------------------- |
| ✅ | SFT  | Qwen2.5-0.5B-Instruct[1]  | chart @as12138               | 8.1.RC1 (not released) |
|    | GRPO | Qwen2-7B-Instruct[2]      | chart @as12138               | 8.1.RC1 (not released) |
|    | GRPO | Qwen2.5-VL-3B-Instruct[1] | WIP @as12138                 | 8.1.RC1 (not released) |
|    | GRPO | Qwen2.5-VL-7B-Instruct    | Waiting for sleep mode       | 8.1.RC1 (not released) |
|    | GRPO | Qwen2.5-7B-Instruct       | Waiting for sleep mode       | 8.1.RC1 (not released) |
Notes:
[1] The NPU does not yet support sleep mode (required for the hybrid engine). To obtain results efficiently, we use 2B/3B models on 8 devices when verifying the algorithms.
[2] Qwen2-7B-Instruct was tested on 2×8 devices, and several batch_size-related parameters had to be reduced, so this result is for reference only. We will publish the reward results under the default parameters as soon as sleep mode is supported.
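The "Mean Absolute Error" column presumably compares per-step training metrics (e.g. reward curves) between an NPU run and a baseline run. A minimal illustrative sketch of that comparison (the function name and inputs are assumptions, not verl's actual tooling):

```python
# Hypothetical sketch: mean absolute error between two aligned
# training curves, e.g. per-step reward on NPU vs. a GPU baseline.
def mean_absolute_error(npu_curve, baseline_curve):
    """Average of |a - b| over step-aligned metric values."""
    if len(npu_curve) != len(baseline_curve):
        raise ValueError("curves must be aligned step-by-step")
    return sum(abs(a - b) for a, b in zip(npu_curve, baseline_curve)) / len(npu_curve)


if __name__ == "__main__":
    print(mean_absolute_error([0.0, 2.0], [1.0, 2.0]))  # 0.5
```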
Ease of use
flash-attn is not supported on Ascend NPU, so we replace it with torch_npu.npu_fusion_attention. [Temporary solution] NPU support for SDPA: huggingface/transformers#35165
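The kind of dispatch this implies can be sketched as follows; the function name and the preference order are illustrative assumptions, not verl's actual API:

```python
# Hypothetical sketch: choose an attention implementation at runtime.
# On Ascend NPU (torch_npu importable) the fused NPU kernel is used,
# since flash-attn is unavailable there; otherwise fall back to
# flash-attn if installed, or to SDPA as the portable default.
import importlib.util


def select_attention_backend(prefer_flash: bool = True) -> str:
    """Return the name of the attention backend to use on this host."""
    if importlib.util.find_spec("torch_npu") is not None:
        # Ascend NPU path: replace flash-attn with the fused kernel.
        return "npu_fusion_attention"
    if prefer_flash and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attn"
    # SDPA is supported by recent transformers releases on most backends.
    return "sdpa"


if __name__ == "__main__":
    print(select_attention_backend())
```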
Long-term Planning