Error when using adamw_torch_npu_fused on NPU devices #7327

@weiliang987644015

Describe the bug
Using --optim adamw_torch_npu_fused on a 910B3 device raises an error.
[Image: screenshot of the error]

Your hardware and system info
NPU: 910B3
CANN: 8.3.RC1
torch: 2.8.0
torch-npu: 2.8.0
transformers: 4.57.3
ms-swift: latest main branch
deepspeed: 0.17.6
OS: openEuler 24.03
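
For anyone trying to reproduce this, the package versions listed above can be collected from the active environment with a short script. This is only a sketch: `pkg_version` is a hypothetical helper name, and it assumes `python3` on the PATH is the interpreter of the `swift` conda environment.

```shell
#!/bin/bash
# Hypothetical helper: print the installed version of a Python package,
# or "not installed" if the active environment does not provide it.
pkg_version() {
    python3 -c "import importlib.metadata as m, sys; print(m.version(sys.argv[1]))" "$1" 2>/dev/null \
        || echo "not installed"
}

# Package names as they appear in the version list above.
for pkg in torch torch-npu transformers ms-swift deepspeed; do
    printf '%s: %s\n' "$pkg" "$(pkg_version "$pkg")"
done
```

The CANN version and the NPU model are not Python packages, so they are not covered by this loop.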

Additional context
The bash script I ran is as follows:

#!/bin/bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
# Activate the conda environment
source /root/miniforge3/etc/profile.d/conda.sh
conda activate swift

# NPU environment variables
export TORCH_DEVICE_BACKEND_AUTOLOAD="0"
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=2
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export HCCL_CONNECT_TIMEOUT=7200

DATASET_ROOT=xxx
OUTPUT_DIR=xxx

NPROC_PER_NODE=8 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
    --model /home/jmrh/data/idata_llm/models/base/Qwen3-0.6B-Base \
    --train_type full \
    --dataset ${DATASET_ROOT}/s1/general_data_stage1_000.jsonl ${DATASET_ROOT}/s1/general_data_stage1_001.jsonl \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 5 \
    --per_device_eval_batch_size 1 \
    --learning_rate 5e-4 \
    --gradient_accumulation_steps 128 \
    --save_steps 100 \
    --save_total_limit 100 \
    --logging_steps 10 \
    --max_length 4096 \
    --save_only_model true \
    --warmup_ratio 0.1 \
    --dataloader_num_workers 32 \
    --dataset_num_proc 256 \
    --dataloader_persistent_workers true \
    --deepspeed zero2 \
    --ddp_backend hccl \
    --attn_impl sdpa \
    --metric 'ppl' \
    --truncation_strategy split \
    --output_dir ${OUTPUT_DIR} \
    --report_to all \
    --optim adamw_torch_npu_fused 
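
One way to narrow this down before launching the full run is to check whether the fused optimizer class can be imported at all in the `swift` environment. The sketch below assumes that `adamw_torch_npu_fused` resolves to `torch_npu.optim.NpuFusedAdamW`, which is how I understand the transformers Trainer to handle it; that mapping is not confirmed by anything in this report. `check_import` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: report whether `from <module> import <name>` succeeds
# in the currently active Python environment.
check_import() {
    local module="$1" name="$2"
    if python3 -c "from ${module} import ${name}" >/dev/null 2>&1; then
        echo "ok: ${module}.${name}"
    else
        echo "missing: ${module}.${name}"
    fi
}

# Assumption: the fused optimizer is exposed as torch_npu.optim.NpuFusedAdamW.
check_import torch_npu.optim NpuFusedAdamW
```

If this prints `missing`, the problem is likely in the torch-npu installation or environment setup rather than in the training arguments. If it prints `ok`, the error is more likely raised later, during optimizer construction or the first optimizer step.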

Labels: bug, npu
