Describe the bug
Using --optim adamw_torch_npu_fused on an Ascend 910B3 device raises an error.
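One quick way to narrow this down is to check whether the optimizer behind the flag can be imported at all in the training environment. This is a minimal sketch that assumes recent transformers releases resolve adamw_torch_npu_fused to torch_npu.optim.NpuFusedAdamW (not verified against the failing run). The OPTIM_TARGETS table and the resolve_optimizer helper are hypothetical names for illustration only.

```python
import importlib
from typing import Optional, Tuple

# Hypothetical table: optimizer name -> (module, class). The NPU entry assumes
# transformers maps --optim adamw_torch_npu_fused to torch_npu.optim.NpuFusedAdamW.
OPTIM_TARGETS = {
    "adamw_torch": ("torch.optim", "AdamW"),
    "adamw_torch_npu_fused": ("torch_npu.optim", "NpuFusedAdamW"),
}


def resolve_optimizer(name: str) -> Tuple[Optional[type], Optional[str]]:
    """Return (optimizer_class, None) on success or (None, error_message) on failure."""
    if name not in OPTIM_TARGETS:
        return None, f"unknown optimizer name: {name}"
    module_name, class_name = OPTIM_TARGETS[name]
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # broad on purpose so any import-time failure is reported
        return None, f"import of {module_name} failed: {exc!r}"
    cls = getattr(module, class_name, None)
    if cls is None:
        return None, f"{module_name} has no attribute {class_name}"
    return cls, None


if __name__ == "__main__":
    cls, err = resolve_optimizer("adamw_torch_npu_fused")
    print(cls if cls is not None else err)
```

Running it inside the same conda environment (after sourcing the Ascend set_env.sh scripts) separates "the class cannot be imported" from "the optimizer fails during training".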

Your hardware and system info
NPU: 910B3
CANN: 8.3.RC1
torch: 2.8.0
torch-npu: 2.8.0
transformers: 4.57.3
ms-swift: latest main branch
deepspeed: 0.17.6
OS: openEuler 24.03
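The Python package versions above can be collected automatically with a small script like the one below. It is a sketch that assumes the pip distribution names match the package names listed here; it does not report the NPU model or CANN version, which have to be taken from the Ascend tooling. The PACKAGES and ENV_VARS lists and the collect_env helper are hypothetical.

```python
import os
import platform
from importlib import metadata

# Distribution names assumed to match the pip packages listed in this report.
PACKAGES = ["torch", "torch-npu", "transformers", "ms-swift", "deepspeed"]
# Environment variables exported by the launch script in this report.
ENV_VARS = [
    "TORCH_DEVICE_BACKEND_AUTOLOAD",
    "TASK_QUEUE_ENABLE",
    "CPU_AFFINITY_CONF",
    "PYTORCH_NPU_ALLOC_CONF",
    "HCCL_CONNECT_TIMEOUT",
]


def collect_env() -> dict:
    """Gather interpreter, OS, package versions and relevant env vars as strings."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    for pkg in PACKAGES:
        try:
            info[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            info[pkg] = "not installed"
    for var in ENV_VARS:
        info[var] = os.environ.get(var, "<unset>")
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```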
Additional context
The bash script I ran:
#!/bin/bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
# Activate the conda environment
source /root/miniforge3/etc/profile.d/conda.sh
conda activate swift
# NPU environment variables
export TORCH_DEVICE_BACKEND_AUTOLOAD="0"
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=2
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export HCCL_CONNECT_TIMEOUT=7200
DATASET_ROOT=xxx
OUTPUT_DIR=xxx
NPROC_PER_NODE=8 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
--model /home/jmrh/data/idata_llm/models/base/Qwen3-0.6B-Base \
--train_type full \
--dataset ${DATASET_ROOT}/s1/general_data_stage1_000.jsonl ${DATASET_ROOT}/s1/general_data_stage1_001.jsonl \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 5 \
--per_device_eval_batch_size 1 \
--learning_rate 5e-4 \
--gradient_accumulation_steps 128 \
--save_steps 100 \
--save_total_limit 100 \
--logging_steps 10 \
--max_length 4096 \
--save_only_model true \
--warmup_ratio 0.1 \
--dataloader_num_workers 32 \
--dataset_num_proc 256 \
--dataloader_persistent_workers true \
--deepspeed zero2 \
--ddp_backend hccl \
--attn_impl sdpa \
--metric 'ppl' \
--truncation_strategy split \
--output_dir ${OUTPUT_DIR} \
--report_to all \
--optim adamw_torch_npu_fused
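For anyone reproducing this, the launch parameters imply the global batch per optimizer step computed below. This assumes a single node (NPROC_PER_NODE=8 with no multi-node settings). The token figure is an upper bound, since each sequence is at most max_length tokens and may be shorter.

```python
# Values copied from the swift pt command above (single node, 8 NPUs assumed).
nproc_per_node = 8
per_device_train_batch_size = 5
gradient_accumulation_steps = 128
max_length = 4096

# Sequences consumed per optimizer step across all ranks.
samples_per_step = nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps
# Upper bound on tokens per optimizer step: every sequence is at most max_length tokens.
max_tokens_per_step = samples_per_step * max_length

print(samples_per_step)     # 5120
print(max_tokens_per_step)  # 20971520
```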