Qwen3-VL-8B inference problem #7335

"""
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --model /QwenVL/Qwen3-VL-8B-Instruct \
    --infer_backend pt \
    --max_new_tokens 512 \
    --result_path "" \
    --val_dataset "" \
    --temperature 0.7 \
    --max_batch_size 24 \
    --model_type qwen3_vl
"""

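On both a 4x A100 and an 8x H20 machine, switching max_batch_size between 1 and 24 does not speed up inference. In addition, with max_batch_size set to 24, the run goes out of GPU memory once inference reaches a little over 800 samples.
Neither adding GPUs nor increasing max_batch_size makes inference faster. How should I speed it up? Could you give me a concrete command?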


torch 2.6.0+cu124
swift 3.12.0
transformers 4.57.1
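
One direction that may be worth trying, sketched here under assumptions and not verified on this setup: with `--infer_backend pt` and no launcher variable, the command above appears to run as a single process, so extra GPUs only hold model shards and do not process batches in parallel. The sketch below assumes that setting `NPROC_PER_NODE` makes swift start one inference process per GPU and split the dataset across them. `NUM_GPUS`, `MAX_BATCH_SIZE` and `RUNNER` are hypothetical helper variables, and the batch size of 8 is an arbitrary smaller value chosen only to leave memory headroom. The empty `--result_path` and `--val_dataset` values are kept exactly as in the original command.

```shell
# Sketch only: data-parallel inference with the pt backend.
# Assumption: NPROC_PER_NODE starts one swift process per GPU.
NUM_GPUS=4
# Hypothetical per-process batch size, smaller than 24 to reduce OOM risk.
MAX_BATCH_SIZE=8
# Dry run by default: RUNNER=echo prints the swift arguments instead of running.
# Set RUNNER to an empty string (RUNNER= sh script.sh) to actually launch.
RUNNER=${RUNNER-echo}

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE="$NUM_GPUS" $RUNNER swift infer \
    --model /QwenVL/Qwen3-VL-8B-Instruct \
    --model_type qwen3_vl \
    --infer_backend pt \
    --max_new_tokens 512 \
    --temperature 0.7 \
    --max_batch_size "$MAX_BATCH_SIZE" \
    --result_path "" \
    --val_dataset ""
```

If the out-of-memory error persists at this batch size, lowering `MAX_BATCH_SIZE` further is the first thing to check, since each process now holds a full copy of the model.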
