Qwen3.5 27B with LoRA: inference is very slow #8225

### Checklist

- [x] I have searched existing issues, and this is a new bug report.

### Bug Description

The command is:
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' IMAGE_MAX_TOKEN_NUM=1024 VIDEO_MAX_TOKEN_NUM=128 FPS_MAX_FRAMES=16 swift infer     --adapters output/v24-20260304-075015/checkpoint-900     --val_dataset data/test.json     --merge_lora true     --max-new-tokens 512     --temperature 0     --top-p 1.0
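With Qwen 2.5 and 3.0, inference with this command was very fast: about 20 minutes for 1,000 samples. With Qwen 3.5, the same inference now takes about 10 hours.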
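
To put the two timings on the same scale, here is a rough back-of-the-envelope calculation. It uses only the approximate figures reported above (1,000 samples, about 20 minutes versus about 10 hours); the variable and function names are illustrative, and no new measurements are involved.

```python
# Approximate figures taken from this report; not new measurements.
NUM_SAMPLES = 1000
FAST_TOTAL_SECONDS = 20 * 60      # about 20 minutes with Qwen 2.5 / 3.0
SLOW_TOTAL_SECONDS = 10 * 3600    # about 10 hours with Qwen 3.5


def seconds_per_sample(total_seconds: float, num_samples: int) -> float:
    """Average wall-clock time spent on one sample."""
    return total_seconds / num_samples


fast = seconds_per_sample(FAST_TOTAL_SECONDS, NUM_SAMPLES)
slow = seconds_per_sample(SLOW_TOTAL_SECONDS, NUM_SAMPLES)
slowdown = SLOW_TOTAL_SECONDS / FAST_TOTAL_SECONDS

print(f"Qwen 2.5 / 3.0: ~{fast:.1f} s per sample")
print(f"Qwen 3.5:       ~{slow:.1f} s per sample")
print(f"Slowdown:       ~{slowdown:.0f}x")
```

In other words, the average goes from roughly 1.2 seconds per sample to roughly 36 seconds per sample, a slowdown of about 30x.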

### How to Reproduce

Train a LoRA adapter, then run inference with the command above.

### Additional Information

_No response_
