Question about freezing parameters of VL model #7318

@Pay20Y

Hi, when I run SFT on a Qwen2.5-VL-32B model, I want to fine-tune only the aligner, so I set:

--freeze_vit true
--freeze_llm true
--freeze_aligner false

But the settings do not take effect. Part of the log is as follows:

2026-01-07T20:23:30.137320296+08:00 freeze_aligner=True,
2026-01-07T20:23:30.137322007+08:00 freeze_llm=False,
2026-01-07T20:23:30.137324016+08:00 freeze_parameters=['model.visual', 'model.visual.merger'],
2026-01-07T20:23:30.137325741+08:00 freeze_parameters_ratio=0.0,
2026-01-07T20:23:30.137327538+08:00 freeze_parameters_regex=None,
2026-01-07T20:23:30.137329196+08:00 freeze_vit=True,
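
To make the mismatch explicit, here is a minimal sketch that compares the flags I passed against the values in the log excerpt above. It only parses the log text. The helper names `parse_logged_flags` and `find_mismatches` are hypothetical and are not part of the swift CLI, and the sketch says nothing about why the values differ.

```python
import re

# Flags as passed on the command line in this issue.
requested = {"freeze_vit": True, "freeze_llm": True, "freeze_aligner": False}

# Boolean freeze_* lines copied from the log excerpt in this issue.
log_text = """\
2026-01-07T20:23:30.137320296+08:00 freeze_aligner=True,
2026-01-07T20:23:30.137322007+08:00 freeze_llm=False,
2026-01-07T20:23:30.137329196+08:00 freeze_vit=True,
"""


def parse_logged_flags(text):
    """Return {name: bool} for every `freeze_*=True|False` pair in the log."""
    pairs = re.findall(r"\b(freeze_\w+)=(True|False)\b", text)
    return {name: value == "True" for name, value in pairs}


def find_mismatches(requested, logged):
    """Return {name: (requested, logged)} for flags whose values differ."""
    return {
        name: (want, logged[name])
        for name, want in requested.items()
        if name in logged and logged[name] != want
    }


logged = parse_logged_flags(log_text)
mismatches = find_mismatches(requested, logged)
for name, (want, got) in sorted(mismatches.items()):
    print(f"{name}: requested {want}, logged {got}")
```

On this excerpt the check reports `freeze_aligner` and `freeze_llm` as differing from what I requested, while `freeze_vit` matches.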

I wonder what the reason is. My full training configuration is as follows:

PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=$nnodes \
NODE_RANK=$node_rank \
MASTER_ADDR=$master_addr \
MASTER_PORT=$master_port \
NPROC_PER_NODE=8 \
swift sft \
    --model_type qwen2_5_vl \
    --model XXX \
    --train_type full \
    --dataset XXX \
    --load_from_cache_file true \
    --bf16 true \
    --micro_batch_size 1 \
    --global_batch_size 256 \
    --sequence_parallel_size 1 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --split_dataset_ratio 0.0 \
    --max_epochs 1 \
    --cross_entropy_loss_fusion true \
    --lr 1e-7 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-7 \
    --save XXX \
    --save_interval 1000 \
    --eval_interval 500 \
    --max_length 16384 \
    --num_workers 8 \
    --dataset_num_proc 512 \
    --no_save_optim true \
    --no_save_rng true \
    --attention_backend flash \
    --deepspeed zero3 \
    --freeze_llm true \
    --freeze_vit true \
    --freeze_aligner false

Thanks!
