
AWQ: compressed-tensors LoRA fine-tuning leads to OOM while GEMM AWQ works normally #437

@bi1101

Description


Hi, I'm trying to fine-tune a Qwen3 MoE AWQ model and found a big difference between AWQ compressed-tensors models and AWQ GEMM models.

Is this a bug in the compressed-tensors implementation?

Hardware: 6x A6000 (288 GB total) OOMs with the compressed-tensors model,

while 1x A6000 (48 GB) works perfectly with AWQ GEMM.

Here's my SFT config:

# GEMM AWQ: works, no OOM
model_name_or_path: ELVISIO/Qwen3-30B-A3B-Instruct-2507-AWQ 

# Compressed-tensors AWQ: gets OOM
model_name_or_path: cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit

# dataset
dataset_name: ...


# LoRA
use_peft: True
lora_target_modules: 
  - "q_proj"
  - "k_proj"
  - "v_proj"
  - "o_proj"

# training
learning_rate: 2.0e-05
num_train_epochs: 1
packing: true
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
gradient_accumulation_steps: 16
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
logging_steps: 1
logging_strategy: "steps"
log_level: "info"
max_length: 8000
warmup_ratio: 0.03
lr_scheduler_type: 'cosine'
bf16: true
bf16_full_eval: true
fp16: false
attn_implementation: 'flash_attention_2'
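
For reference, a minimal reproduction sketch of the model loading and LoRA setup implied by the config above (this assumes the standard transformers + PEFT path used by TRL's SFT script; the LoRA rank is an assumption, since the YAML does not set it). Only `model_id` changes between the OOM run and the working run:

```python
# Minimal sketch, assuming a transformers + PEFT setup equivalent to the YAML config above.
# Switching model_id between the two checkpoints is the only difference between the
# compressed-tensors run (OOM on 6x A6000) and the GEMM AWQ run (fine on 1x A6000).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit"  # compressed-tensors AWQ -> OOM
# model_id = "ELVISIO/Qwen3-30B-A3B-Instruct-2507-AWQ"     # GEMM AWQ -> trains normally

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,  # assumption: rank is not specified in the YAML above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```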

Note: when serving with vLLM, compressed-tensors models seem to have better throughput. So if possible, fine-tuning in the compressed-tensors format would be preferable and more future-proof, since AutoAWQ is deprecated.
