Hi, I'm trying to fine-tune a Qwen3 MoE AWQ model and found a big difference between AWQ compressed-tensors models and AWQ GEMM models.
Is this a bug in the compressed-tensors implementation?
Hardware: 6x A6000 (288 GB total) OOMs with the compressed-tensors model, while 1x A6000 (48 GB) works perfectly with AWQ GEMM.
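To check whether the gap already appears at plain model load, before any training, something like this minimal sketch should work (assuming the AWQ kernels and the compressed-tensors package are installed so transformers can load both checkpoints; everything else is stock transformers):

```python
# Minimal sketch: compare peak GPU memory when just loading the two checkpoints.
# Assumes autoawq kernels and the compressed-tensors package are installed.
import torch
from transformers import AutoModelForCausalLM

MODELS = {
    "AWQ GEMM": "ELVISIO/Qwen3-30B-A3B-Instruct-2507-AWQ",
    "compressed-tensors AWQ": "cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit",
}

for name, model_id in MODELS.items():
    for i in range(torch.cuda.device_count()):
        torch.cuda.reset_peak_memory_stats(i)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    peak = sum(
        torch.cuda.max_memory_allocated(i) for i in range(torch.cuda.device_count())
    )
    print(f"{name}: {peak / 2**30:.1f} GiB peak across visible GPUs")
    del model
    torch.cuda.empty_cache()
```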
Here's my SFT config:
```yaml
# GEMM AWQ works, no OOM
model_name_or_path: ELVISIO/Qwen3-30B-A3B-Instruct-2507-AWQ
# Compressed-tensors AWQ OOMs
model_name_or_path: cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit
# dataset
dataset_name: ...
# LoRA
use_peft: True
lora_target_modules:
  - "q_proj"
  - "k_proj"
  - "v_proj"
  - "o_proj"
# training
learning_rate: 2.0e-05
num_train_epochs: 1
packing: true
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
gradient_accumulation_steps: 16
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
logging_steps: 1
logging_strategy: "steps"
log_level: "info"
max_length: 8000
warmup_ratio: 0.03
lr_scheduler_type: 'cosine'
bf16: true
bf16_full_eval: true
fp16: false
attn_implementation: 'flash_attention_2'
```
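In code, the relevant part of this config corresponds to roughly the following TRL + PEFT setup (a sketch: the dataset id is a placeholder, as in the YAML, and the LoRA rank/alpha are illustrative since the config doesn't set them):

```python
# Sketch of the TRL + PEFT setup implied by the YAML above.
# "my_dataset" is a placeholder; r / lora_alpha are illustrative defaults.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("my_dataset", split="train")  # placeholder dataset id

peft_config = LoraConfig(
    r=16,           # not set in the YAML; illustrative
    lora_alpha=32,  # not set in the YAML; illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit",  # or the GEMM checkpoint
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        learning_rate=2.0e-5,
        num_train_epochs=1,
        packing=True,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": True},
        max_length=8000,
        warmup_ratio=0.03,
        lr_scheduler_type="cosine",
        bf16=True,
        model_init_kwargs={"attn_implementation": "flash_attention_2"},
    ),
)
trainer.train()
```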
Note: when serving with vLLM, compressed-tensors models seem to have better throughput. So, if possible, fine-tuning in the compressed-tensors format would be preferable and more future-proof, given that AutoAWQ is deprecated.
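For anyone wanting to reproduce the serving side, a minimal offline vLLM sketch (vLLM picks up the quantization scheme, AWQ or compressed-tensors, from the checkpoint's config):

```python
# Sketch: load either checkpoint offline with vLLM and generate once.
# vLLM infers the quantization method from the model config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit",  # or the GEMM checkpoint
    tensor_parallel_size=1,
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```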