Update on "New multi-step QAT API"

andrewor14 · andrewor14 · commit 7147dcb5fa2b · 2025-07-31T14:06:44.000-07:00
**Summary:** This commit adds a new multi-step QAT API with the
main goal of simplifying the existing UX. The new API uses the
same `QATConfig` for both the prepare and convert steps, and
automatically infers the fake quantization configs based on
a PTQ base config provided by the user:

```Py
from torchao.quantization import (
    quantize_,
    Int8DynamicActivationInt4WeightConfig
)
from torchao.quantization.qat import QATConfig

# prepare
base_config = Int8DynamicActivationInt4WeightConfig(group_size=32)
qat_config = QATConfig(base_config, step="prepare")
quantize_(model, qat_config)

# train (not shown)

# convert
quantize_(model, QATConfig(base_config, step="convert"))
```

The main improvements include:
- A single config for both the prepare and convert steps
- A single `quantize_` call for convert (instead of two)
- No chance of incompatible prepare and convert configs
- Much less boilerplate code for the most common use case
- Simpler config names
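
To illustrate how a single config plus a `step` argument could drive both phases, here is a minimal pure-Python sketch. It does not use torchao: `ToyInt8DynActInt4WeightConfig`, `ToyFakeQuantizeConfig`, `infer_fake_quantize_configs`, and `plan_step` are hypothetical names invented for this example, and the mapping from base config to fake quantization configs is an assumption modeled on the explicit int8 per-token / int4 per-group configs shown elsewhere in this description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyInt8DynActInt4WeightConfig:
    """Hypothetical stand-in for a PTQ base config (not a torchao class)."""
    group_size: int = 32


@dataclass
class ToyFakeQuantizeConfig:
    """Hypothetical stand-in for a fake quantization config."""
    dtype: str
    granularity: str
    is_symmetric: bool = True


def infer_fake_quantize_configs(
    base_config,
) -> Tuple[ToyFakeQuantizeConfig, ToyFakeQuantizeConfig]:
    """Map a PTQ base config to (activation, weight) fake quantization configs."""
    if isinstance(base_config, ToyInt8DynActInt4WeightConfig):
        # Assumed mapping: int8 per-token asymmetric activations and
        # int4 per-group weights, using the base config's group size.
        act = ToyFakeQuantizeConfig("int8", "per_token", is_symmetric=False)
        weight = ToyFakeQuantizeConfig("int4", f"per_group:{base_config.group_size}")
        return act, weight
    raise ValueError(f"unsupported base config: {type(base_config).__name__}")


def plan_step(base_config, step: str) -> List[str]:
    """Return the actions one step would take, as plain strings (assumed behavior)."""
    if step == "prepare":
        act, weight = infer_fake_quantize_configs(base_config)
        return [
            f"fake-quantize activations as {act.dtype} ({act.granularity})",
            f"fake-quantize weights as {weight.dtype} ({weight.granularity})",
        ]
    if step == "convert":
        # One call covers both halves of the old two-call convert flow.
        return [
            "remove fake quantization",
            f"apply {type(base_config).__name__}",
        ]
    raise ValueError(f"step must be 'prepare' or 'convert', got {step!r}")


base = ToyInt8DynActInt4WeightConfig(group_size=32)
prepare_plan = plan_step(base, "prepare")
convert_plan = plan_step(base, "convert")
```

Because both plans are derived from the same `base` object, the prepare and convert steps cannot disagree about dtype or group size, which is the property the list above refers to.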

For less common use cases such as experimentation, users can
still specify arbitrary fake quantization configs for
activations and/or weights as before. This is still important
since there may not always be a corresponding PTQ base config.
For example:

```Py
import torch

from torchao.quantization import quantize_
from torchao.quantization.qat import IntxFakeQuantizeConfig, QATConfig

activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = QATConfig(
    activation_config=activation_config,
    weight_config=weight_config,
    step="prepare",
)
quantize_(model, qat_config)

# train and convert same as above (not shown)
```
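
The two ways of supplying configs for the prepare step (a base config, or explicit activation and/or weight configs) can be told apart with a small check. The sketch below is pure Python and does not call torchao; `resolve_qat_mode` is a hypothetical helper, and the rule that the two styles cannot be mixed is an assumption made for illustration, not something this description states.

```python
from typing import Any, Optional


def resolve_qat_mode(
    base_config: Optional[Any] = None,
    activation_config: Optional[Any] = None,
    weight_config: Optional[Any] = None,
) -> str:
    """Classify how fake quantization configs are supplied for the prepare step."""
    has_explicit = activation_config is not None or weight_config is not None
    if base_config is not None and has_explicit:
        # Assumed rule: mixing the two styles would be ambiguous, so reject it.
        raise ValueError("pass either a base config or explicit configs, not both")
    if base_config is not None:
        return "inferred"  # fake quantization configs derived from the base config
    if has_explicit:
        return "explicit"  # user-provided activation and/or weight configs
    raise ValueError("no config given")


# Placeholder objects stand in for real config instances.
mode_from_base = resolve_qat_mode(base_config=object())
mode_weight_only = resolve_qat_mode(weight_config=object())
```

Weight-only usage falls under the explicit mode here, matching the "activations and/or weights" wording above.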

**BC-breaking notes:** This change by itself is technically not
BC-breaking since we keep the old path around, but it will become
BC-breaking when we deprecate and remove the old path in the future.

Before:
```Py
# prepare
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
qat_config = IntXQuantizationAwareTrainingConfig(activation_config, weight_config)
quantize_(model, qat_config)

# train (not shown)

# convert
quantize_(model, FromIntXQuantizationAwareTrainingConfig())
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```

After: (see above)

**Test Plan:**
```
python test/quantization/test_qat.py
```

[ghstack-poisoned]
diff --git a/torchao/quantization/qat/fake_quantize_config.py b/torchao/quantization/qat/fake_quantize_config.py
@@ -26,7 +26,6 @@
 )
 
 
-@dataclass
 class FakeQuantizeConfigBase(abc.ABC):
     """
     Base class for representing fake quantization config.
