Open
Description
Motivation
Currently, SmoothQuant smoothing is applied by trying a fixed list of appliers (LLaMA, fairseq, …) in order, stopping at the first success. This design is too rigid in practice:
- Sometimes we only want to apply smoothing to specific modules (e.g., decoder layers but not embeddings).
- Sometimes we want to apply multiple appliers sequentially to the same module (e.g., first LayerNorm–QKV smoothing, then ReLU bridge fusion).
- Sometimes we want to override the alpha factor or applier set for individual modules without affecting the rest of the model.
Without these knobs, users must either modify the applier list globally or patch code manually, which is error-prone and inflexible.
Proposed Design
Introduce a flexible interface for controlling SmoothQuant application:
1. Global controls
- only_appliers, skip_appliers → restrict the global set of appliers
- include, exclude → restrict which module names are considered
2. Per-module overrides (per_module dict)
- Key: module name or glob pattern ("model.layers.7.*")
- Values:
- "alpha" → custom alpha for that module
- "appliers" → explicit list of appliers to apply sequentially
3. Observer filtering
- observe_include, observe_exclude → limit which modules are hooked for activation statistics
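To make the three groups of knobs concrete, here is a minimal sketch of how such a config could resolve the appliers and alpha for a single module. Several things here are assumptions, not part of the proposal: the `DEFAULT_APPLIERS` list, the global `alpha` default of 0.5, the `resolve` / `is_selected` / `should_observe` method names, `fnmatch`-style glob matching, and the rule that the first matching `per_module` pattern wins and that its "appliers" list replaces the global set.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# Hypothetical global applier order; the real registry may differ.
DEFAULT_APPLIERS = ["ln_to_qkv", "relu_bridge"]


def _matches(name: str, patterns: List[str]) -> bool:
    """True if the module name matches any glob pattern in the list."""
    return any(fnmatchcase(name, p) for p in patterns)


@dataclass
class SmoothQuantConfig:
    alpha: float = 0.5  # assumed global default, not specified in the proposal
    only_appliers: Optional[List[str]] = None
    skip_appliers: List[str] = field(default_factory=list)
    include: Optional[List[str]] = None
    exclude: List[str] = field(default_factory=list)
    per_module: Dict[str, dict] = field(default_factory=dict)
    observe_include: Optional[List[str]] = None
    observe_exclude: List[str] = field(default_factory=list)

    def is_selected(self, name: str) -> bool:
        """Global include/exclude filter on module names."""
        if self.include is not None and not _matches(name, self.include):
            return False
        return not _matches(name, self.exclude)

    def should_observe(self, name: str) -> bool:
        """Whether an activation-statistics hook would be attached."""
        if self.observe_include is not None and not _matches(name, self.observe_include):
            return False
        return not _matches(name, self.observe_exclude)

    def resolve(self, name: str) -> Optional[Tuple[List[str], float]]:
        """Return (appliers, alpha) for a module, or None if it is filtered out."""
        if not self.is_selected(name):
            return None
        base = DEFAULT_APPLIERS if self.only_appliers is None else self.only_appliers
        appliers = [a for a in base if a not in self.skip_appliers]
        alpha = self.alpha
        # Assumption: the first matching pattern (in dict order) wins, and its
        # "appliers" list replaces the global set rather than extending it.
        for pattern, override in self.per_module.items():
            if fnmatchcase(name, pattern):
                appliers = list(override.get("appliers", appliers))
                alpha = override.get("alpha", alpha)
                break
        return appliers, alpha
```

How overlapping patterns in per_module should be ordered or merged is left open by the proposal; first-match-wins is only one possible choice.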
Usage Examples
- Apply multiple appliers to each decoder layer, in order
cfg = SmoothQuantConfig(
    include=["model.layers.*"],
    per_module={
        "model.layers.*": {
            "appliers": ["ln_to_qkv", "relu_bridge"],
            "alpha": 0.6,
        },
    },
)
- Apply only one applier globally, but add another as well for decoder layers
cfg = SmoothQuantConfig(
    only_appliers=["ln_to_qkv"],  # global default
    per_module={
        "decoder.layers.*": {
            "appliers": ["ln_to_qkv", "relu_bridge"],
        },
    },
)
- Skip first few layers, change alpha and appliers for later layers
cfg = SmoothQuantConfig(
    include=["model.layers.*"],
    exclude=["model.layers.0.*", "model.layers.1.*", "model.layers.2.*"],
    per_module={
        "model.layers.20*": {
            "appliers": ["ln_to_qkv", "relu_bridge"],
            "alpha": 0.55,
        },
    },
)
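The patterns in these examples depend on how globs are matched against module names, which the proposal does not pin down. The sketch below assumes Python's `fnmatch` semantics (where `*` also matches dots) and uses a made-up list of module names and a hypothetical `matching` helper. It shows which names each pattern would select.

```python
from fnmatch import fnmatchcase
from typing import List


def matching(pattern: str, names: List[str]) -> List[str]:
    """Return the names selected by a glob pattern (fnmatch semantics assumed)."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Hypothetical module names, for illustration only.
names = [
    "model.layers.2",
    "model.layers.2.self_attn.q_proj",
    "model.layers.20",
    "model.layers.20.mlp.fc1",
    "model.layers.21.mlp.fc1",
    "model.layers.200.mlp.fc1",
]

# "model.layers.2.*" selects only submodules of layer 2, not the layer itself.
print(matching("model.layers.2.*", names))
# "model.layers.20*" selects layer 20, its submodules, and also layer 200.
print(matching("model.layers.20*", names))
# A character class selects layers 20-29 without touching layer 200.
print(matching("model.layers.2[0-9].*", names))
```

Under these assumptions, "model.layers.20*" covers layer 20 (and any layer whose index starts with 20) rather than every layer from 20 onward, so targeting a range of later layers would need one pattern per layer or a character class.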