
Extend SmoothQuant quantizer to support per-module applier control #320

@mhs4670go


Motivation

Currently, SmoothQuant smoothing is applied by trying a fixed list of appliers (LLaMA, fairseq, …) in order, stopping at the first success. This design is too rigid in practice:

  • Sometimes we only want to apply smoothing to specific modules (e.g., decoder layers but not embeddings).
  • Sometimes we want to apply multiple appliers sequentially to the same module (e.g., first LayerNorm–QKV smoothing, then ReLU bridge fusion).
  • Sometimes we want to override the alpha factor or applier set for individual modules without affecting the rest of the model.

Without these knobs, users must either modify the applier list globally or patch code manually, which is error-prone and inflexible.

Proposed Design

Introduce a flexible interface for controlling SmoothQuant application:

1. Global controls

  • only_appliers, skip_appliers → restrict the global set of appliers
  • include, exclude → restrict which module names are considered
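A minimal sketch of how the two global filters might compose. It assumes fnmatch-style globs for module names, that the global applier order is preserved, and that `skip_appliers`/`exclude` win over `only_appliers`/`include`. The function names `select_appliers` and `module_selected` are hypothetical, not part of the existing quantizer:

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Optional, Sequence


def select_appliers(
    available: Sequence[str],
    only_appliers: Optional[Sequence[str]] = None,
    skip_appliers: Optional[Sequence[str]] = None,
) -> list[str]:
    """Restrict the global applier list, preserving its original order (hypothetical)."""
    selected = list(available)
    if only_appliers is not None:
        # Keep only the whitelisted appliers.
        selected = [a for a in selected if a in only_appliers]
    if skip_appliers is not None:
        # Then drop any blacklisted ones, so a skip always wins.
        selected = [a for a in selected if a not in skip_appliers]
    return selected


def module_selected(
    name: str,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> bool:
    """Return True if a module name passes the include/exclude glob filters (hypothetical)."""
    if include is not None and not any(fnmatchcase(name, p) for p in include):
        return False
    # Exclusion is checked last, so it overrides a matching include.
    return not (exclude and any(fnmatchcase(name, p) for p in exclude))
```

Preserving order matters if modules without a per-module override still go through the default try-in-order path.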

2. Per-module overrides (per_module dict)

  • Key: module name or glob pattern ("model.layers.7.*")
  • Values:
    • "alpha" → custom alpha for that module
    • "appliers" → explicit list of appliers to apply sequentially
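The proposal leaves some resolution details open, such as which key wins when several globs match the same module. A minimal sketch, assuming fnmatch-style globs, that the first matching key in dict order wins, and that an explicit `appliers` list runs in full (sequentially) rather than stopping at the first success. `ModulePlan`, `resolve_override`, `apply_sequentially`, and the applier registry are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass
class ModulePlan:
    alpha: float
    appliers: Optional[list[str]]  # None means: fall back to the global applier list


def resolve_override(name: str, per_module: dict, default_alpha: float) -> ModulePlan:
    """Build a plan from the first per_module key (in dict order) that matches `name`."""
    for pattern, override in per_module.items():
        if fnmatchcase(name, pattern):
            return ModulePlan(
                alpha=override.get("alpha", default_alpha),
                appliers=override.get("appliers"),
            )
    return ModulePlan(alpha=default_alpha, appliers=None)


def apply_sequentially(
    module: object,
    plan: ModulePlan,
    registry: dict[str, Callable[[object, float], object]],
) -> list[str]:
    """Run every applier in the plan, in order, each with the plan's alpha."""
    applied = []
    for applier_name in plan.appliers or []:
        registry[applier_name](module, plan.alpha)
        applied.append(applier_name)
    return applied
```

Under this sketch an explicit per-module list bypasses `only_appliers`, which is what the second usage example (`only_appliers=["ln_to_qkv"]` globally, both appliers for `decoder.layers.*`) appears to need.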

3. Observer filtering

  • observe_include, observe_exclude → limit which modules are hooked to collect activation statistics
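To make the observer knobs concrete, here is a sketch of what a hooked module might record. It assumes the same glob semantics as the other filters, and that the statistic is the per-input-channel max |x| that SmoothQuant feeds into its smoothing factor s_j = max|X_j|^α / max|W_j|^(1 - α), with α being the (possibly per-module) alpha. `should_observe`, `AbsMaxObserver`, and `smoothing_scales` are hypothetical names, and plain lists stand in for tensors:

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Optional, Sequence


def should_observe(name: str,
                   observe_include: Optional[Sequence[str]] = None,
                   observe_exclude: Optional[Sequence[str]] = None) -> bool:
    """Decide whether a module gets an activation-statistics hook (hypothetical)."""
    if observe_include is not None and not any(fnmatchcase(name, p) for p in observe_include):
        return False
    return not (observe_exclude and any(fnmatchcase(name, p) for p in observe_exclude))


class AbsMaxObserver:
    """Tracks the running per-channel max |x| of a module's input activations."""

    def __init__(self) -> None:
        self.absmax: Optional[list[float]] = None

    def update(self, rows: Sequence[Sequence[float]]) -> None:
        # Each row is one token's activation vector; channels are the columns.
        for row in rows:
            current = [abs(x) for x in row]
            if self.absmax is None:
                self.absmax = current
            else:
                self.absmax = [max(a, b) for a, b in zip(self.absmax, current)]


def smoothing_scales(act_absmax: Sequence[float],
                     weight_absmax: Sequence[float],
                     alpha: float,
                     eps: float = 1e-5) -> list[float]:
    """s_j = max|X_j|**alpha / max|W_j|**(1 - alpha); inputs clamped to eps to avoid 0."""
    return [max(a, eps) ** alpha / max(w, eps) ** (1 - alpha)
            for a, w in zip(act_absmax, weight_absmax)]
```

Modules that are never observed have no activation statistics, so in this sketch they cannot be smoothed; restricting the hooks mainly saves memory and calibration time on modules no applier will touch.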

Usage Examples

  1. Apply multiple appliers to each decoder layer, in order

```python
cfg = SmoothQuantConfig(
    include=["model.layers.*"],
    per_module={
        "model.layers.*": {
            "appliers": ["ln_to_qkv", "relu_bridge"],
            "alpha": 0.6,
        },
    },
)
```

  2. Apply only one applier globally, but add a second applier for decoder layers

```python
cfg = SmoothQuantConfig(
    only_appliers=["ln_to_qkv"],   # global default
    per_module={
        "decoder.layers.*": {
            "appliers": ["ln_to_qkv", "relu_bridge"],
        },
    },
)
```

  3. Skip the first few layers, and change alpha and appliers for later layers

```python
cfg = SmoothQuantConfig(
    include=["model.layers.*"],
    exclude=["model.layers.0.*", "model.layers.1.*", "model.layers.2.*"],
    per_module={
        "model.layers.20*": {
            "appliers": ["ln_to_qkv", "relu_bridge"],
            "alpha": 0.55,
        },
    },
)
```
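The glob patterns in these examples can match differently from how they read. Assuming the module filters use Python `fnmatch`-style matching (an assumption; the issue does not name the matcher), `*` matches any characters including dots and there are no numeric ranges. This snippet checks the third example's patterns against a few hypothetical module names:

```python
from fnmatch import fnmatchcase

# Hypothetical module names from a LLaMA-style model.
names = [
    "model.layers.0.self_attn",
    "model.layers.10.mlp",
    "model.layers.20.mlp",
    "model.layers.21.mlp",
    "model.layers.200.mlp",
]

# Exclude pattern from example 3: the literal "0." means layer 10 is not excluded.
excluded = [n for n in names if fnmatchcase(n, "model.layers.0.*")]

# Override pattern from example 3: "*" is a wildcard, not a numeric range.
overridden = [n for n in names if fnmatchcase(n, "model.layers.20*")]

# A character class selects layers 20-29 without catching layer 200.
twenties = [n for n in names if fnmatchcase(n, "model.layers.2[0-9].*")]

print(excluded)    # ['model.layers.0.self_attn']
print(overridden)  # ['model.layers.20.mlp', 'model.layers.200.mlp']
print(twenties)    # ['model.layers.20.mlp', 'model.layers.21.mlp']
```

So under fnmatch semantics, `"model.layers.20*"` selects layer 20 (plus any layer numbered 200 or above), not every layer from 20 onward; covering "later layers" would need one pattern per range, such as `"model.layers.2[0-9].*"`.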
