Description
I was trying to figure out what NNCF actually does to make its various fake quantization functions differentiable.
The original "Neural Network Compression Framework for fast model inference" paper claimed that
... methods proposed in [12], where quantization parameters are learned using gradient descent. In our framework we use a similar quantization method, along with other quantization schemes, while also providing the ability to automatically insert Fake Quantization operations in the model graph.
where reference 12 is "PACT: Parameterized Clipping Activation for Quantized Neural Networks".
However, it doesn't seem to disclose which "similar quantization method" it actually uses. It isn't really PACT, since PACT doesn't support "proper" asymmetric quantization, requires replacing ReLU with a custom activation function, and adds an extra regularization term to the loss.
Unlike PACT, NNCF simply defines custom forward/backward functions that are differentiable with respect to inputs, inputs_low and inputs_range (without relying on custom activation functions or extra loss terms). The implementations of these custom forward/backward functions can be found in:
- `src/nncf/torch/quantization/reference.py` (pure Python implementation)
- `src/nncf/torch/extensions/src/quantization/cpu/functions_cpu.cpp` (CPU)
- `src/nncf/torch/extensions/src/quantization/cuda/functions_cuda_impl.cu` (CUDA)
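
To make the question concrete, here is a minimal sketch of what such a custom autograd op looks like: an asymmetric fake-quantize that is differentiable w.r.t. the input, the lower bound and the range via a straight-through-style backward pass. This is my own illustration, not NNCF's actual code; the names (`FakeQuantizeSketch`, `input_low`, `input_range`) and the exact gradient choices for the range parameters are assumptions.

```python
import torch


class FakeQuantizeSketch(torch.autograd.Function):
    """Illustrative asymmetric fake-quantize, not NNCF's implementation."""

    @staticmethod
    def forward(ctx, x, input_low, input_range, levels=256):
        scale = (levels - 1) / input_range
        # Clamp to the quantization range, then snap to the integer grid.
        x_clamped = torch.clamp(x, input_low, input_low + input_range)
        q = torch.round((x_clamped - input_low) * scale) / scale + input_low
        ctx.save_for_backward(x, input_low, input_range, q)
        return q

    @staticmethod
    def backward(ctx, grad_output):
        x, input_low, input_range, q = ctx.saved_tensors
        below = (x < input_low).to(grad_output.dtype)
        above = (x > input_low + input_range).to(grad_output.dtype)
        inside = 1.0 - below - above

        # Straight-through estimator for the input: pass gradients only
        # where the input fell inside the quantization range.
        grad_x = grad_output * inside

        # Illustrative (assumed) gradients for the range parameters:
        # out-of-range elements push on the corresponding bound, in-range
        # elements contribute the rounding error, LSQ-style.
        err = (q - x) / input_range
        grad_low = grad_output * (below + above)
        grad_range = grad_output * (above + inside * err)

        # Reduce over all dims, assuming scalar (per-tensor) parameters.
        return grad_x, grad_low.sum(), grad_range.sum(), None
```

Calling `FakeQuantizeSketch.apply(x, input_low, input_range)` with `requires_grad=True` tensors then yields gradients for all three arguments; the asymmetric (low, range) parameterization is what sets this apart from PACT's single clipping scalar.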
I think the paper closest to what NNCF does is actually "Learned Step Size Quantization" (aka LSQ), not PACT, although NNCF's implementation doesn't quite match LSQ either.
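
For comparison, here is LSQ's gradient of the quantizer output with respect to its learned step size, as derived in the LSQ paper (Esser et al.), written out as a small Python sketch of my own (not code from either project). One visible mismatch is that LSQ learns a single step size, whereas NNCF parameterizes the range through a lower bound and a range.

```python
def lsq_step_size_grad(v: float, s: float, q_n: int, q_p: int) -> float:
    """d v_hat / d s for v_hat = round(clip(v / s, -q_n, q_p)) * s."""
    ratio = v / s
    if ratio <= -q_n:
        return float(-q_n)           # clipped below: gradient is -Q_N
    if ratio >= q_p:
        return float(q_p)            # clipped above: gradient is Q_P
    return -ratio + round(ratio)     # in range: STE through round()
```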