Commit 7a82b9f
pass in tensor_id for calculate_qparam (#1709)
### Issue
FP8_BLOCK quantization produced poor `lm_eval` results due to two
issues:
1. **Shared statistics across blocks**: All blocks used the same
`tensor_id`, causing incorrect running statistics
2. **MoE gates being quantized**: Critical routing layers were
quantized, degrading performance
### Solution
- **Fixed block statistics**: Pass a unique `tensor_id=f"block_{i}_{j}"`
to `calculate_qparams` for each block (see the sketch after this list)
- **Updated example**: Set proper ignore layers
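Roughly, the idea behind the fix looks like the sketch below. This is a minimal illustration, assuming an observer that keeps running min/max statistics keyed by `tensor_id`; the class, shapes, and block loop are illustrative, not the actual `observers/base.py` code.

```python
import torch


class RunningMinMax:
    """Toy observer: keeps running min/max statistics per tensor_id."""

    def __init__(self):
        self.mins, self.maxs = {}, {}

    def update(self, observed: torch.Tensor, tensor_id: str):
        lo, hi = observed.min(), observed.max()
        if tensor_id in self.mins:
            # Merge with previously observed statistics for this id
            lo = torch.minimum(lo, self.mins[tensor_id])
            hi = torch.maximum(hi, self.maxs[tensor_id])
        self.mins[tensor_id], self.maxs[tensor_id] = lo, hi
        return lo, hi


obs = RunningMinMax()
weight = torch.randn(256, 256)
# Split the weight into a 2x2 grid of 128x128 blocks
blocks = weight.reshape(2, 128, 2, 128).permute(0, 2, 1, 3)

# Before the fix: one shared id, so every block's range is widened by all
# previously observed blocks, giving each block the wrong scale.
for i in range(2):
    for j in range(2):
        obs.update(blocks[i, j], tensor_id="shared")

# After the fix: a unique id per block keeps its statistics separate.
for i in range(2):
    for j in range(2):
        obs.update(blocks[i, j], tensor_id=f"block_{i}_{j}")
```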
### Changes
- `src/llmcompressor/observers/base.py`: Added unique tensor IDs for
block-wise statistics
- `examples/quantization_w8a8_fp8/fp8_block_example.py`: Fixed ignore
patterns for MoE gates (sketched below)
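For the example script, the relevant change is the ignore list. Below is a hedged sketch of that configuration using llmcompressor's `QuantizationModifier`; the exact regex for the MoE router layers is illustrative and may differ from the actual diff.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_BLOCK",
    # Keep the LM head and the MoE routing gates (e.g. "mlp.gate" in Qwen3-MoE)
    # in full precision; quantizing the router is what degraded accuracy.
    ignore=["lm_head", "re:.*mlp.gate$"],
)

# oneshot(model="Qwen/Qwen3-30B-A3B", recipe=recipe, ...)  # as in the example script
```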
### Test
Produced models:
```
shanjiaz/Qwen3-30B-A3B-FP8-BLOCK
shanjiaz/Qwen3-0.6B-FP8-BLOCK
```
The quantized models now reproduce the exact same results as Michael's originals:
```
lm_eval --model vllm --model_args pretrained=shanjiaz/Qwen3-30B-A3B-FP8-BLOCK --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
vllm (pretrained=shanjiaz/Qwen3-30B-A3B-FP8-BLOCK,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
```
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.8324|± |0.0103|
| | |strict-match | 5|exact_match|↑ |0.8848|± |0.0088|
```
lm_eval --model vllm --model_args pretrained=shanjiaz/Qwen3-0.6B-FP8-BLOCK --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
vllm (pretrained=shanjiaz/Qwen3-0.6B-FP8-BLOCK,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
```
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.3995|± |0.0135|
---------
Signed-off-by: shanjiaz <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>