@yiliu30 yiliu30 commented Nov 5, 2025

Resolves #1968 (RFC: Add Intel AutoRound Quantization Algorithm Support).

Highlights

  • Introduced AutoRoundModifier to enable AutoRound quantization for wNa16 (a usage sketch follows the results below).
  • Added an end-to-end example and unit tests.
  • Verified functionality with local accuracy tests (GSM8K with a limit of 1000; results may fluctuate slightly due to non-determinism):

- LLMC-AutoRound:
vllm (pretrained=/storage/yiliu7/Meta-Llama-3-8B-Instruct-W4A16-G128-disbale-shuffule,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.737|±  |0.0139|
|     |       |strict-match    |     5|exact_match|↑  |0.736|±  |0.0139|

- AutoRound result as reference:
vllm (pretrained=/storage/yiliu7/meta-llama/Meta-Llama-3-8B-Instruct-ar/Meta-Llama-3-8B-Instruct-w4g128/,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.739|±  |0.0139|
|     |       |strict-match    |     5|exact_match|↑  |0.740|±  |0.0139|

The eval command is attached for reference.
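
For orientation, a minimal usage sketch is shown below. The import path and constructor arguments of `AutoRoundModifier` are assumptions modeled on existing modifiers such as `GPTQModifier`; refer to the end-to-end example added in this PR for the actual recipe.

```python
# Hypothetical sketch: the import path and arguments of AutoRoundModifier are
# assumed (modeled on modifiers such as GPTQModifier), not taken from this PR.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import AutoRoundModifier  # path assumed

recipe = AutoRoundModifier(
    targets="Linear",      # quantize Linear layers ...
    ignore=["lm_head"],    # ... except the output head
    scheme="W4A16",        # weight-only int4 (wNa16 with N=4); the runs above use group size 128
)

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=128,
)
```

The quantized model can then be evaluated with lm_eval's vLLM backend, as in the GSM8K runs above.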

Next stage (in later PRs)

  • Extend support for additional data types.
  • Add a mapping of group-wise quantization recipes between LLMC and AutoRound.
  • Add end-to-end tests.

cc @hshen14 @thuang6 @wenhuach21


@kylesayrs kylesayrs left a comment

Looks great. There are a few more things I'd like to point out; I'll give a full review soon.

```python
if cur_layer_idx >= len(state.model.model.layers):
    # skip the lm_head layer
    return
decoding_layer = state.model.model.layers[cur_layer_idx]
```

@kylesayrs kylesayrs Nov 5, 2025

The sequential pipeline is not guaranteed to break a model into decoder layers. If a user specifies sequential_targets="Linear", then SEQUENTIAL_EPOCH_END will fire once per Linear layer of the model.

One way to generalize this would be to have the sequential pipeline return the modules that were in the sequential subgraph, for example:

```python
from typing import Set

import torch


class Subgraph:
    def get_modules(self, model: torch.nn.Module, recurse: bool = False) -> Set[torch.nn.Module]:
        # collect the modules that were called by this subgraph
        nodes = self.graph.find_nodes(op="call_module")
        modules = set(model.get_submodule(node.target) for node in nodes)
        if recurse:
            # also include all nested submodules
            modules = set(m for module in modules for m in module.modules())

        return modules


class SequentialPipeline:
    ...
    # inside the calibration loop, pass each subgraph to the lifecycle callback
    for subgraph in subgraphs:
        LifecycleCallbacks.sequential_epoch_end(subgraph)


def apply_autoround(self, state, subgraph):
    # the modifier reassembles the modules covered by this subgraph
    decoding_layer = torch.nn.ModuleList(list(subgraph.get_modules(state.model)))
```

Collaborator

FYI, these changes are implemented in #1998, and this PR can potentially rebase on them.

Author

Thanks for the detailed explanation — this approach is definitely more robust. I’ve tested it locally, and it works well.

Will #1998 be merged soon? If so, I’d prefer to rebase on main and update my PR accordingly to avoid introducing too much code here.

Author

I have rebased onto main and updated the PR accordingly.

kylesayrs added a commit that referenced this pull request Nov 6, 2025
…ve `layer_sequential` pipeline (#1998)

## Purpose ##
* Enable better targeting of modules by modifiers such as
[AutoRound](#1994)
* Remove legacy pipeline (which is incompatible with this change)

## Changes ##
* Pass subgraph to `sequential_epoch_end`, allowing modifiers to view
all of the modules that were called in the subgraph
* Implement `submodules` method on `Subgraph` which returns all the
modules called by this subgraph
* Remove `LayerSequentialPipeline`, which does not use the `Subgraph`
API and has been superseded by the sequential pipeline

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Dipika Sikka <[email protected]>
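
For context, here is a rough sketch of how a modifier could consume the subgraph after #1998. Apart from the `sequential_epoch_end` hook and the `Subgraph.submodules` method named in the commit message above, every identifier and signature below is an assumption rather than the merged code.

```python
# Rough sketch only: hook name, method signatures, and class name are assumptions,
# not the implementation merged in #1998 or in this PR.
import torch


class ExampleModifier:
    def on_sequential_epoch_end(self, state, subgraph):  # hook name/signature assumed
        # The callback now receives the subgraph, so the modifier can recover
        # exactly the modules that ran in this sequential step instead of
        # indexing into state.model.model.layers.
        called = subgraph.submodules(state.model)  # call signature assumed
        linears = [m for m in called if isinstance(m, torch.nn.Linear)]
        # ... run AutoRound tuning over `linears` here ...
```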

@brian-dellabetta brian-dellabetta left a comment

Thanks for the contribution! This is really cool. I have a few comments/questions from an initial review.

yiliu30 and others added 8 commits November 6, 2025 19:36

@brian-dellabetta brian-dellabetta left a comment

Thanks for addressing my comments! A few more small things:

```python
    if BUILD_TYPE == "release"
    else "compressed-tensors>=0.12.3a2"
),
# TODO: replace it with the release version
```
Collaborator

Hi @yiliu30, do you have an estimate for when the next version of autoround will be released? Does it have the appropriate licensing to avoid issues like vllm-project/compressed-tensors#468?

yiliu30 and others added 2 commits November 7, 2025 19:55
yiliu30 commented Nov 8, 2025

> Hi @yiliu30, do you have an estimate for when the next version of autoround will be released? Does it have the appropriate licensing to avoid issues like vllm-project/compressed-tensors#468?

Hi @brian-dellabetta, we're planning to release the next version within the next 1–2 weeks; hope that works for you!
As for licensing, AutoRound is released under the Apache License 2.0, so there shouldn't be any concerns.
