[docs] Add autoround doc #10
Conversation
Signed-off-by: yiliu30 <[email protected]>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Signed-off-by: yiliu30 <[email protected]>
/gemini review
Code Review
This pull request adds documentation for the AutoRound quantization method. The changes include adding an entry for AutoRound to the compression methods table and creating a new README file with a detailed walkthrough for the autoround example. My review focuses on improving the clarity, correctness, and consistency of the new documentation. I've identified a few typos, a grammatical error, and some inconsistencies between the documentation and the example code, and I've provided suggestions to address these points.
Signed-off-by: yiliu30 <[email protected]>
/gemini review
Code Review
This pull request introduces documentation for the new AutoRound quantization feature. It adds an entry for AutoRound in the main compression guide and includes a comprehensive README.md for the autoround example. The new documentation is well-structured and informative. I've identified a couple of minor issues in the new README.md—an inconsistency with the example code and a potential typo—that could be addressed to improve clarity for users.
Signed-off-by: yiliu30 <[email protected]>
/gemini review
Code Review
This pull request adds documentation for the AutoRound quantization method. A new entry is added to the compression methods table, and a new README.md file in examples/autoround provides a comprehensive guide. The documentation is well-written, but I have a few suggestions to improve clarity and correct a command that would otherwise fail.
The example includes an end-to-end script for applying the AutoRound quantization algorithm.

```bash
python3 llama3_example.py
```
The Quickstart command is a bit ambiguous. After following the Installation steps, the user will be in the root of the repository. To run the example script, they need to provide the path to it. This change makes the command explicit and runnable from the repository root, which is more user-friendly.
```diff
- python3 llama3_example.py
+ python3 examples/autoround/llama3_example.py
```
### 1) Load Model

Load the model using `AutoModelForCausalLM` for handling quantized saving and loading.
This sentence could be slightly misleading. AutoModelForCausalLM is a standard part of the transformers library. The functionality for handling quantized weights is added by llm-compressor by patching the model object. A clearer phrasing would avoid potential confusion for users.
```diff
- Load the model using `AutoModelForCausalLM` for handling quantized saving and loading.
+ Load the model using `AutoModelForCausalLM` from the `transformers` library.
```
docs/getting-started/compress.md (Outdated)
| **AWQ** | Uses channelwise scaling to better preserve important outliers in weights and activations | Better accuracy recovery with faster runtime than GPTQ |
| **SmoothQuant** | Smooths outliers in activations by folding them into weights, ensuring better accuracy for weight and activation quantized models | Good accuracy recovery with minimal calibration time; composable with other methods |
| **Round-To-Nearest (RTN)** | Simple quantization technique that rounds each value to the nearest representable level in the target precision | Provides moderate accuracy recovery in most scenarios; computationally cheap and fast, making it suitable for real-time or resource-constrained environments |
| **AutoRound** | Introduces lightweight trainable parameters to optimize rounding and clipping ranges using block-wise reconstruction error | Strong accuracy recovery with moderate tuning time; significantly more accurate than RTN and generally faster than GPTQ |
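To make the RTN row concrete, here is a minimal stdlib-only sketch of symmetric 4-bit round-to-nearest quantization. This is my own illustration of the general technique, not llm-compressor's implementation; the function name and example weights are invented for the demo.

```python
# Illustrative sketch of round-to-nearest (RTN) quantization, assuming a
# symmetric signed 4-bit grid. Not taken from llm-compressor's codebase.

def rtn_quantize(weights, num_bits=4):
    """Scale to the integer grid, round to the nearest level, clamp, rescale."""
    qmax = 2 ** (num_bits - 1) - 1          # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax
    quantized = []
    for w in weights:
        q = round(w / scale)                # nearest representable level
                                            # (Python rounds .5 ties to even)
        q = max(-qmax - 1, min(qmax, q))    # clamp to [-8, 7]
        quantized.append(q * scale)         # dequantize back to float
    return quantized

print(rtn_quantize([0.12, -0.53, 0.98, -0.07]))
```

Each dequantized value lands within half a quantization step (`scale / 2`) of the original, which is exactly the "moderate accuracy recovery" tradeoff the table describes.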
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AutoRound optimizes rounding and clipping ranges via sign-gradient descent. It delivers leading 4-bit accuracy and superior sub-4-bit accuracy compared to GPTQ/AWQ, with runtime faster than GPTQ and on par with AWQ.
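The sign-gradient idea mentioned above can be sketched in a few lines. The toy below is an assumed simplification, not llm-compressor's or auto-round's actual code: it learns per-weight rounding offsets with signSGD and a straight-through estimator so that the quantized layer output matches the full-precision output on a tiny "calibration" set better than plain round-to-nearest. All weights, inputs, and hyperparameters are invented for illustration.

```python
# Toy AutoRound-style sketch (assumption-laden simplification): tune rounding
# offsets v in [-0.5, 0.5] by sign-gradient descent on the block output error.

def forward(weights, offsets, scale, xs):
    # Hard rounding in the forward pass; the gradient below treats round()
    # as identity (straight-through estimator).
    qw = [round(w / scale + v) * scale for w, v in zip(weights, offsets)]
    return [sum(q * xi for q, xi in zip(qw, x)) for x in xs]

def loss(outputs, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

weights = [0.62, 0.58]                     # toy full-precision weights
scale = 1.0                                # toy 1-unit quantization grid
xs = [[1, 0], [0, 1], [1, 1]]              # toy calibration inputs
targets = [sum(w * xi for w, xi in zip(weights, x)) for x in xs]

offsets = [0.0, 0.0]                       # v = 0 reproduces plain RTN
rtn_err = loss(forward(weights, offsets, scale, xs), targets)

lr, best_err = 0.01, rtn_err
for _ in range(50):
    outs = forward(weights, offsets, scale, xs)
    for i in range(len(offsets)):
        # STE gradient of the squared output error w.r.t. offset i
        g = sum(2 * (o - t) * x[i] * scale for o, t, x in zip(outs, targets, xs))
        step = lr if g > 0 else -lr if g < 0 else 0.0
        offsets[i] = max(-0.5, min(0.5, offsets[i] - step))
    best_err = min(best_err, loss(forward(weights, offsets, scale, xs), targets))

print(f"RTN error: {rtn_err:.4f}, tuned error: {best_err:.4f}")
```

Here RTN rounds both weights up, so their errors add; the tuned offsets learn to round one weight down so the errors partially cancel in the block output. The real algorithm does this per transformer block with actual activations and also learns clipping ranges.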
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yiliu30 <[email protected]>
The rendered versions:
- https://github.com/yiliu30/llm-compressor-fork/tree/ar-doc/examples/autoround
- https://github.com/yiliu30/llm-compressor-fork/blob/ar-doc/docs/getting-started/compress.md