
@yiliu30 (Owner) commented Nov 18, 2025

Signed-off-by: yiliu30 <[email protected]>
@github-actions (bot) commented

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

Signed-off-by: yiliu30 <[email protected]>
@yiliu30 (Owner, Author) commented Nov 18, 2025

/gemini review

yiliu30 marked this pull request as ready for review November 18, 2025 07:12
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds documentation for the AutoRound quantization method. The changes include adding an entry for AutoRound to the compression methods table and creating a new README file with a detailed walkthrough for the autoround example. My review focuses on improving the clarity, correctness, and consistency of the new documentation. I've identified a few typos, a grammatical error, and some inconsistencies between the documentation and the example code, and I've provided suggestions to address these points.

Signed-off-by: yiliu30 <[email protected]>
@yiliu30 (Owner, Author) commented Nov 18, 2025

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces documentation for the new AutoRound quantization feature. It adds an entry for AutoRound in the main compression guide and includes a comprehensive README.md for the autoround example. The new documentation is well-structured and informative. I've identified a couple of minor issues in the new README.md—an inconsistency with the example code and a potential typo—that could be addressed to improve clarity for users.

Signed-off-by: yiliu30 <[email protected]>
yiliu30 changed the title from "Ar doc" to "[doc] Add AutoRound Doc" on Nov 20, 2025
yiliu30 changed the title from "[doc] Add AutoRound Doc" to "[docs] Add autoround doc" on Nov 20, 2025
@yiliu30 (Owner, Author) commented Nov 20, 2025

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds documentation for the AutoRound quantization method. A new entry is added to the compression methods table, and a new README.md file in examples/autoround provides a comprehensive guide. The documentation is well-written, but I have a few suggestions to improve clarity and correct a command that would otherwise fail.

The example includes an end-to-end script for applying the AutoRound quantization algorithm.

```bash
python3 llama3_example.py
```

@gemini-code-assist (bot) commented (severity: medium)

The Quickstart command is a bit ambiguous. After following the Installation steps, the user will be in the root of the repository. To run the example script, they need to provide the path to it. This change makes the command explicit and runnable from the repository root, which is more user-friendly.

Suggested change:

```diff
-python3 llama3_example.py
+python3 examples/autoround/llama3_example.py
```
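For orientation, the end-to-end script boils down to loading a model, building a recipe, and running `oneshot`. The sketch below is pieced together from this PR's context: the `AutoRoundModifier` import path, its arguments, and the calibration dataset are assumptions, not confirmed API; `examples/autoround/llama3_example.py` is the authoritative version.

```python
# Hedged sketch of an AutoRound oneshot run with llm-compressor.
# The AutoRoundModifier import path and its arguments are assumptions
# based on this PR's context.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.autoround import AutoRoundModifier  # assumed path

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative model ID
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize Linear weights to 4 bits, leaving the LM head in full precision.
recipe = AutoRoundModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",        # calibration dataset is an assumption
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=128,
)
```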


### 1) Load Model

Load the model using `AutoModelForCausalLM` for handling quantized saving and loading.

@gemini-code-assist (bot) commented (severity: medium)

This sentence could be slightly misleading. AutoModelForCausalLM is a standard part of the transformers library. The functionality for handling quantized weights is added by llm-compressor by patching the model object. A clearer phrasing would avoid potential confusion for users.

Suggested change:

```diff
-Load the model using `AutoModelForCausalLM` for handling quantized saving and loading.
+Load the model using `AutoModelForCausalLM` from the `transformers` library.
```
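To make the bot's point concrete: `AutoModelForCausalLM` is plain `transformers`; it is llm-compressor that patches the loaded model so `save_pretrained` can emit compressed checkpoints. A minimal sketch of that flow, assuming the `save_compressed` keyword used in llm-compressor's other examples (model ID and save directory are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# ... run oneshot(...) quantization here ...

# After oneshot, llm-compressor has patched save_pretrained so the
# quantized weights can be written in compressed form.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-W4A16"  # illustrative
model.save_pretrained(SAVE_DIR, save_compressed=True)      # patched kwarg
tokenizer.save_pretrained(SAVE_DIR)
```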

| Method | Description | Tradeoffs |
|---|---|---|
| **AWQ** | Uses channelwise scaling to better preserve important outliers in weights and activations | Better accuracy recovery with faster runtime than GPTQ |
| **SmoothQuant** | Smooths outliers in activations by folding them into weights, ensuring better accuracy for weight and activation quantized models | Good accuracy recovery with minimal calibration time; composable with other methods |
| **Round-To-Nearest (RTN)** | Simple quantization technique that rounds each value to the nearest representable level in the target precision | Provides moderate accuracy recovery in most scenarios; computationally cheap and fast, making it suitable for real-time or resource-constrained environments |
| **AutoRound** | Introduces lightweight trainable parameters to optimize rounding and clipping ranges using block-wise reconstruction error | Strong accuracy recovery with moderate tuning time; significantly more accurate than RTN and generally faster than GPTQ |
@wenhuach21 commented Nov 20, 2025

AutoRound optimizes rounding and clipping ranges via sign-gradient descent. It delivers leading 4-bit and superior sub-4-bit accuracy compared to GPTQ/AWQ, with runtime faster than GPTQ and on par with AWQ.
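As a toy illustration of the sign-gradient descent mentioned above (synthetic stand-ins only, not the library's implementation): a trainable rounding offset is updated using only the sign of its gradient, with a straight-through estimator so gradients flow through the rounding op.

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: round in the forward pass,
    # identity gradient in the backward pass.
    return x + (x.round() - x).detach()

torch.manual_seed(0)
w = torch.randn(64, 64)                           # frozen "weights"
alpha = torch.zeros_like(w, requires_grad=True)   # trainable rounding offset
lr = 5e-3

for _ in range(200):
    # Synthetic block-reconstruction-style error between the rounded,
    # offset weights and the originals.
    loss = (ste_round(w + alpha) - w).pow(2).mean()
    loss.backward()
    with torch.no_grad():
        alpha -= lr * alpha.grad.sign()           # signSGD update
        alpha.grad = None
```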

yiliu30 and others added 2 commits November 20, 2025 10:46
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yiliu30 <[email protected]>