[docs] Add autoround doc #10
Conversation
Signed-off-by: yiliu30 <[email protected]>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Signed-off-by: yiliu30 <[email protected]>
/gemini review
Code Review
This pull request adds documentation for the AutoRound quantization method. The changes include adding an entry for AutoRound to the compression methods table and creating a new README file with a detailed walkthrough for the autoround example. My review focuses on improving the clarity, correctness, and consistency of the new documentation. I've identified a few typos, a grammatical error, and some inconsistencies between the documentation and the example code, and I've provided suggestions to address these points.
Signed-off-by: yiliu30 <[email protected]>
/gemini review
Code Review
This pull request introduces documentation for the new AutoRound quantization feature. It adds an entry for AutoRound in the main compression guide and includes a comprehensive README.md for the autoround example. The new documentation is well-structured and informative. I've identified a couple of minor issues in the new README.md—an inconsistency with the example code and a potential typo—that could be addressed to improve clarity for users.
Signed-off-by: yiliu30 <[email protected]>
/gemini review
Code Review
This pull request adds documentation for the AutoRound quantization method. A new entry is added to the compression methods table, and a new README.md file in examples/autoround provides a comprehensive guide. The documentation is well-written, but I have a few suggestions to improve clarity and correct a command that would otherwise fail.
The example includes an end-to-end script for applying the AutoRound quantization algorithm.

```bash
python3 llama3_example.py
```
The Quickstart command is a bit ambiguous. After following the Installation steps, the user will be in the root of the repository. To run the example script, they need to provide the path to it. This change makes the command explicit and runnable from the repository root, which is more user-friendly.
```diff
- python3 llama3_example.py
+ python3 examples/autoround/llama3_example.py
```
### 1) Load Model

Load the model using `AutoModelForCausalLM` for handling quantized saving and loading.
This sentence could be slightly misleading. AutoModelForCausalLM is a standard part of the transformers library. The functionality for handling quantized weights is added by llm-compressor by patching the model object. A clearer phrasing would avoid potential confusion for users.
```diff
- Load the model using `AutoModelForCausalLM` for handling quantized saving and loading.
+ Load the model using `AutoModelForCausalLM` from the `transformers` library.
```
docs/getting-started/compress.md (Outdated)
| **AWQ** | Uses channelwise scaling to better preserve important outliers in weights and activations | Better accuracy recovery with faster runtime than GPTQ |
| **SmoothQuant** | Smooths outliers in activations by folding them into weights, ensuring better accuracy for weight and activation quantized models | Good accuracy recovery with minimal calibration time; composable with other methods |
| **Round-To-Nearest (RTN)** | Simple quantization technique that rounds each value to the nearest representable level in the target precision | Provides moderate accuracy recovery in most scenarios; computationally cheap and fast, making it suitable for real-time or resource-constrained environments |
| **AutoRound** | Introduces lightweight trainable parameters to optimize rounding and clipping ranges using block-wise reconstruction error | Strong accuracy recovery with moderate tuning time; significantly more accurate than RTN and generally faster than GPTQ |
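To make the RTN row concrete, here is a minimal stdlib-only sketch of symmetric 4-bit round-to-nearest quantization. This is my own illustration of the general technique, not llm-compressor's implementation; the function name and example weights are invented for the demo.

```python
# Illustrative sketch of round-to-nearest (RTN) quantization, assuming a
# symmetric signed 4-bit grid. Not taken from llm-compressor's codebase.

def rtn_quantize(weights, num_bits=4):
    """Scale to the integer grid, round to the nearest level, clamp, rescale."""
    qmax = 2 ** (num_bits - 1) - 1          # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax
    quantized = []
    for w in weights:
        q = round(w / scale)                # nearest representable level
                                            # (Python rounds .5 ties to even)
        q = max(-qmax - 1, min(qmax, q))    # clamp to [-8, 7]
        quantized.append(q * scale)         # dequantize back to float
    return quantized

print(rtn_quantize([0.12, -0.53, 0.98, -0.07]))
```

Each dequantized value lands within half a quantization step (`scale / 2`) of the original, which is exactly the "moderate accuracy recovery" tradeoff the table describes.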
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AutoRound optimizes rounding and clipping ranges via sign-gradient descent. It delivers leading 4-bit accuracy and superior sub-4-bit accuracy compared to GPTQ/AWQ, with runtime faster than GPTQ and on par with AWQ.
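The sign-gradient idea mentioned above can be sketched in a few lines. The toy below is an assumed simplification, not llm-compressor's or auto-round's actual code: it learns per-weight rounding offsets with signSGD and a straight-through estimator so that the quantized layer output matches the full-precision output on a tiny "calibration" set better than plain round-to-nearest. All weights, inputs, and hyperparameters are invented for illustration.

```python
# Toy AutoRound-style sketch (assumption-laden simplification): tune rounding
# offsets v in [-0.5, 0.5] by sign-gradient descent on the block output error.

def forward(weights, offsets, scale, xs):
    # Hard rounding in the forward pass; the gradient below treats round()
    # as identity (straight-through estimator).
    qw = [round(w / scale + v) * scale for w, v in zip(weights, offsets)]
    return [sum(q * xi for q, xi in zip(qw, x)) for x in xs]

def loss(outputs, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

weights = [0.62, 0.58]                     # toy full-precision weights
scale = 1.0                                # toy 1-unit quantization grid
xs = [[1, 0], [0, 1], [1, 1]]              # toy calibration inputs
targets = [sum(w * xi for w, xi in zip(weights, x)) for x in xs]

offsets = [0.0, 0.0]                       # v = 0 reproduces plain RTN
rtn_err = loss(forward(weights, offsets, scale, xs), targets)

lr, best_err = 0.01, rtn_err
for _ in range(50):
    outs = forward(weights, offsets, scale, xs)
    for i in range(len(offsets)):
        # STE gradient of the squared output error w.r.t. offset i
        g = sum(2 * (o - t) * x[i] * scale for o, t, x in zip(outs, targets, xs))
        step = lr if g > 0 else -lr if g < 0 else 0.0
        offsets[i] = max(-0.5, min(0.5, offsets[i] - step))
    best_err = min(best_err, loss(forward(weights, offsets, scale, xs), targets))

print(f"RTN error: {rtn_err:.4f}, tuned error: {best_err:.4f}")
```

Here RTN rounds both weights up, so their errors add; the tuned offsets learn to round one weight down so the errors partially cancel in the block output. The real algorithm does this per transformer block with actual activations and also learns clipping ranges.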
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yiliu30 <[email protected]>
The rendered versions:
- https://github.com/yiliu30/llm-compressor-fork/tree/ar-doc/examples/autoround
- https://github.com/yiliu30/llm-compressor-fork/blob/ar-doc/docs/getting-started/compress.md