
Conversation

@brian-dellabetta
Collaborator

SUMMARY:
We want to give users the ability to disable quantization in AWQModifier, so that the best scales are found and applied to the weights without round-to-nearest quantization. This is useful when someone wants to run AWQ followed by GPTQ before ultimately quantizing the weights. This is a draft solution; see #1972 for discussion.
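As a rough illustration, here is a hypothetical recipe sketch showing AWQ applying its scales without quantizing, followed by GPTQ doing the actual weight quantization. The disable_quantization flag is the proposed addition from this draft (per #1972) and is not a released API; the model ID, dataset, and other arguments are placeholders.

```python
# Hypothetical sketch based on this draft and discussion #1972.
# `disable_quantization` is the proposed flag, not an existing option.
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

recipe = [
    # AWQ searches for the best smoothing scales and folds them into the
    # weights, but (with the proposed flag) skips round-to-nearest
    # quantization of the weights themselves.
    AWQModifier(
        scheme="W4A16",
        targets=["Linear"],
        ignore=["lm_head"],
        disable_quantization=True,  # proposed flag from this PR
    ),
    # GPTQ then performs the actual quantization on the AWQ-adjusted weights.
    GPTQModifier(scheme="W4A16", targets=["Linear"], ignore=["lm_head"]),
]

oneshot(
    model=model,
    dataset="open_platypus",  # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
)
```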

TEST PLAN:
"please outline how the changes were tested"

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@zhanglei1172
Contributor

Currently, if you only want the AWQ scales to take effect and do not want to perform the quantization process, you can simply set save_compressed=False in the model.save_pretrained method (the scale and zero-point parameters may need to be manually deleted).
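For clarity, a minimal sketch of that workaround, assuming a model that has already gone through oneshot AWQ: save_compressed=False keeps the weights in their unquantized dtype, and the parameter names weight_scale / weight_zero_point are assumptions about how the quantization parameters are attached, so adjust them to what the checkpoint actually contains.

```python
# Sketch of the manual workaround described above; parameter names are
# assumptions and may differ depending on the quantization scheme.
model.save_pretrained(
    "awq-scales-only",
    save_compressed=False,  # keep weights uncompressed, scales already folded in
)

# Optionally strip leftover quantization parameters from the modules.
for module in model.modules():
    for name in ("weight_scale", "weight_zero_point"):
        if hasattr(module, name):
            delattr(module, name)
```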

@brian-dellabetta
Collaborator Author

Currently, if you only want the AWQ scales to take effect and do not want to perform the quantization process, you can simply set save_compressed=False in the model.save_pretrained method (the scale and zero-point parameters may need to be manually deleted).

Hi @zhanglei1172, yes, that's probably true. But it would require some manual overhead for a user who wants to do that, whereas this could just be a single boolean flag, disable_quantization, based on discussion #1972. I think it would amount to the same thing: scales and zero points are just not created, and quantization configs and statuses are pruned from the target modules, so the checkpoint at the end of the pipeline would just have modified weights.
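As a quick sanity check of that behavior, one could confirm that no quantization parameters survive on the model when the flag is enabled (a sketch, assuming scale and zero-point parameters follow the usual "_scale" / "_zero_point" naming):

```python
# Sketch: verify the model carries only modified weights, no quantization params.
leftover = [
    name
    for name, _ in model.named_parameters()
    if name.endswith(("_scale", "_zero_point"))
]
assert not leftover, f"unexpected quantization parameters: {leftover}"
```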
