Add Intel AutoRound algorithm support #1994
Merged
Commits (67, all by yiliu30):

- 80c92da: add auto-round
- 75f7efd: Merge branch 'main' into up-ar
- 3266b79: add auto-round modifier
- 9c537cc: refine code
- bebe0fa: disbale qac for auto-round
- dfb0ff8: clean code
- 513972c: add compile after disable qac
- 2291cc4: add iters and clean code
- 4028853: clean code
- 97ff9e0: add example
- cb7a5b4: refine docs
- 5a7500e: refine example
- d02a355: add init
- cea9d2f: clean code
- 22be9b7: format
- 6cdb402: refactor
- e2814eb: add ut
- 3e4a9fc: test llama 3
- aa34b65: clean code
- afe2ff7: parse layer-wise config
- 8e9eccc: format
- 81f76af: add docstring
- afa6150: add ar
- 97217e7: update example
- 3dcb434: align api
- aef7707: format
- 97e1ca2: clean code
- c75c272: fix typo
- 3d8a0c8: small iters for ut
- 6729a75: format
- bb4dbe8: refine comment
- 2adf0e7: replace papaer link
- dd9bde9: correct comments
- 4980229: Merge branch 'main' into autoround-support
- 7d97255: update comments
- f298e82: refine code
- 73c3571: add more checks
- eb16397: update example
- 9cb1f06: move auto-round to modifier
- 76e0d21: apply untie
- 1cbe919: correct docstring
- 9fa5efb: enable ci
- 7937d80: revert import AutoRoundModifier into modfifier directly
- e58b2bd: update
- bd70ea6: Merge branch 'main' into autoround-support
- 6b236f6: merge main
- 4c94187: clean
- 7ea8442: fix
- f52c0c0: refactor
- 4a9c4aa: format
- 0567df6: Update src/llmcompressor/modifiers/autoround/base.py
- 650a19c: refine docs
- 58e09bf: Merge branch 'autoround-support' of https://github.com/yiliu30/llm-co…
- 5cd35a6: fix import
- 678b123: Update src/llmcompressor/modifiers/autoround/base.py
- a8c63d3: add qinput
- 38634dc: Merge branch 'autoround-support' of https://github.com/yiliu30/llm-co…
- fbc047a: clean cache
- 96b6490: align api
- d00d41b: fix
- d4a8fb0: fix
- 487fcd2: update
- baeea3f: Merge branch 'main' into autoround-support
- 3adc879: add requires_gpu for ut
- ac10f7b: Merge branch 'main' into autoround-support
- decb14f: Merge branch 'autoround-support' of https://github.com/yiliu30/llm-co…
- f9dabc4: Merge branch 'main' into autoround-support
New file added by this PR (56 lines): an end-to-end example that quantizes Meta-Llama-3-8B-Instruct to W4A16 with AutoRoundModifier and saves the compressed checkpoint.

```python
from auto_round.calib_dataset import get_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.autoround import AutoRoundModifier
from llmcompressor.utils import dispatch_for_generation

# Select model and load it.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Select calibration dataset.
NUM_CALIBRATION_SAMPLES = 128
MAX_SEQUENCE_LENGTH = 2048

# Get the aligned calibration dataset.
ds = get_dataset(
    tokenizer=tokenizer,
    seqlen=MAX_SEQUENCE_LENGTH,
    nsamples=NUM_CALIBRATION_SAMPLES,
)

# Configure the quantization algorithm to run:
# * quantize the weights to 4 bits with AutoRound, using a group size of 128
recipe = AutoRoundModifier(
    targets="Linear", scheme="W4A16", ignore=["lm_head"], iters=200
)

# Apply algorithms.
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    # disable shuffling to get a slightly better MMLU score
    shuffle_calibration_samples=False,
)

# Confirm generations of the quantized model look sane.
print("\n\n")
print("========== SAMPLE GENERATION ==============")
dispatch_for_generation(model)
sample = tokenizer("Hello my name is", return_tensors="pt")
sample = {key: value.to(model.device) for key, value in sample.items()}
output = model.generate(**sample, max_new_tokens=100)
print(tokenizer.decode(output[0]))
print("==========================================\n\n")

# Save the compressed model to disk.
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128-AutoRound"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```
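The example writes a compressed-tensors checkpoint to SAVE_DIR. As a minimal sketch that is not part of this PR, and assuming a vLLM build with compressed-tensors W4A16 support is installed, the saved directory could then be served for inference like this:

```python
# Sketch only: load the checkpoint written by the example above with vLLM.
# Assumes vLLM is installed and supports compressed-tensors W4A16 checkpoints.
from vllm import LLM, SamplingParams

# SAVE_DIR from the example script above
llm = LLM(model="Meta-Llama-3-8B-Instruct-W4A16-G128-AutoRound")
params = SamplingParams(temperature=0.0, max_tokens=100)
outputs = llm.generate(["Hello my name is"], params)
print(outputs[0].outputs[0].text)
```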
New file added by this PR (3 lines), the package `__init__` for the new autoround modifier, which re-exports the implementation from `.base`:

```python
# ruff: noqa

from .base import *
```
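Because this `__init__` re-exports everything from `.base`, the modifier class can be imported directly from `llmcompressor.modifiers.autoround`, which is what the example script above relies on. A quick sketch, using only the arguments shown in that example:

```python
# Sketch only: construct the modifier via the package-level import enabled by
# the __init__ above; the arguments mirror the example script in this PR.
from llmcompressor.modifiers.autoround import AutoRoundModifier

recipe = AutoRoundModifier(
    targets="Linear", scheme="W4A16", ignore=["lm_head"], iters=200
)
print(type(recipe).__name__)  # -> "AutoRoundModifier"
```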