[GPTQ] Change actorder default to "static" #1425

Open · kylesayrs wants to merge 20 commits into main from kylesayrs/gptq-actorder-default
Conversation

kylesayrs (Collaborator) commented May 12, 2025

Purpose

  • Use best defaults for GPTQ quantization

Prerequisites

Changes

  • Set gptq actorder default to "static"

Testing

  • Ran llama w4a16 example to completion and validated the correct activation ordering
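
For illustration, a minimal sketch of a W4A16 GPTQ run that relies on the new default. The model and dataset names are placeholders, and the exact `oneshot`/`GPTQModifier` signatures may differ across llm-compressor versions:

```python
# Minimal sketch, not part of this PR; model and dataset are placeholders.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    # actorder is intentionally omitted: with this change it resolves to
    # "static" rather than the old default of None
)

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```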

dsikka (Collaborator) commented May 13, 2025

nice!

markurtz (Collaborator)

@kylesayrs quick question from my side. Since the old default was None, do we risk defaulting to an incorrect value for older recipes that don't include it? Especially for ones that specified it in the quantization scheme?

kylesayrs changed the base branch from kylesayrs/gptq-actorder to main May 15, 2025 04:40
kylesayrs dismissed brian-dellabetta’s stale review May 15, 2025 04:40

The base branch was changed.

kylesayrs changed the base branch from main to kylesayrs/gptq-actorder May 15, 2025 04:40
kylesayrs (Collaborator, Author)

@markurtz

  1. "older" recipes which did not specify will now use "static". This is (more or less) inevitable and imho acceptable, since a user which does not specify actorder probably does not care and just wants the best configuration
  2. For recipes which specify anything except "static", they will see this error encouraging them to modify their recipe to disable global actorder
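
A hypothetical sketch of the resolution behavior described above; the enum, function name, and error message are illustrative, not the actual llm-compressor source:

```python
from enum import Enum
from typing import Optional

class ActivationOrdering(str, Enum):  # illustrative subset of values
    STATIC = "static"
    GROUP = "group"
    WEIGHT = "weight"

def resolve_actorder(recipe_value: Optional[str]) -> str:
    # Case 1: the recipe does not specify actorder -> use the new default
    if recipe_value is None:
        return ActivationOrdering.STATIC.value
    # Recipes already requesting "static" are unaffected
    if recipe_value == ActivationOrdering.STATIC.value:
        return recipe_value
    # Case 2: the recipe specifies a conflicting value -> raise an error
    # that points the user at modifying their recipe
    raise ValueError(
        f"Recipe specifies actorder={recipe_value!r}, which conflicts with "
        "the global default 'static'. Modify the recipe to disable the "
        "global actorder default or set actorder explicitly."
    )
```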

rahul-tuli previously approved these changes May 16, 2025
Base automatically changed from kylesayrs/gptq-actorder to main May 19, 2025 16:39
kylesayrs added a commit that referenced this pull request May 19, 2025
## Purpose ##
* Make actorder option more intuitive for users
* Enable easier adjustment of actorder default #1425
* This change is conceptually intuitive because activation ordering is a concept that only applies to the GPTQ algorithm (the only algorithm for which quantization group order matters)

## Changes ##
* Add `actorder` argument to `GPTQModifier`
* Override `resolve_quantization_config` method to resolve config groups with the `actorder` argument
* (Misc) Rearrange method order to match the typical order in which they are called in the modifier lifecycle

## Testing ##
* Ran llama w4a16 example to completion

Signed-off-by: Kyle Sayers <[email protected]>
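
A hypothetical sketch of the resolution step described in the commit message; the dataclasses below stand in for compressed-tensors' config-group types rather than reproducing them:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class WeightArgs:                      # stand-in for QuantizationArgs
    strategy: str = "group"
    actorder: Optional[str] = None

@dataclass
class ConfigGroup:                     # stand-in for a quantization scheme
    weights: Optional[WeightArgs] = None

def apply_modifier_actorder(
    groups: Dict[str, ConfigGroup], actorder: Optional[str]
) -> Dict[str, ConfigGroup]:
    """Stamp the modifier-level actorder onto each grouped-weight config."""
    if actorder is None:
        return groups
    for group in groups.values():
        # Activation ordering only applies to group-strategy weight
        # quantization, the case relevant to GPTQ
        if group.weights is not None and group.weights.strategy == "group":
            group.weights.actorder = actorder
    return groups

# Usage: a recipe group with no actorder picks up the modifier's value
groups = {"group_0": ConfigGroup(weights=WeightArgs())}
apply_modifier_actorder(groups, "static")
assert groups["group_0"].weights.actorder == "static"
```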
kylesayrs dismissed stale reviews from rahul-tuli and brian-dellabetta May 19, 2025 16:39

The base branch was changed.

Signed-off-by: Kyle Sayers <[email protected]>
kylesayrs force-pushed the kylesayrs/gptq-actorder-default branch from 8cf408e to 8b9f795 on May 20, 2025 20:58
kylesayrs changed the base branch from main to kylesayrs/fix-default-actorder May 20, 2025 20:58
kylesayrs added 3 commits May 20, 2025 17:07
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Base automatically changed from kylesayrs/fix-default-actorder to main May 22, 2025 18:09
kylesayrs added a commit that referenced this pull request May 22, 2025
## Purpose ##
* Fix the false assumption that the `actorder` field is of enum type
* Although actorder passes through a [field_validator](https://github.com/neuralmagic/compressed-tensors/blob/main/src/compressed_tensors/quantization/quant_args.py#L200), `QuantizationArgs` has the [use_enum_values](https://github.com/neuralmagic/compressed-tensors/blob/main/src/compressed_tensors/quantization/quant_args.py#L128) configuration set, meaning that enum values are converted to strings
* This was done in relation to [this fix](neuralmagic/sparseml#2327)
* Remove the conflict with recipes which manually specify activation ordering by using a sentinel value

## Follow ups ##
* #1425

## Testing ##
* Ran llama3 example with manually specified `actorder=group`

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
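
A minimal demonstration of the pydantic behavior the commit describes; the model below stands in for compressed-tensors' `QuantizationArgs`, and the enum is an illustrative subset:

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ConfigDict

class ActivationOrdering(str, Enum):  # illustrative subset of values
    GROUP = "group"
    STATIC = "static"

class QuantArgs(BaseModel):           # stand-in for QuantizationArgs
    model_config = ConfigDict(use_enum_values=True)
    actorder: Optional[ActivationOrdering] = None

args = QuantArgs(actorder=ActivationOrdering.GROUP)
# use_enum_values stores the enum's *value*, so the field is a plain str
assert args.actorder == "group"
assert not isinstance(args.actorder, ActivationOrdering)

# Hence code must not assume args.actorder is an ActivationOrdering; the
# follow-up also uses a sentinel (rather than None) as the modifier default
# so that "unset" can be told apart from an explicitly specified value.
```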
kylesayrs changed the base branch from main to kylesayrs/actorder-test May 29, 2025 18:23
kylesayrs requested a review from rahul-tuli May 29, 2025 18:29
rahul-tuli previously approved these changes May 30, 2025
kylesayrs (Collaborator, Author)

Waiting for next weekly to run before merging

Base automatically changed from kylesayrs/actorder-test to main May 30, 2025 04:57
kylesayrs dismissed stale reviews from rahul-tuli and brian-dellabetta May 30, 2025 04:57

The base branch was changed.

dsikka (Collaborator) left a comment

FYI this will incorrectly save models for our e2e testing:
https://github.com/vllm-project/llm-compressor/tree/main/tests/e2e/vLLM/configs (specifically, it essentially removes the cases where there is no act order while duplicating some cases)

Same for lm-eval tests:
https://github.com/vllm-project/llm-compressor/tree/main/tests/lmeval/configs

kylesayrs (Collaborator, Author) commented Jun 26, 2025

@dsikka

> specifically, essentially remove the cases where there is no act order while duplicating some cases

Turning all non-specified actorder cases into actorder cases is the intention of this PR.

As for duplicating tests, I don't think any are actually duplicated? For a duplication to occur, a config must conflict with an existing weight actorder config:

w4a16_actorder_weight.yaml
w4a16_actorder_weight_qwen.yaml

vl_w4a16_actorder_weight.yaml
w4a16_actorder_weight.yaml

These are all w4a16. If you grep for the non-specified w4a16 configs, you get:

w4a16_channel_quant.yaml  # channelwise, not duplicating
w4a16_channel_quant_qwen.yaml  # channelwise, not duplicating
w4a16_grouped_quant.yaml  # different calibration and dataset, not duplicating
w4a16_grouped_quant_asym_awq.yaml  # AWQ, not duplicating
w4a16_2of4_channel_quant.yaml  # 2of4, not duplicating
w4a16_2of4_grouped_quant.yaml  # 2of4, not duplicating

All of the other w4a16 configs test for different things, except perhaps w4a16_grouped_quant.yaml, which essentially tests the same thing but with a slightly different model and dataset.

dsikka (Collaborator) commented Jun 26, 2025

> w4a16_actorder_weight.yaml

  1. We lose all non-act order cases. We still need to test this in e2e and lm-eval
  2. We need to update the naming in the model produced/pushed to nm-testing so that we indicate act order being present
  3. We have two identical lm-eval cases?

tests/lmeval/configs/w4a16_actorder_weight.yaml and tests/lmeval/configs/w4a16_grouped_quant.yaml

kylesayrs (Collaborator, Author)

> We lose all non-act order cases. We still need to test this in e2e and lm-eval

Just posted a run above.

> We need to update the naming in the model produced/pushed to nm-testing so that we indicate act order being present

Since actorder is now the default, we do not need to include it in model names; the convention is that default arguments are omitted from the model name.

> We have two identical lm-eval cases?

As I mentioned previously, these two tests are slightly different. After the above nightly finishes, I'll delete tests/lmeval/configs/w4a16_actorder_weight.yaml.

dsikka (Collaborator) commented Jun 27, 2025

> We lose all non-act order cases. We still need to test this in e2e and lm-eval
>
> Just posted a run above.
>
> We need to update the naming in the model produced/pushed to nm-testing so that we indicate act order being present
>
> Since actorder is now the default, we do not need to include it in model names; the convention is that default arguments are omitted from the model name.
>
> We have two identical lm-eval cases?
>
> As I mentioned previously, these two tests are slightly different. After the above nightly finishes, I'll delete tests/lmeval/configs/w4a16_actorder_weight.yaml.

This is for our internal testing. Please keep the naming convention until this case has been adopted as our default.

And as I mentioned, we still need to test a case with no act order.
