
Conversation

matthewdouglas
Member

What does this PR do?

This PR adds a new option to BitsAndBytesConfig called bnb_4bit_target_parameters, in the same spirit as target_parameters in huggingface/peft#2638. The intent is to allow quantization of nn.Parameter weights that do not live inside an nn.Linear, e.g. those commonly found in certain MoE model implementations.
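For illustration, the layout this targets can be sketched in plain PyTorch (hypothetical class, not the actual Granite implementation): the experts' weights live in one stacked `nn.Parameter`, so quantization that only swaps out `nn.Linear` modules never touches them.

```python
import torch
import torch.nn as nn

class ParallelExperts(nn.Module):
    """Hypothetical MoE expert bank: one stacked 3D weight, no nn.Linear."""
    def __init__(self, num_experts: int, in_features: int, out_features: int):
        super().__init__()
        # All expert weights sit in a single raw parameter, so replacing
        # nn.Linear modules with Linear4bit cannot quantize them.
        self.weight = nn.Parameter(torch.empty(num_experts, out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x: torch.Tensor, expert_idx: int) -> torch.Tensor:
        # Matmul against the selected expert's weight slice.
        return x @ self.weight[expert_idx].t()

experts = ParallelExperts(num_experts=8, in_features=16, out_features=32)
y = experts(torch.randn(4, 16), expert_idx=3)
print(y.shape)  # torch.Size([4, 32])
```

Because no `nn.Linear` appears anywhere in this module, targeting the parameter by name (as `bnb_4bit_target_parameters` does) is the only way to reach these weights.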

Requires bitsandbytes-foundation/bitsandbytes#1720, which is being developed concurrently.

Example usage with a Granite MoE:

```python
import torch
from transformers import BitsAndBytesConfig, GraniteMoeForCausalLM

model = GraniteMoeForCausalLM.from_pretrained(
    "ibm-granite/granite-3.1-3b-a800m-base",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
        bnb_4bit_target_parameters=["block_sparse_moe.input_linear.weight", "block_sparse_moe.output_linear.weight"],
        llm_int8_skip_modules=["lm_head", "block_sparse_moe.router"],
    ),
)
```

Memory Usage - BF16

| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---|---|---|---|---|
| Allocated memory | 6291 MiB | 6292 MiB | 12583 MiB | 6292 MiB |
| Active memory | 6291 MiB | 6292 MiB | 12583 MiB | 6292 MiB |
| Requested memory | 6291 MiB | 6291 MiB | 12583 MiB | 6291 MiB |

Memory Usage - Before PR

| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---|---|---|---|---|
| Allocated memory | 6019 MiB | 6027 MiB | 9935 MiB | 3916 MiB |
| Active memory | 6019 MiB | 6027 MiB | 9935 MiB | 3916 MiB |
| Requested memory | 6015 MiB | 6024 MiB | 9929 MiB | 3913 MiB |

Memory Usage - After PR

| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---|---|---|---|---|
| Allocated memory | 1894 MiB | 2054 MiB | 9424 MiB | 7530 MiB |
| Active memory | 1894 MiB | 2054 MiB | 9424 MiB | 7530 MiB |
| Requested memory | 1875 MiB | 2035 MiB | 9389 MiB | 7513 MiB |
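The drop from 6019 MiB to 1894 MiB current allocated is roughly in line with NF4 storage costs. A back-of-the-envelope estimate (illustrative parameter count, not the exact model size; assumes 2 bytes per bf16 weight, 4 bits per NF4 weight, plus one fp32 absmax scale per 64-weight block, since double quantization is disabled here):

```python
def nf4_bytes(num_params: int, blocksize: int = 64) -> float:
    """Approximate NF4 storage: 4 bits per weight plus a 4-byte
    fp32 absmax scale per block (double quantization disabled)."""
    packed = num_params / 2               # two 4-bit codes per byte
    absmax = (num_params / blocksize) * 4 # one fp32 scale per block
    return packed + absmax

def bf16_bytes(num_params: int) -> int:
    return 2 * num_params

n = 3_000_000_000  # illustrative ~3B parameters
ratio = bf16_bytes(n) / nf4_bytes(n)
print(f"~{ratio:.1f}x smaller")  # ~3.6x
```

The observed ratio is somewhat lower than this ideal because the skipped modules (lm_head, router) remain in bf16.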

Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. (See Slack discussion)
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?

Who can review?

@SunMarc @MekkCyber @BenjaminBossan

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1
Member

cc @MekkCyber


@SunMarc SunMarc left a comment


Nice! It would be great to add some tests (inference / saving) with the gptoss model.
