
Conversation

matthewdouglas
Member

This PR is in the same spirit as the recently introduced feature in huggingface/peft#2638.

Several models in the Hugging Face ecosystem have MoE layers that use nn.Parameter directly and are therefore not compatible with the default quantization approach of replacing nn.Linear modules. Example models include, but are not limited to:

A new utility, bitsandbytes.nn.parametrize.replace_parameter_4bit(), is introduced. It quantizes an nn.Parameter and replaces it with a parametrization layer that automatically dequantizes the parameter when it is accessed.
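
For illustration, here is a minimal sketch of that mechanism built from torch.nn.utils.parametrize and the existing bitsandbytes.functional 4-bit primitives. The actual replace_parameter_4bit() added by this PR may have a different signature and implementation; the helper name replace_parameter_4bit_sketch below is purely illustrative and assumes the parameter lives on a CUDA device.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize
import bitsandbytes.functional as bnbF


class Dequantize4bit(nn.Module):
    """Parametrization that holds 4-bit packed data and dequantizes it on access."""

    def __init__(self, weight: torch.Tensor, quant_type: str = "nf4"):
        super().__init__()
        # Quantize once up front; keep the packed data plus the quantization state
        # needed to reconstruct the original tensor.
        packed, self.quant_state = bnbF.quantize_4bit(weight, quant_type=quant_type)
        self.register_buffer("packed", packed)

    def forward(self, _placeholder: torch.Tensor) -> torch.Tensor:
        # Runs every time module.<name> is accessed; returns the dequantized tensor.
        return bnbF.dequantize_4bit(self.packed, self.quant_state)


def replace_parameter_4bit_sketch(module: nn.Module, name: str, quant_type: str = "nf4"):
    """Quantize module.<name> and expose it through a dequantizing parametrization."""
    weight = getattr(module, name)  # assumed to be an nn.Parameter on a CUDA device
    deq = Dequantize4bit(weight.data, quant_type=quant_type)
    # Swap the full-precision parameter for a tiny placeholder; the parametrization ignores it.
    setattr(module, name, nn.Parameter(torch.empty(0, device=weight.device), requires_grad=False))
    # unsafe=True because the parametrization changes the shape and dtype of the stored tensor.
    parametrize.register_parametrization(module, name, deq, unsafe=True)
```

The real utility presumably also handles details this sketch ignores, such as serializing the quantization state, device movement, and restoring the original parameter.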

Additional work will be done on the HF Transformers side to enable integration with options in BitsAndBytesConfig.
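
For context, the current 4-bit path in Transformers is driven by BitsAndBytesConfig; a minimal sketch of that existing usage is below. The options that will opt MoE nn.Parameter weights into the new parametrization path are not defined yet, so nothing in the sketch is specific to this PR, and the model id is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Existing 4-bit settings; parameter-level quantization will need additional,
# still-to-be-defined options here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "org/some-moe-model",  # placeholder model id
    quantization_config=bnb_config,
)
```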

@matthewdouglas matthewdouglas added this to the v0.47.0 milestone Aug 1, 2025
@matthewdouglas matthewdouglas added the Enhancement New feature or request label Aug 1, 2025

@winglian
Contributor

winglian commented Aug 6, 2025

Will there need to be any changes in PEFT to apply LoRA adapters to quantized parameters once this lands?

@BenjaminBossan
Contributor

We'll have to test, but at the very least, huggingface/peft#2710 needs to be merged in PEFT for this to work properly.

@matthewdouglas matthewdouglas modified the milestones: v0.47.0, v0.48.0 Aug 14, 2025
@cmp-nct

cmp-nct commented Aug 17, 2025

> We'll have to test, but at the very least, huggingface/peft#2710 needs to be merged in PEFT for this to work properly.

it's merged
