⚙️ Your current environment

The output of `python collect_env.py`:
### Environment Information ###
Operating System: `Linux-5.15.0-113-generic-x86_64-with-glibc2.35`
Python Version: `3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]`
llm-compressor Version: `0.7.1`
compressed-tensors Version: `0.11.0`
transformers Version: `4.55.2`
torch Version: `2.7.1`
CUDA Devices: `['NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3']`
AMD Devices: `None`
🐛 Describe the bug
llm-compressor fails to initialize the processor when using Mistral models: `AutoProcessor.from_pretrained` forwards kwargs (`trust_remote_code`, `use_fast`, `_from_auto`) that `MistralCommonTokenizer.from_pretrained` rejects.
Traceback

```
Traceback (most recent call last):
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/llmcompressor/entrypoints/utils.py", line 233, in initialize_processor_from_path
    processor = AutoProcessor.from_pretrained(
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 385, in from_pretrained
    return processor_class.from_pretrained(
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/processing_utils.py", line 1312, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/processing_utils.py", line 1371, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/tokenization_mistral_common.py", line 1762, in from_pretrained
    raise ValueError(
ValueError: Kwargs ['trust_remote_code', 'use_fast', '_from_auto'] are not supported by `MistralCommonTokenizer.from_pretrained`.
```
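The failure can likely be reproduced without llm-compressor, since the traceback shows `AutoProcessor.from_pretrained` forwarding its kwargs straight to the tokenizer class. A minimal sketch, assuming the repo resolves to `MistralCommonTokenizer` as the traceback indicates:

```python
from transformers.tokenization_mistral_common import MistralCommonTokenizer

# Mirrors one of the kwargs AutoProcessor forwards internally; per the
# traceback, MistralCommonTokenizer.from_pretrained rejects it by key,
# regardless of value, and raises the same ValueError.
MistralCommonTokenizer.from_pretrained(
    "mistralai/Voxtral-Small-24B-2507",
    trust_remote_code=False,  # unsupported kwarg -> ValueError
)
```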
🛠️ Steps to reproduce

```python
import torch
from transformers import VoxtralForConditionalGeneration, AutoProcessor

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Select model and load it.
MODEL_ID = "mistralai/Voxtral-Small-24B-2507"
model = VoxtralForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Recipe
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["language_model.lm_head", "re:audio_tower.*", "re:multi_modal_projector.*"],
)

# Apply algorithms.
oneshot(
    model=model,
    recipe=recipe,
)

SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```
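A possible workaround, assuming `oneshot` accepts a pre-loaded processor (the `processor` argument in recent llm-compressor versions): pass the processor loaded above explicitly, so the failing `initialize_processor_from_path` call is never reached. This is a sketch, not a confirmed fix:

```python
# Workaround sketch: hand oneshot the already-loaded processor instead of
# letting llm-compressor re-initialize it from the model path (which is
# where the unsupported kwargs get injected).
oneshot(
    model=model,
    processor=processor,  # assumes oneshot accepts this and skips path-based init
    recipe=recipe,
)
```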