
[Bug]: MistralCommonTokenizer not supported #1795

@anmarques

Description

⚙️ Your current environment

The output of `python collect_env.py`:
### Environment Information ###
Operating System: `Linux-5.15.0-113-generic-x86_64-with-glibc2.35`
Python Version: `3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]`
llm-compressor Version: `0.7.1`
compressed-tensors Version: `0.11.0`
transformers Version: `4.55.2`
torch Version: `2.7.1`
CUDA Devices: `['NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3', 'NVIDIA H100 80GB HBM3']`
AMD Devices: `None`

🐛 Describe the bug

llm-compressor fails to initialize the processor for Mistral models whose tokenizer is `MistralCommonTokenizer`: the kwargs forwarded through `AutoProcessor.from_pretrained` (`trust_remote_code`, `use_fast`, `_from_auto`) are rejected by `MistralCommonTokenizer.from_pretrained`.

Traceback
Traceback (most recent call last):
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/llmcompressor/entrypoints/utils.py", line 233, in initialize_processor_from_path
    processor = AutoProcessor.from_pretrained(
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 385, in from_pretrained
    return processor_class.from_pretrained(
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/processing_utils.py", line 1312, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/processing_utils.py", line 1371, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/home/alexandre/environments/fleurs/lib/python3.10/site-packages/transformers/tokenization_mistral_common.py", line 1762, in from_pretrained
    raise ValueError(
ValueError: Kwargs ['trust_remote_code', 'use_fast', '_from_auto'] are not supported by `MistralCommonTokenizer.from_pretrained`.
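
The error can also be triggered in isolation. The snippet below is a minimal sketch (assuming transformers 4.55.x, where `MistralCommonTokenizer.from_pretrained` rejects any kwargs outside its own signature); the model id is the one from the repro script:

# Minimal trigger sketch: any kwarg outside the tokenizer's own signature,
# including the ones AutoProcessor forwards automatically, raises the error.
from transformers.tokenization_mistral_common import MistralCommonTokenizer

MistralCommonTokenizer.from_pretrained(
    "mistralai/Voxtral-Small-24B-2507",
    trust_remote_code=False,  # ValueError: Kwargs ['trust_remote_code'] are not supported ...
)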

🛠️ Steps to reproduce

import torch
from transformers import VoxtralForConditionalGeneration, AutoProcessor
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Select model and load it.
MODEL_ID = "mistralai/Voxtral-Small-24B-2507"

model = VoxtralForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Recipe
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["language_model.lm_head", "re:audio_tower.*", "re:multi_modal_projector.*"],
)

# Apply algorithms.
oneshot(
    model=model,
    recipe=recipe,
)

SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
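
Until this is fixed, a possible stopgap is to drop the offending kwargs before they reach the tokenizer. The patch below is an untested sketch (it assumes `from_pretrained` is a classmethod, as on other tokenizer classes) and would need to run before `oneshot()`:

# Untested workaround sketch: strip the kwargs that
# MistralCommonTokenizer.from_pretrained rejects, then delegate to the
# original implementation.
from transformers.tokenization_mistral_common import MistralCommonTokenizer

_orig_from_pretrained = MistralCommonTokenizer.from_pretrained.__func__

@classmethod
def _patched_from_pretrained(cls, *args, **kwargs):
    for key in ("trust_remote_code", "use_fast", "_from_auto"):
        kwargs.pop(key, None)  # silently drop unsupported kwargs
    return _orig_from_pretrained(cls, *args, **kwargs)

MistralCommonTokenizer.from_pretrained = _patched_from_pretrained

Alternatively, if `oneshot` accepts a pre-built `processor` argument, constructing the processor manually and passing it in may sidestep `initialize_processor_from_path` entirely.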

Labels: bug
