
Conversation

@gante gante commented Aug 26, 2025

Related to #1703

This PR:

  1. [bugfix] Uses the transformers stopping criteria to stop on arbitrary strings, as opposed to using a custom class (more details below)
  2. [removes a warning] Updates `torch_dtype` -> `dtype` on transformers-related code. This was a recent deprecation.
  3. [removes a warning, possibly fixes bugs] Passes the attention mask from the tokenizer to `generate`. Generating the attention mask on the fly from `input_ids` inside `generate` is very brittle and should be avoided.

The custom stopping criteria defined in smolagents compares the generated text against the defined stop strings: if the generated text ends in one of those strings, generation stops.

This has two problems:

  1. The comparison is done at the text level, so we have to decode the generated tokens at each step (= move tensors to CPU and then decode). In general, this is slow;
  2. [source of bugs] for it to work, the model's generated text at a given step has to end exactly on the stop string. For instance, if we want to stop on `foo bar` but the model emits `foo bar:` (or otherwise runs past the stop string within a single step), generation won't stop, even though we probably want it to.

Both issues are addressed in the class present in transformers, so let's use it instead 🤗
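
For reference, a minimal sketch of the built-in mechanism this PR switches to (the model id, prompt, and stop strings below are placeholders, not the actual smolagents values):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Thought:", return_tensors="pt").to(model.device)

# `stop_strings` relies on transformers' built-in stop-string criteria: it works
# on token ids (no per-step decode round-trip) and handles stop strings that
# span token boundaries. The tokenizer must be passed so the check can be set up.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    stop_strings=["Observation:", "<end_code>"],  # placeholder stop strings
    tokenizer=tokenizer,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```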

@albertvillanova albertvillanova left a comment

Thanks a lot for the awesome fixes!! 🤗
Just some minor comments.

-    torch_dtype (`str`, *optional*):
-        The torch_dtype to initialize your model with.
+    dtype (`str`, *optional*):
+        The dtype to initialize your model with.

Member

Is this a breaking change you introduced in transformers?

Member Author

yes -- we'll guarantee BC until v5.0.0 I believe


# BC: previously the type was set through `torch_dtype`. `dtype` is now preferred
torch_dtype = kwargs.pop("torch_dtype", None)
dtype = dtype or torch_dtype

Member

OK, I see the explanation here.

@albertvillanova albertvillanova Aug 27, 2025

Maybe we should emit a deprecation warning for smolagents users? What do you think?
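
For illustration, one possible shape of such a warning (the helper name and exact message are hypothetical; the kwargs handling mirrors the transformers BC shim quoted above):

```python
import warnings


def _resolve_dtype(dtype=None, **kwargs):
    # Hypothetical helper: keep accepting the legacy `torch_dtype` kwarg, but
    # point users to `dtype`, mirroring the rename in transformers.
    torch_dtype = kwargs.pop("torch_dtype", None)
    if torch_dtype is not None:
        warnings.warn(
            "`torch_dtype` is deprecated and will be removed in a future release; "
            "please pass `dtype` instead.",
            FutureWarning,
        )
    return dtype if dtype is not None else torch_dtype
```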

Member

Additionally, this makes some CI tests fail:

TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'

    completion_kwargs["max_new_tokens"] = max_new_tokens
return dict(
-    inputs=prompt_tensor,
+    **prompt_tensor,

Member

Not sure I understand this change... 😅

Member Author

The inputs were being prepared such that `prompt_tensor` only contained the `input_ids`. However, depending on the model and usage, the corresponding `attention_mask` (also returned by the tokenizer) may also be needed for a correct output. While using smolagents with transformers models, we could see a related warning being thrown :)

These changes make it so we pass all tokenizer encoding outputs (`input_ids` AND `attention_mask`) to `model.generate`, and thus guarantee correctness.
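
A minimal sketch of the difference (variable names and the model id are illustrative, not the exact smolagents code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt_tensor = tokenizer("Hello", return_tensors="pt").to(model.device)

# Before: only the token ids reached generate, so the attention mask had to be
# rebuilt from `input_ids` inside generate (brittle, and it triggers a warning).
# outputs = model.generate(inputs=prompt_tensor["input_ids"], max_new_tokens=32)

# After: unpack everything the tokenizer returned (`input_ids` AND `attention_mask`).
outputs = model.generate(**prompt_tensor, max_new_tokens=32)
```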

Member

Thanks a lot for the clear explanation! 🤗

@albertvillanova albertvillanova left a comment

Do you know why we get these errors for some models? https://github.com/huggingface/smolagents/actions/runs/17267725950/job/49003833323?pr=1723

  • LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'
  • LlavaForConditionalGeneration.__init__() got an unexpected keyword argument 'dtype'
FAILED tests/test_agents.py::TestAgent::test_transformers_toolcalling_agent - TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'
FAILED tests/test_models.py::TestModel::test_transformers_message_no_tool - TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'
FAILED tests/test_models.py::TestModel::test_transformers_message_vl_no_tool - ValueError: Failed to load tokenizer and model for model_id='llava-hf/llava-interleave-qwen-0.5b-hf': LlavaForConditionalGeneration.__init__() got an unexpected keyword argument 'dtype'

@gante gante commented Aug 27, 2025

Not sure, I'll have to debug :D (can do it tomorrow)

@albertvillanova albertvillanova commented Sep 5, 2025

I am rerunning the tests after the latest transformers patch release: https://github.com/huggingface/transformers/releases/tag/v4.56.1

This patch most notably fixes an issue with the new `dtype` argument (replacing `torch_dtype`) in pipelines!

@albertvillanova albertvillanova left a comment

After the transformers 4.56.1 patch release, the previous `dtype` errors disappeared.

  • I think the current CI error could be easily fixed: AssertionError: assert 'This is a photo' == 'This is a very'

However, we support transformers>=4.0.0, so those `dtype` errors could still be an issue for users on older transformers versions.

I would suggest:

  • This PR handles only the stopping criteria
  • Leave the support for both `dtype` and `torch_dtype` for a subsequent PR

What do you think?

@albertvillanova albertvillanova linked an issue Sep 12, 2025 that may be closed by this pull request

Successfully merging this pull request may close these issues.

BUG: Stop sequences not working properly in TransformersModel