
Conversation

@gante gante commented Aug 26, 2025

Related to #1703

This PR:

  1. [bugfix] Uses the transformers stopping criteria to stop on arbitrary strings, as opposed to using a custom class (more details below)
  2. [removes a warning] Updates `torch_dtype` -> `dtype` on transformers-related code. This was a recent deprecation.
  3. [removes a warning, possibly fixes bugs] Passes the attention mask from the tokenizer to `generate`. Generating the attention mask on the fly from `input_ids` inside `generate` is very brittle and should be avoided.

The custom stopping criteria defined in smolagents compares the generated text against the defined stop strings: if the generated text ends in one of those strings, generation stops.

This has two problems:

  1. The comparison is done at the text level, so we have to decode the generated tokens at each step (= move tensors to CPU and then decode). In general, this is slow;
  2. [source of bugs] for it to work, the model's generated text at a given step has to end exactly on the stop string. For instance, if we want to stop on `foo bar` but the model emits `foo bar:` (or otherwise runs past the stop string within a single step), generation won't stop, even though we probably want it to.

Both issues are addressed in the class present in transformers, so let's use it instead 🤗
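
For reference, a minimal sketch of the built-in mechanism this PR switches to (the model id, prompt, and stop strings below are placeholders, not the actual smolagents values):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Thought:", return_tensors="pt").to(model.device)

# `stop_strings` relies on transformers' built-in stop-string criteria: it works
# on token ids (no per-step decode round-trip) and handles stop strings that
# span token boundaries. The tokenizer must be passed so the check can be set up.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    stop_strings=["Observation:", "<end_code>"],  # placeholder stop strings
    tokenizer=tokenizer,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```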

@albertvillanova albertvillanova left a comment

Thanks a lot for the awesome fixes!! 🤗
Just some minor comments.

-    torch_dtype (`str`, *optional*):
-        The torch_dtype to initialize your model with.
+    dtype (`str`, *optional*):
+        The dtype to initialize your model with.

Member

Is this a breaking change you introduced in transformers?

Member Author

yes -- we'll guarantee BC until v5.0.0 I believe


# BC: previously the type was set through `torch_dtype`. `dtype` is now preferred
torch_dtype = kwargs.pop("torch_dtype", None)
dtype = dtype or torch_dtype

Member

OK, I see the explanation here.

@albertvillanova albertvillanova Aug 27, 2025

Maybe we should emit a deprecation warning for smolagents users? What do you think?
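
For illustration, one possible shape of such a warning (the helper name and exact message are hypothetical; the kwargs handling mirrors the transformers BC shim quoted above):

```python
import warnings


def _resolve_dtype(dtype=None, **kwargs):
    # Hypothetical helper: keep accepting the legacy `torch_dtype` kwarg, but
    # point users to `dtype`, mirroring the rename in transformers.
    torch_dtype = kwargs.pop("torch_dtype", None)
    if torch_dtype is not None:
        warnings.warn(
            "`torch_dtype` is deprecated and will be removed in a future release; "
            "please pass `dtype` instead.",
            FutureWarning,
        )
    return dtype if dtype is not None else torch_dtype
```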

Member

Additionally, this makes some CI tests fail:

TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'

    completion_kwargs["max_new_tokens"] = max_new_tokens
return dict(
-    inputs=prompt_tensor,
+    **prompt_tensor,

Member

Not sure I understand this change... 😅

Member Author

The inputs were being prepared such that `prompt_tensor` only contained the `input_ids`. However, depending on the model and usage, the corresponding `attention_mask` (also returned by the tokenizer) may also be needed for a correct output. While using smolagents with transformers models, we could see a related warning being thrown :)

These changes make it so we pass all tokenizer encoding outputs (`input_ids` AND `attention_mask`) to `model.generate`, and thus guarantee correctness.
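
A minimal sketch of the difference (variable names and the model id are illustrative, not the exact smolagents code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt_tensor = tokenizer("Hello", return_tensors="pt").to(model.device)

# Before: only the token ids reached generate, so the attention mask had to be
# rebuilt from `input_ids` inside generate (brittle, and it triggers a warning).
# outputs = model.generate(inputs=prompt_tensor["input_ids"], max_new_tokens=32)

# After: unpack everything the tokenizer returned (`input_ids` AND `attention_mask`).
outputs = model.generate(**prompt_tensor, max_new_tokens=32)
```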

Member

Thanks a lot for the clear explanation! 🤗

@albertvillanova albertvillanova left a comment

Do you know why we get these errors for some models? https://github.com/huggingface/smolagents/actions/runs/17267725950/job/49003833323?pr=1723

  • LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'
  • LlavaForConditionalGeneration.__init__() got an unexpected keyword argument 'dtype'
FAILED tests/test_agents.py::TestAgent::test_transformers_toolcalling_agent - TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'
FAILED tests/test_models.py::TestModel::test_transformers_message_no_tool - TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'dtype'
FAILED tests/test_models.py::TestModel::test_transformers_message_vl_no_tool - ValueError: Failed to load tokenizer and model for model_id='llava-hf/llava-interleave-qwen-0.5b-hf': LlavaForConditionalGeneration.__init__() got an unexpected keyword argument 'dtype'

@gante gante commented Aug 27, 2025

Not sure, I'll have to debug :D (can do it tomorrow)

@albertvillanova albertvillanova commented Sep 5, 2025

I am rerunning the tests after the latest transformers patch release: https://github.com/huggingface/transformers/releases/tag/v4.56.1

This patch most notably fixes an issue with the new `dtype` argument (replacing `torch_dtype`) in pipelines!

@albertvillanova albertvillanova left a comment

After the transformers 4.56.1 patch release, the previous `dtype` errors disappeared.

  • I think the current CI error could be easily fixed: AssertionError: assert 'This is a photo' == 'This is a very'

However, we support transformers>=4.0.0, so those `dtype` errors could still be an issue for users on older transformers versions.

I would suggest:

  • This PR handles only the stopping criteria
  • Leave the support for both `dtype` and `torch_dtype` for a subsequent PR

What do you think?

@albertvillanova albertvillanova linked an issue Sep 12, 2025 that may be closed by this pull request

Successfully merging this pull request may close these issues.

BUG: Stop sequences not working properly in TransformersModel