fix: handle missing add_generation_prompt in saved LoRA tokenizers#4179
Fizza-Mukhtar wants to merge 3 commits into unslothai:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request addresses a critical issue where LoRA adapters, when trained with certain ChatML templates and saved, would result in tokenizers that lacked the necessary `{% if add_generation_prompt %}` block.
Code Review
This pull request addresses an issue where LoRA adapters trained with ChatML templates would crash during inference due to a missing add_generation_prompt block in the saved tokenizer's chat template. The fix involves two parts: first, adding logic to _fix_chat_template to correctly patch templates with empty endings, and second, changing a RuntimeError to a logger.warning_once in fix_chat_template to prevent crashes when auto-fixing fails. While the approach is sound, I've identified a minor but important issue in the implementation of the template patch that could lead to incorrect generation.
Force-pushed from cfca1db to 5b6f8d7 (pre-commit.ci fixes; for more information, see https://pre-commit.ci).
Problem
When a LoRA adapter is trained with a ChatML template (e.g., Hermes-3, Magnum-v2) via LlamaFactory and saved, the saved tokenizer's `chat_template` loses the `{% if add_generation_prompt %}` block. Loading this LoRA for inference then crashes immediately:

```
RuntimeError: The tokenizer does not have a
{% if add_generation_prompt %} for generation purposes.
```
Fixes #4150
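To illustrate the failure mode, here is a minimal sketch of the kind of check that raises; the function name `check_template` and the template string are illustrative, not the actual unsloth source:

```python
# A ChatML-style template as LlamaFactory saves it: the message loop is
# present, but the generation-prompt block at the end is missing.
broken_template = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n"
    "{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
)

def check_template(chat_template: str) -> None:
    # Sketch of the strict check that previously aborted loading.
    if "add_generation_prompt" not in chat_template:
        raise RuntimeError(
            "The tokenizer does not have a {% if add_generation_prompt %} "
            "for generation purposes."
        )

try:
    check_template(broken_template)
except RuntimeError as e:
    print(e)
```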
Root Cause
`_fix_chat_template` only handled templates ending with `{{ something }}`, not empty endings, which is exactly what LlamaFactory saves.
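The distinction can be sketched by looking at what follows the final `{% endfor %}`; the helper name `after_endfor` here is illustrative, not the exact unsloth code:

```python
def after_endfor(chat_template: str) -> str:
    # Return the text after the last {% endfor %}.
    # LlamaFactory-saved templates end right there, so this is empty.
    marker = "{% endfor %}"
    idx = chat_template.rfind(marker)
    return chat_template[idx + len(marker):] if idx != -1 else ""

print(repr(after_endfor("... {% endfor %}")))           # '' (the unhandled case)
print(repr(after_endfor("... {% endfor %}{{ eos }}")))  # '{{ eos }}' (the handled case)
```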
Fix
- `_fix_chat_template`: Added a handler for an empty `after_endfor`; appends the correct ChatML generation-prompt block when missing.
- `fix_chat_template`: Changed the final fallback from `RuntimeError` to `logger.warning_once` so users can still load their adapter if the auto-fix fails.
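Both parts of the fix can be sketched together. This is a simplified stand-in, not the actual unsloth implementation: the generation block shown assumes plain ChatML, and a standard `logging` warning stands in for `logger.warning_once`:

```python
import logging

logger = logging.getLogger("chat_template_fix_sketch")

# ChatML generation-prompt block appended when missing (assumed form).
GENERATION_BLOCK = (
    "{% if add_generation_prompt %}"
    "{{ '<|im_start|>assistant\n' }}"
    "{% endif %}"
)

def fix_chat_template(chat_template: str) -> str:
    # Already has the block: nothing to do.
    if "add_generation_prompt" in chat_template:
        return chat_template
    # Empty ending after the message loop (the LlamaFactory case):
    # append the generation-prompt block.
    if chat_template.rstrip().endswith("{% endfor %}"):
        return chat_template + GENERATION_BLOCK
    # Auto-fix failed: warn instead of raising so the adapter still loads.
    logger.warning(
        "Could not auto-fix chat template; generation may misbehave."
    )
    return chat_template
```

With this change, a LlamaFactory-saved template loads and gains a working generation prompt, while unrecognized templates load with a warning instead of crashing.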