fix: handle missing add_generation_prompt in saved LoRA tokenizers #4179

Open
Fizza-Mukhtar wants to merge 3 commits into unslothai:main from Fizza-Mukhtar:fix/lora-tokenizer-chat-template

Conversation

@Fizza-Mukhtar
Contributor

Problem

When a LoRA adapter is trained with a ChatML template (e.g., Hermes-3,
Magnum-v2) via LlamaFactory and then saved, the tokenizer's chat_template
loses the {% if add_generation_prompt %} block. Loading this LoRA for
inference then crashes immediately:

RuntimeError: The tokenizer does not have a
{% if add_generation_prompt %} for generation purposes.

Fixes #4150

Root Cause

_fix_chat_template only handled templates ending with a {{ something }}
expression; it did not handle templates with an empty ending after
{% endfor %}, which is exactly what LlamaFactory saves.
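To make the failure concrete, here is a minimal sketch of the two template shapes. The template strings are simplified for illustration and are not the exact strings unsloth inspects:

```python
# A correct ChatML template ends with a generation-prompt block
# after the message loop:
CORRECT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n'"
    "   + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|im_start|>assistant\\n' }}"
    "{% endif %}"
)

# What LlamaFactory saves: the message loop, then nothing after {% endfor %}.
SAVED_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n'"
    "   + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
)

print("add_generation_prompt" in SAVED_TEMPLATE)  # False -> RuntimeError on load
```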

Fix

  1. _fix_chat_template: Added a handler for the empty after_endfor case —
    it appends the correct ChatML generation prompt block when missing.
  2. fix_chat_template: Changed the final fallback from a RuntimeError to
    logger.warning_once, so users can still load their adapter if the
    auto-fix fails.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where LoRA adapters, when trained with certain ChatML templates and saved, would result in tokenizers that lacked the necessary add_generation_prompt block. This omission caused immediate crashes during inference. The solution involves refining the chat template parsing logic to correctly re-insert the missing prompt block and gracefully handling unfixable templates by issuing a warning instead of halting execution.

Highlights

  • Enhanced Chat Template Fixing: The _fix_chat_template function has been updated to correctly handle cases where a LoRA adapter's saved tokenizer loses the {% if add_generation_prompt %} block, specifically when the template has an empty ending after endfor.
  • Improved Error Handling: The fix_chat_template function now issues a logger.warning_once instead of raising a RuntimeError if the automatic template fix fails. This allows users to still load their adapter, albeit with a warning about potential generation issues.
Changelog
  • unsloth/tokenizer_utils.py
    • Added a new conditional block in _fix_chat_template to detect and append the add_generation_prompt block when it's missing and the template has an empty after_endfor section.
    • Changed the error handling in fix_chat_template from raising a RuntimeError to logging a warning if the chat template cannot be automatically fixed, allowing the application to continue loading.

@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses an issue where LoRA adapters trained with ChatML templates would crash during inference due to a missing add_generation_prompt block in the saved tokenizer's chat template. The fix involves two parts: first, adding logic to _fix_chat_template to correctly patch templates with empty endings, and second, changing a RuntimeError to a logger.warning_once in fix_chat_template to prevent crashes when auto-fixing fails. While the approach is sound, I've identified a minor but important issue in the implementation of the template patch that could lead to incorrect generation.

@Fizza-Mukhtar force-pushed the fix/lora-tokenizer-chat-template branch from cfca1db to 5b6f8d7 on March 7, 2026 at 15:44

Development

Successfully merging this pull request may close these issues.

[Bug] Hermes Lora chat test bug: does not have a {% if add_generation_prompt %} for generation purposes
