fix: handle missing add_generation_prompt in saved LoRA tokenizers #4179

Open
Fizza-Mukhtar wants to merge 3 commits into unslothai:main from Fizza-Mukhtar:fix/lora-tokenizer-chat-template

Conversation

@Fizza-Mukhtar
Contributor

Problem

When a LoRA adapter is trained with a ChatML template (e.g., Hermes-3,
Magnum-v2) via LlamaFactory and then saved, the tokenizer's chat_template
loses the {% if add_generation_prompt %} block. Loading this LoRA for
inference then crashes immediately:

RuntimeError: The tokenizer does not have a
{% if add_generation_prompt %} for generation purposes.

Fixes #4150

Root Cause

_fix_chat_template only handled templates ending with a {{ something }}
expression; it did not handle templates with an empty ending after
{% endfor %}, which is exactly what LlamaFactory saves.
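To make the failure concrete, here is a minimal sketch of the two template shapes. The template strings are simplified for illustration and are not the exact strings unsloth inspects:

```python
# A correct ChatML template ends with a generation-prompt block
# after the message loop:
CORRECT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n'"
    "   + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|im_start|>assistant\\n' }}"
    "{% endif %}"
)

# What LlamaFactory saves: the message loop, then nothing after {% endfor %}.
SAVED_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n'"
    "   + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
)

print("add_generation_prompt" in SAVED_TEMPLATE)  # False -> RuntimeError on load
```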

Fix

  1. _fix_chat_template: Added a handler for the empty after_endfor case —
    it appends the correct ChatML generation prompt block when missing.
  2. fix_chat_template: Changed the final fallback from a RuntimeError to
    logger.warning_once, so users can still load their adapter if the
    auto-fix fails.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where LoRA adapters, when trained with certain ChatML templates and saved, would result in tokenizers that lacked the necessary add_generation_prompt block. This omission caused immediate crashes during inference. The solution involves refining the chat template parsing logic to correctly re-insert the missing prompt block and gracefully handling unfixable templates by issuing a warning instead of halting execution.

Highlights

  • Enhanced Chat Template Fixing: The _fix_chat_template function has been updated to correctly handle cases where a LoRA adapter's saved tokenizer loses the {% if add_generation_prompt %} block, specifically when the template has an empty ending after endfor.
  • Improved Error Handling: The fix_chat_template function now issues a logger.warning_once instead of raising a RuntimeError if the automatic template fix fails. This allows users to still load their adapter, albeit with a warning about potential generation issues.
Changelog
  • unsloth/tokenizer_utils.py
    • Added a new conditional block in _fix_chat_template to detect and append the add_generation_prompt block when it's missing and the template has an empty after_endfor section.
    • Changed the error handling in fix_chat_template from raising a RuntimeError to logging a warning if the chat template cannot be automatically fixed, allowing the application to continue loading.

@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses an issue where LoRA adapters trained with ChatML templates would crash during inference due to a missing add_generation_prompt block in the saved tokenizer's chat template. The fix involves two parts: first, adding logic to _fix_chat_template to correctly patch templates with empty endings, and second, changing a RuntimeError to a logger.warning_once in fix_chat_template to prevent crashes when auto-fixing fails. While the approach is sound, I've identified a minor but important issue in the implementation of the template patch that could lead to incorrect generation.

@Fizza-Mukhtar force-pushed the fix/lora-tokenizer-chat-template branch from cfca1db to 5b6f8d7 on March 7, 2026 at 15:44

Development

Successfully merging this pull request may close these issues.

[Bug] Hermes Lora chat test bug: does not have a {% if add_generation_prompt %} for generation purposes
