
Fix full-finetuning fp32 precision fallback for issue #4082 (#4114)

Open

danielhanchen wants to merge 2 commits into main from dh/fix-4082-full-finetune-precision

Conversation

@danielhanchen
Contributor

Summary

Fix full-finetuning precision handling when model params are float32 so SFTTrainer does not raise a false fp16/bf16 mismatch.

This addresses the behavior in issue #4082, where passing dtype=torch.float16 during full finetuning can upcast model params to float32 and then trip the "model is bfloat16 but fp16 requested" guard.

Changes

  • unsloth/models/rl.py
    • Honor UNSLOTH_FORCE_FLOAT32=1 regardless of full_finetuning mode.
    • Split dtype checks into explicit buckets: is_float16, is_bfloat16, is_float32.
    • Keep true fp16<->bf16 mismatch errors.
    • Add float32 + fp16 fallback path: auto switch to float32 training instead of raising mismatch.
    • Fix auto mixed-precision defaults so float32 models do not get forced to bf16 when both fp16 and bf16 are false.
  • unsloth/models/loader.py
    • Preserve user-provided UNSLOTH_FORCE_FLOAT32=1 instead of unconditionally resetting to 0.
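
The bucketed check described above can be sketched roughly as follows. This is a minimal standalone sketch, not the actual rl.py code: `resolve_precision` and the string dtypes are illustrative stand-ins (the real code compares torch dtypes on the model parameters).

```python
import os

def resolve_precision(model_dtype, fp16_requested, bf16_requested):
    """Return the effective (fp16, bf16) training flags.

    model_dtype is "float16", "bfloat16", or "float32" (a stand-in
    for the torch dtype of the model parameters).
    """
    # UNSLOTH_FORCE_FLOAT32=1 wins regardless of full_finetuning mode.
    if os.environ.get("UNSLOTH_FORCE_FLOAT32", "0") == "1":
        return False, False  # pure float32 training

    is_float16  = model_dtype == "float16"
    is_bfloat16 = model_dtype == "bfloat16"
    is_float32  = model_dtype == "float32"

    # A true fp16 <-> bf16 mismatch is still an error.
    if is_bfloat16 and fp16_requested:
        raise TypeError("Model is bfloat16 but fp16 mixed precision was requested")
    if is_float16 and bf16_requested:
        raise TypeError("Model is float16 but bf16 mixed precision was requested")

    # float32 params + fp16 request: fall back to float32 training
    # instead of raising a false mismatch.
    if is_float32 and fp16_requested:
        return False, False

    # A float32 model with neither flag set stays float32
    # (no silent coercion to bf16).
    if is_float32 and not (fp16_requested or bf16_requested):
        return False, False

    return fp16_requested, bf16_requested
```
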

Validation

Using temp/issue_4082_replication/repro_4082_fp16.py with unsloth==2026.2.1 editable install from this branch:

Post-patch results:

  • A_float16_fp16_force0: pass
  • B_float16_fp16_force1: pass
  • C_bfloat16_fp16_force0: fail (expected; this is a true fp16/bf16 mismatch)
  • D_bfloat16_bf16_force0: pass
  • E_float16_noamp_force0: pass
  • F_float16_noamp_force1: pass

Logs are in:

  • logs/issue_4082_replication/postpatch/
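
The case labels above encode `<dtype>_<amp mode>_force<N>`. A hypothetical driver for that matrix could look like this; the real repro lives in temp/issue_4082_replication/repro_4082_fp16.py, and the mapping below only illustrates the naming scheme, not that script's internals.

```python
# Hypothetical mapping of case label -> (model dtype, AMP mode, UNSLOTH_FORCE_FLOAT32)
CASES = {
    "A": ("float16",  "fp16",  "0"),
    "B": ("float16",  "fp16",  "1"),
    "C": ("bfloat16", "fp16",  "0"),  # expected to fail: a true fp16/bf16 mismatch
    "D": ("bfloat16", "bf16",  "0"),
    "E": ("float16",  "noamp", "0"),
    "F": ("float16",  "noamp", "1"),
}

def case_name(label):
    dtype, amp, force = CASES[label]
    return f"{label}_{dtype}_{amp}_force{force}"

for label in CASES:
    # Each case would run the repro script in a fresh subprocess with
    # UNSLOTH_FORCE_FLOAT32 set, so the env var cannot leak between cases.
    print(case_name(label))
```
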

@gemini-code-assist
Contributor
Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where full finetuning with fp16 could lead to erroneous precision mismatch errors when model parameters were float32. The changes ensure that the system correctly identifies and handles float32 models, either by automatically switching to float32 training or by respecting the UNSLOTH_FORCE_FLOAT32 environment variable, thereby preventing false precision guards from triggering and improving the robustness of mixed-precision training.

Highlights

  • Precision Handling Logic Refinement: The logic for handling fp16, bf16, and float32 training has been refined in unsloth/models/rl.py to prevent false mismatch errors, especially when float32 model parameters are involved.
  • Consistent UNSLOTH_FORCE_FLOAT32 Enforcement: The UNSLOTH_FORCE_FLOAT32 environment variable is now honored universally in unsloth/models/rl.py, irrespective of the full_finetuning mode, ensuring consistent float32 training when explicitly requested.
  • Corrected Mixed-Precision Defaults: Automatic mixed-precision defaults in unsloth/models/rl.py were corrected to prevent float32 models from being incorrectly forced into bf16 training when neither fp16 nor bf16 is explicitly enabled.
  • Environment Variable Preservation: The UNSLOTH_FORCE_FLOAT32 environment variable is now preserved in unsloth/models/loader.py if set by the user prior to model loading, instead of being unconditionally reset.


Changelog
  • unsloth/models/loader.py
    • Modified the from_pretrained function to preserve the user-defined UNSLOTH_FORCE_FLOAT32 environment variable, preventing its unintended reset.
  • unsloth/models/rl.py
    • Refactored precision checks to explicitly differentiate between float16, bfloat16, and float32.
    • Implemented a fallback mechanism to automatically switch to float32 training if a float32 model is encountered with an fp16 request, avoiding type errors.
    • Ensured UNSLOTH_FORCE_FLOAT32=1 is respected for all training modes, including full finetuning.
    • Adjusted mixed-precision defaults to prevent float32 models from being coerced into bf16 when fp16 and bf16 are not explicitly set.
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fbfef9db8


Comment on lines +1204 to +1205
user_forced_float32 = os.environ.get("UNSLOTH_FORCE_FLOAT32", "0") == "1"
os.environ["UNSLOTH_FORCE_FLOAT32"] = "1" if user_forced_float32 else "0"


P1: Reset forced-float32 flag between model loads

This now treats any existing UNSLOTH_FORCE_FLOAT32=1 as a user override, but that env var is also set internally later in this same function when a model matches FORCE_FLOAT32. As a result, after loading one model that auto-forces float32, subsequent model loads in the same process will inherit UNSLOTH_FORCE_FLOAT32=1 and keep forcing float32 even when the new model does not require it; downstream RL trainer logic reads this flag (unsloth/models/rl.py mixed-precision setup) and disables fp16/bf16 unexpectedly.
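
The leak Codex describes, and one possible snapshot/restore remedy, can be illustrated as follows. This is a toy sketch only: `load_model` is a stand-in for from_pretrained, and the restore shown here is one candidate fix, not the PR's actual change.

```python
import os

def load_model(auto_forces_float32):
    """Toy stand-in for from_pretrained.

    It honors a pre-set user override but, like the real loader, may
    also set the flag itself when a model matches FORCE_FLOAT32.
    """
    snapshot = os.environ.get("UNSLOTH_FORCE_FLOAT32", "0")
    user_forced = snapshot == "1"
    os.environ["UNSLOTH_FORCE_FLOAT32"] = (
        "1" if (user_forced or auto_forces_float32) else "0"
    )
    # ... model loads here; trainer patching reads the env var ...
    forced_during_load = os.environ["UNSLOTH_FORCE_FLOAT32"]
    # Possible remedy: restore the pre-load value so an internal
    # auto-force does not leak into the next load in this process.
    os.environ["UNSLOTH_FORCE_FLOAT32"] = snapshot
    return forced_during_load

os.environ["UNSLOTH_FORCE_FLOAT32"] = "0"
first  = load_model(auto_forces_float32=True)   # this model auto-forces fp32
second = load_model(auto_forces_float32=False)  # next model must not inherit it
print(first, second)  # 1 0 -- without the restore, second would also be 1
```
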


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides a solid fix for handling float32 precision fallbacks during full finetuning. The changes in unsloth/models/rl.py are well-implemented, introducing clearer dtype checks and a correct fallback mechanism. The modification in unsloth/models/loader.py also correctly ensures that user-defined environment variable settings are respected. The code is clean and effectively resolves the reported issue.

" args.bf16 = not float16\n"
" os.environ['ACCELERATE_MIXED_PRECISION'] = 'fp16' if float16 else 'bf16'\n"
" if hasattr(args, 'mixed_precision'): args.mixed_precision = 'fp16' if float16 else 'bf16'\n"
" if is_float16:\n"
Collaborator

Does this fix the few dtype issues we've seen? Especially on T4?

@Datta0
Collaborator

Datta0 commented Feb 25, 2026

Perhaps fixes: #3956

@danielhanchen
Contributor Author

Follow-up hardening has been pushed to this PR:

  • commit: d27c104e
  • change: sync UNSLOTH_FORCE_FLOAT32=1 when trainer logic enters forced-float32 mode in unsloth/models/rl.py

Validation rerun after this commit:

  1. Targeted env-sync check
     • script: temp/issue_4082_followup/check_force_env.py
     • result: pass
     • confirmed after_trainer_init_env == "1" and training succeeds in the float32+fp16 mismatch path.
  2. Full dtype x mode matrix rerun
     • log: logs/issue_4082_matrix_postfix/runner_master.log
     • result: all 8 cases pass again (RUNNER_EXIT_CODE=0)
       • full finetuning: float16, bfloat16
       • LoRA 16bit: float16, bfloat16
       • QLoRA 4bit: float16, bfloat16
       • QLoRA 8bit: float16, bfloat16
  3. GRPO notebook regression check
     • script: temp/issue_4082_notebooks_round2/run_scripts/Advanced_Llama3_2_3B_GRPO_LoRA_run.py
     • log: logs/issue_4082_followup/grpo_notebook_rerun.log
     • result: pass (EXIT_CODE=0)
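
The env-sync check in step 1 reduces to an assertion of roughly this shape. The real script is temp/issue_4082_followup/check_force_env.py; `fake_trainer_init` below is an illustrative stand-in for the patched trainer path, not the script's actual code.

```python
import os

def fake_trainer_init():
    # Stand-in for trainer construction hitting the float32+fp16
    # fallback: commit d27c104e syncs the env var when that happens.
    os.environ["UNSLOTH_FORCE_FLOAT32"] = "1"

os.environ.pop("UNSLOTH_FORCE_FLOAT32", None)
fake_trainer_init()
after_trainer_init_env = os.environ.get("UNSLOTH_FORCE_FLOAT32")
assert after_trainer_init_env == "1"
print("env-sync check: pass")
```
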
