Fix full-finetuning fp32 precision fallback for issue #4082 (#4114)
danielhanchen wants to merge 2 commits into main
Conversation
Summary of Changes: Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where full finetuning with
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1fbfef9db8
    user_forced_float32 = os.environ.get("UNSLOTH_FORCE_FLOAT32", "0") == "1"
    os.environ["UNSLOTH_FORCE_FLOAT32"] = "1" if user_forced_float32 else "0"
Reset forced-float32 flag between model loads
This now treats any existing UNSLOTH_FORCE_FLOAT32=1 as a user override, but that env var is also set internally later in this same function when a model matches FORCE_FLOAT32. As a result, after loading one model that auto-forces float32, subsequent model loads in the same process will inherit UNSLOTH_FORCE_FLOAT32=1 and keep forcing float32 even when the new model does not require it; downstream RL trainer logic reads this flag (unsloth/models/rl.py mixed-precision setup) and disables fp16/bf16 unexpectedly.
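One way to avoid the leak described above is to snapshot the user's setting once at import time, before any model load can mutate the env var, and restore that snapshot at the start of every load. A minimal sketch of the pattern — the `begin_model_load` / `load_model` names and the model names are hypothetical, not unsloth's API:

```python
import os

# Snapshot the user's intent ONCE, before any model load can mutate the flag.
_USER_FORCED_FLOAT32 = os.environ.get("UNSLOTH_FORCE_FLOAT32", "0") == "1"

def begin_model_load() -> None:
    """Reset the flag to the user's original choice at the start of each load,
    so a previous model that auto-forced float32 cannot leak into this one."""
    os.environ["UNSLOTH_FORCE_FLOAT32"] = "1" if _USER_FORCED_FLOAT32 else "0"

def load_model(name: str, matches_force_float32: bool) -> str:
    begin_model_load()
    if matches_force_float32:  # e.g. the model is on the FORCE_FLOAT32 list
        os.environ["UNSLOTH_FORCE_FLOAT32"] = "1"
    return os.environ["UNSLOTH_FORCE_FLOAT32"]

# First load auto-forces float32; the second must NOT inherit it.
first  = load_model("model-a", matches_force_float32=True)
second = load_model("model-b", matches_force_float32=False)
```

With the snapshot taken at import, a later internal write of `UNSLOTH_FORCE_FLOAT32=1` can no longer be mistaken for a user override on the next load.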
Code Review
This pull request provides a solid fix for handling float32 precision fallbacks during full finetuning. The changes in unsloth/models/rl.py are well-implemented, introducing clearer dtype checks and a correct fallback mechanism. The modification in unsloth/models/loader.py also correctly ensures that user-defined environment variable settings are respected. The code is clean and effectively resolves the reported issue.
| " args.bf16 = not float16\n" | ||
| " os.environ['ACCELERATE_MIXED_PRECISION'] = 'fp16' if float16 else 'bf16'\n" | ||
| " if hasattr(args, 'mixed_precision'): args.mixed_precision = 'fp16' if float16 else 'bf16'\n" | ||
| " if is_float16:\n" |
Does this fix the few dtype issues we've seen? Especially on T4?
Perhaps fixes: #3956
Follow-up hardening has been pushed to this PR:
Validation rerun after this commit:
Summary

Fix full-finetuning precision handling when model params are float32 so SFTTrainer does not raise a false fp16/bf16 mismatch. This addresses the issue #4082 behavior where `dtype=torch.float16` during full finetuning can upcast model params to float32, then trip the "model is bfloat16 but fp16 requested" guard.

Changes

- `unsloth/models/rl.py`: honor `UNSLOTH_FORCE_FLOAT32=1` regardless of `full_finetuning` mode; add explicit `is_float16`, `is_bfloat16`, and `is_float32` dtype checks; fall back to full float32 when both `fp16` and `bf16` are false.
- `unsloth/models/loader.py`: respect a user-set `UNSLOTH_FORCE_FLOAT32=1` instead of unconditionally resetting it to `0`.

Validation

Using `temp/issue_4082_replication/repro_4082_fp16.py` with an `unsloth==2026.2.1` editable install from this branch. Post-patch results:

- A_float16_fp16_force0: pass
- B_float16_fp16_force1: pass
- C_bfloat16_fp16_force0: fail (expected true mismatch)
- D_bfloat16_bf16_force0: pass
- E_float16_noamp_force0: pass
- F_float16_noamp_force1: pass

Logs are in `logs/issue_4082_replication/postpatch/`.
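The dtype checks and fallback behavior that `unsloth/models/rl.py` applies, as described in the Changes list, can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: `resolve_precision` is a hypothetical helper, and plain strings stand in for `torch` dtype comparisons.

```python
import os

def resolve_precision(model_dtype: str, fp16: bool, bf16: bool) -> str:
    """Sketch: classify the model's parameter dtype, honor a forced-float32
    override, and fall back to full float32 when neither fp16 nor bf16
    mixed precision was requested."""
    # Stand-ins for the is_float16 / is_bfloat16 / is_float32 torch checks.
    is_float16  = model_dtype == "float16"
    is_bfloat16 = model_dtype == "bfloat16"
    is_float32  = model_dtype == "float32"
    assert is_float16 or is_bfloat16 or is_float32, "unexpected dtype"

    # User- or model-forced float32 wins over everything else.
    if os.environ.get("UNSLOTH_FORCE_FLOAT32", "0") == "1":
        return "float32"
    # A TRUE mismatch: params really are bfloat16 but fp16 AMP was requested.
    # Note that params upcast to float32 no longer trip this guard.
    if fp16 and is_bfloat16:
        raise TypeError("model is bfloat16 but fp16 requested")
    if fp16:
        return "fp16"
    if bf16:
        return "bf16"
    # Full-finetuning fallback: no mixed precision requested -> run in float32.
    return "float32"
```

Under this sketch, case A from the matrix (float16 requested, params upcast to float32) resolves to fp16 instead of raising, while case C (genuinely bfloat16 params with fp16 requested) still fails as expected.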