
Fix full-finetuning fp32 precision fallback for issue #4082 (#4114)

Open

danielhanchen wants to merge 2 commits into main from dh/fix-4082-full-finetune-precision

Conversation

@danielhanchen
Contributor

Summary

Fix full-finetuning precision handling when model params are float32 so SFTTrainer does not raise a false fp16/bf16 mismatch.

This addresses the behavior in issue #4082, where passing dtype=torch.float16 during full finetuning can upcast model params to float32 and then trip the "model is bfloat16 but fp16 requested" guard.

Changes

  • unsloth/models/rl.py
    • Honor UNSLOTH_FORCE_FLOAT32=1 regardless of full_finetuning mode.
    • Split dtype checks into explicit buckets: is_float16, is_bfloat16, is_float32.
    • Keep true fp16<->bf16 mismatch errors.
    • Add float32 + fp16 fallback path: auto switch to float32 training instead of raising mismatch.
    • Fix auto mixed-precision defaults so float32 models do not get forced to bf16 when both fp16 and bf16 are false.
  • unsloth/models/loader.py
    • Preserve user-provided UNSLOTH_FORCE_FLOAT32=1 instead of unconditionally resetting to 0.
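
The bucketed check described above can be sketched roughly as follows. This is a minimal standalone sketch, not the actual rl.py code: `resolve_precision` and the string dtypes are illustrative stand-ins (the real code compares torch dtypes on the model parameters).

```python
import os

def resolve_precision(model_dtype, fp16_requested, bf16_requested):
    """Return the effective (fp16, bf16) training flags.

    model_dtype is "float16", "bfloat16", or "float32" (a stand-in
    for the torch dtype of the model parameters).
    """
    # UNSLOTH_FORCE_FLOAT32=1 wins regardless of full_finetuning mode.
    if os.environ.get("UNSLOTH_FORCE_FLOAT32", "0") == "1":
        return False, False  # pure float32 training

    is_float16  = model_dtype == "float16"
    is_bfloat16 = model_dtype == "bfloat16"
    is_float32  = model_dtype == "float32"

    # A true fp16 <-> bf16 mismatch is still an error.
    if is_bfloat16 and fp16_requested:
        raise TypeError("Model is bfloat16 but fp16 mixed precision was requested")
    if is_float16 and bf16_requested:
        raise TypeError("Model is float16 but bf16 mixed precision was requested")

    # float32 params + fp16 request: fall back to float32 training
    # instead of raising a false mismatch.
    if is_float32 and fp16_requested:
        return False, False

    # A float32 model with neither flag set stays float32
    # (no silent coercion to bf16).
    if is_float32 and not (fp16_requested or bf16_requested):
        return False, False

    return fp16_requested, bf16_requested
```
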

Validation

Using temp/issue_4082_replication/repro_4082_fp16.py with unsloth==2026.2.1 editable install from this branch:

Post-patch results:

  • A_float16_fp16_force0: pass
  • B_float16_fp16_force1: pass
  • C_bfloat16_fp16_force0: fail (expected; this is a true fp16/bf16 mismatch)
  • D_bfloat16_bf16_force0: pass
  • E_float16_noamp_force0: pass
  • F_float16_noamp_force1: pass

Logs are in:

  • logs/issue_4082_replication/postpatch/
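
The case labels above encode `<dtype>_<amp mode>_force<N>`. A hypothetical driver for that matrix could look like this; the real repro lives in temp/issue_4082_replication/repro_4082_fp16.py, and the mapping below only illustrates the naming scheme, not that script's internals.

```python
# Hypothetical mapping of case label -> (model dtype, AMP mode, UNSLOTH_FORCE_FLOAT32)
CASES = {
    "A": ("float16",  "fp16",  "0"),
    "B": ("float16",  "fp16",  "1"),
    "C": ("bfloat16", "fp16",  "0"),  # expected to fail: a true fp16/bf16 mismatch
    "D": ("bfloat16", "bf16",  "0"),
    "E": ("float16",  "noamp", "0"),
    "F": ("float16",  "noamp", "1"),
}

def case_name(label):
    dtype, amp, force = CASES[label]
    return f"{label}_{dtype}_{amp}_force{force}"

for label in CASES:
    # Each case would run the repro script in a fresh subprocess with
    # UNSLOTH_FORCE_FLOAT32 set, so the env var cannot leak between cases.
    print(case_name(label))
```
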

@gemini-code-assist
Contributor
Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where full finetuning with fp16 could lead to erroneous precision mismatch errors when model parameters were float32. The changes ensure that the system correctly identifies and handles float32 models, either by automatically switching to float32 training or by respecting the UNSLOTH_FORCE_FLOAT32 environment variable, thereby preventing false precision guards from triggering and improving the robustness of mixed-precision training.

Highlights

  • Precision Handling Logic Refinement: The logic for handling fp16, bf16, and float32 training has been refined in unsloth/models/rl.py to prevent false mismatch errors, especially when float32 model parameters are involved.
  • Consistent UNSLOTH_FORCE_FLOAT32 Enforcement: The UNSLOTH_FORCE_FLOAT32 environment variable is now honored universally in unsloth/models/rl.py, irrespective of the full_finetuning mode, ensuring consistent float32 training when explicitly requested.
  • Corrected Mixed-Precision Defaults: Automatic mixed-precision defaults in unsloth/models/rl.py were corrected to prevent float32 models from being incorrectly forced into bf16 training when neither fp16 nor bf16 is explicitly enabled.
  • Environment Variable Preservation: The UNSLOTH_FORCE_FLOAT32 environment variable is now preserved in unsloth/models/loader.py if set by the user prior to model loading, instead of being unconditionally reset.


Changelog
  • unsloth/models/loader.py
    • Modified the from_pretrained function to preserve the user-defined UNSLOTH_FORCE_FLOAT32 environment variable, preventing its unintended reset.
  • unsloth/models/rl.py
    • Refactored precision checks to explicitly differentiate between float16, bfloat16, and float32.
    • Implemented a fallback mechanism to automatically switch to float32 training if a float32 model is encountered with an fp16 request, avoiding type errors.
    • Ensured UNSLOTH_FORCE_FLOAT32=1 is respected for all training modes, including full finetuning.
    • Adjusted mixed-precision defaults to prevent float32 models from being coerced into bf16 when fp16 and bf16 are not explicitly set.
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fbfef9db8


Comment on lines +1204 to +1205
user_forced_float32 = os.environ.get("UNSLOTH_FORCE_FLOAT32", "0") == "1"
os.environ["UNSLOTH_FORCE_FLOAT32"] = "1" if user_forced_float32 else "0"


P1: Reset forced-float32 flag between model loads

This now treats any existing UNSLOTH_FORCE_FLOAT32=1 as a user override, but that env var is also set internally later in this same function when a model matches FORCE_FLOAT32. As a result, after loading one model that auto-forces float32, subsequent model loads in the same process will inherit UNSLOTH_FORCE_FLOAT32=1 and keep forcing float32 even when the new model does not require it; downstream RL trainer logic reads this flag (unsloth/models/rl.py mixed-precision setup) and disables fp16/bf16 unexpectedly.
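
The leak Codex describes, and one possible snapshot/restore remedy, can be illustrated as follows. This is a toy sketch only: `load_model` is a stand-in for from_pretrained, and the restore shown here is one candidate fix, not the PR's actual change.

```python
import os

def load_model(auto_forces_float32):
    """Toy stand-in for from_pretrained.

    It honors a pre-set user override but, like the real loader, may
    also set the flag itself when a model matches FORCE_FLOAT32.
    """
    snapshot = os.environ.get("UNSLOTH_FORCE_FLOAT32", "0")
    user_forced = snapshot == "1"
    os.environ["UNSLOTH_FORCE_FLOAT32"] = (
        "1" if (user_forced or auto_forces_float32) else "0"
    )
    # ... model loads here; trainer patching reads the env var ...
    forced_during_load = os.environ["UNSLOTH_FORCE_FLOAT32"]
    # Possible remedy: restore the pre-load value so an internal
    # auto-force does not leak into the next load in this process.
    os.environ["UNSLOTH_FORCE_FLOAT32"] = snapshot
    return forced_during_load

os.environ["UNSLOTH_FORCE_FLOAT32"] = "0"
first  = load_model(auto_forces_float32=True)   # this model auto-forces fp32
second = load_model(auto_forces_float32=False)  # next model must not inherit it
print(first, second)  # 1 0 -- without the restore, second would also be 1
```
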


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides a solid fix for handling float32 precision fallbacks during full finetuning. The changes in unsloth/models/rl.py are well-implemented, introducing clearer dtype checks and a correct fallback mechanism. The modification in unsloth/models/loader.py also correctly ensures that user-defined environment variable settings are respected. The code is clean and effectively resolves the reported issue.

" args.bf16 = not float16\n"
" os.environ['ACCELERATE_MIXED_PRECISION'] = 'fp16' if float16 else 'bf16'\n"
" if hasattr(args, 'mixed_precision'): args.mixed_precision = 'fp16' if float16 else 'bf16'\n"
" if is_float16:\n"
Collaborator

Does this fix the few dtype issues we've seen? Especially on T4?

@Datta0
Collaborator

Datta0 commented Feb 25, 2026

Perhaps fixes: #3956

@danielhanchen
Contributor Author

Follow-up hardening has been pushed to this PR:

  • commit: d27c104e
  • change: sync UNSLOTH_FORCE_FLOAT32=1 when trainer logic enters forced-float32 mode in unsloth/models/rl.py

Validation rerun after this commit:

  1. Targeted env-sync check
     • script: temp/issue_4082_followup/check_force_env.py
     • result: pass
     • confirmed after_trainer_init_env == "1" and training succeeds in the float32+fp16 mismatch path.
  2. Full dtype x mode matrix rerun
     • log: logs/issue_4082_matrix_postfix/runner_master.log
     • result: all 8 cases pass again (RUNNER_EXIT_CODE=0)
       • full finetuning: float16, bfloat16
       • LoRA 16bit: float16, bfloat16
       • QLoRA 4bit: float16, bfloat16
       • QLoRA 8bit: float16, bfloat16
  3. GRPO notebook regression check
     • script: temp/issue_4082_notebooks_round2/run_scripts/Advanced_Llama3_2_3B_GRPO_LoRA_run.py
     • log: logs/issue_4082_followup/grpo_notebook_rerun.log
     • result: pass (EXIT_CODE=0)
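
The env-sync check in step 1 reduces to an assertion of roughly this shape. The real script is temp/issue_4082_followup/check_force_env.py; `fake_trainer_init` below is an illustrative stand-in for the patched trainer path, not the script's actual code.

```python
import os

def fake_trainer_init():
    # Stand-in for trainer construction hitting the float32+fp16
    # fallback: commit d27c104e syncs the env var when that happens.
    os.environ["UNSLOTH_FORCE_FLOAT32"] = "1"

os.environ.pop("UNSLOTH_FORCE_FLOAT32", None)
fake_trainer_init()
after_trainer_init_env = os.environ.get("UNSLOTH_FORCE_FLOAT32")
assert after_trainer_init_env == "1"
print("env-sync check: pass")
```
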
