Releases: unslothai/unsloth
New Important Updates!
Hey guys, it's only been 2 days since our last release, but we’ve got a lot more important updates:
- Inference is now 20–30% faster. Previously, tool-calling and repeat penalty could slow inference below normal speeds. Inference tokens/s should now be on par with llama-server/llama.cpp.
- Auto-detects older or pre-existing models downloaded from LM Studio, Hugging Face, and similar sources.
- Inference tokens/s is now calculated correctly. Previously, tokens/s included startup time, which made the displayed speed look slower than it actually was. It now reflects true inference speed.
- CPU usage no longer spikes. Previously, the inline querier identity changed on every render, causing `useLiveQuery` to resubscribe continuously.
- Unsloth Studio now has a shutdown (x) button and shuts down properly. Previously, closing it after opening from the desktop icon would not close it properly. Now, launching from the shortcut also opens the terminal, and closing that terminal fully exits Unsloth Studio. If you still have it open from a previous session, restart your computer or run `lsof -i :8888`, then `kill -9 <PID>`.
- Even better tool calling and web search with reduced errors.
- Updated documentation with lots of new info on deleting models, uninstalling, and more.
- Cleaner, smarter install and setup logging across Windows and Linux. Output is now easier to read with consistent formatting, quieter by default for a smoother experience, and supports richer `--verbose` diagnostics when you want full technical detail.
- You can now view your training history
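The tokens/s fix above amounts to excluding startup (time-to-first-token) from the throughput denominator. A minimal sketch of that calculation (a hypothetical helper, not Studio's actual code):

```python
def tokens_per_second(n_tokens, t_first_token, t_end):
    """Throughput over the decode phase only: startup time before the
    first token (model load, prompt processing) is excluded."""
    decode_time = t_end - t_first_token
    return n_tokens / decode_time if decode_time > 0 else 0.0

# 100 tokens, first token at t=2.0s, finished at t=7.0s:
# 100 / (7.0 - 2.0) = 20 tok/s, whereas startup-inclusive math
# would have reported 100 / 7.0 ≈ 14.3 tok/s.
```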
What's Changed
- Bump installer min version to 2026.3.12 by @danielhanchen in #4600
- Fix Colab Studio launch and setup.ps1 box alignment by @danielhanchen in #4601
- Fix Colab huggingface-hub conflict, ensurepip fallback, bump to 2026.3.14 by @danielhanchen in #4603
- Update README.md by @rolandtannous in #4604
- fix: skip flex_attention for models with non-zero attention_dropout by @Abhinavexists in #4605
- Fix Colab setup skipping llama.cpp installation by @rolandtannous in #4618
- fix: show recommended models in search results by @Shine1i in #4615
- studio: align Dataset/Parameters/Training cards, fix expandable height, animate LoRA settings by @Imagineer99 in #4614
- fix: Windows installer fails on _yaml.pyd Access Denied (os error 5) by @Etherll in #4617
- studio: humanize ETA display for long training runs by @RadouaneElhajali in #4608
- fix: add python-json-logger to data-designer-deps by @Shine1i in #4627
- [Studio] Colab fix - Allow install_python_stack to run on Colab by @rolandtannous in #4633
- Fix repetition_penalty default causing 24% TPS drop in GGUF inference by @danielhanchen in #4634
- fix: install.sh Mac Intel compatibility + Studio no-torch support by @danielhanchen in #4624
- tests: add no-torch / Intel Mac test suite by @danielhanchen in #4646
- fix: use unsloth[huggingfacenotorch] instead of --no-deps in no-torch mode by @danielhanchen in #4647
- Fix Gemma3N audio training stride assertion with non-reentrant checkpointing by @danielhanchen in #4629
- Fix missing num_items_in_batch in unsloth_prediction_step by @danielhanchen in #4616
- Make Studio shortcuts launch in a visible terminal by @danielhanchen in #4638
- studio: setup log styling by @Imagineer99 in #4494
- Fix ~1.2s TTFT penalty when tools are enabled in Studio by @danielhanchen in #4639
- Fix GGUF GPU fit check to account for KV cache VRAM by @danielhanchen in #4623
- feat: update app icons to rounded logo by @Shine1i in #4640
- Streaming tool detection: guard late tool_calls, filter incomplete fragments by @danielhanchen in #4648
- fix: install no-torch runtime deps via requirements file by @danielhanchen in #4649
- Fix orphan server cleanup killing user's own llama-server by @danielhanchen in #4622
- fix: add auth + UX improvements to shutdown button by @Shine1i in #4642
- Fix inference failing for transformers 5.x models (trust_remote_code) by @danielhanchen in #4652
- fix: no-torch install deps without pulling torch transitively by @danielhanchen in #4650
- Detect always-on reasoning models and show Think button as locked-on by @danielhanchen in #4654
- fix: replace navbar shutdown text button with icon-only button by @Shine1i in #4655
- Fall back to parsing model name when HF API has no param count by @danielhanchen in #4656
- fix: disable OCR in pymupdf4llm PDF extraction by @Shine1i in #4659
- Fix HF cache default and show LM Studio models in chat/inference by @rolandtannous in #4653
- Bump minimum unsloth version to 2026.3.16 in install scripts by @danielhanchen in #4663
New Contributors
- @Abhinavexists made their first contribution in #4605
- @RadouaneElhajali made their first contribution in #4608
Full Changelog: v0.1.2-beta...v0.1.25-beta
First Release post Unsloth Studio!
Hey guys, this is our first release since we launched Unsloth Studio last week. From now on you can directly access all our updates through our changelog here: https://unsloth.ai/docs/new/changelog
You can now update Unsloth Studio! Just run `unsloth studio update`. Please update to use all the newest fixes and features.
- Tool calling improved. Better llama.cpp parsing, no raw tool markup in chat, faster inference, a new Tool Outputs panel, and timers.
- Windows CPU or GPU now works seamlessly. Please reinstall!
- App shortcuts. Once installed, you can launch on Windows, MacOS and Linux via a shortcut icon in the Start menu / launcher and on the Desktop.
- Pre-compiled `llama.cpp` binaries and `mamba_ssm` for finetuning - 6x faster installs! Binaries are also <300MB in size.
- 50% reduced installation sizes (7GB or more in savings), 2x faster installs and faster resolving. 50% smaller PyPI sizes.
- Colab with free T4 GPUs with Unsloth Studio now fixed! Try it here. Due to pre-compiled binaries, it's also 20x faster!
- You can now properly use old GGUFs from Hugging Face or LM Studio
- MacOS and CPU now have Data Recipes enabled with multi-file uploading.
- Preliminary AMD support for Linux-only machines - auto-detected.
- Settings sidebar redesign. Settings are now grouped into Model, Sampling, Tools, and Preferences
- Context length now adjustable. Keep in mind this is not needed, as llama.cpp smartly uses the exact context you need via `--fit on`.
- Persistent system prompts and presets. Custom system prompts and chat presets now persist across reloads and page changes.
- Multi-file upload. Data recipes now support multiple drag-and-drop uploads for PDF, DOCX, TXT, and MD, with backend extraction, saved uploads, and improved previews.
- Better chat observability. Studio now shows `llama-server` timings and usage, a context-window usage bar, and richer source hover cards.
- Better UX overall - clickable links, better LaTeX parsing, tool / code / web tooltips for default cards and much more!
- LiteLLM - Unsloth Studio and Unsloth were NOT affected by the recent LiteLLM compromise. Nemo Data Designer used LiteLLM only up to `1.80`, not the affected `1.82.7` or `1.82.8`, and has since removed it entirely.
- We now have a new one-line install command, just run:

```shell
curl -fsSL https://unsloth.ai/install.sh | sh
```
Fixes:
- Windows/setup improvements. Fixed silent Windows exits, Anaconda/conda-forge startup crashes, broken non-NVIDIA Windows installs, and missing early CUDA/stale-venv setup checks.
- System prompts fixed. They work again for non-GGUF text and vision inference.
- GGUF export expanded. Full fine-tunes, not just LoRA/PEFT, can now export to GGUF. Base model resolution is more reliable, and unsupported export options are disabled in the UI.
- Chat scroll/layout fixes. Fixed scroll-position issues during generation, thinking-panel layout shift, and viewport jumps when collapsing reasoning panels.
- Smarter port conflict detection. Studio now detects loopback conflicts, can identify the blocking process when possible, and gives clearer fallback-port messages.
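A loopback conflict check like the one above can be approximated with a plain socket probe; this is a generic illustration, not Studio's implementation (`port_in_use` and `pick_port` are hypothetical helpers):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

def pick_port(preferred=8888, attempts=20):
    """Fall back to the next free port if the preferred one is taken."""
    for port in range(preferred, preferred + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError("no free port found")
```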
Example of automatic parameter settings for context length etc:
super.final.mp4
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #4542
- fix: store embedding_learning_rate on self in UnslothTrainingArguments by @GoldenGrapeGentleman in #4531
- studio: persist system prompt and preset settings across navigation by @Imagineer99 in #4538
- studio: stop scroll hijack during generation and fix thinking panel layout shift by @Imagineer99 in #4543
- Fix Studio port conflict detection for loopback addresses by @danielhanchen in #4532
- fix(studio): show Windows-specific reset-password command by @Shine1i in #4529
- fix(studio): restore scroll lock on reasoning panel collapse by @danielhanchen in #4545
- fix: always show chat tool icons by @Shine1i in #4525
- fix: system prompt ignored in unsloth inference by @Shine1i in #4528
- fix: handle prompt/completion datasets in slow-path BOS detection by @danielhanchen in #4548
- fix: give @0xKushwaha git history credit for completion_only_loss fix by @danielhanchen in #4552
- ⚠️ Remove quarantined `litellm` for precaution -- Unsloth Studio NOT affected by @danielhanchen in #4553
- fix: pin unsloth>=2026.3.11 in install scripts by @danielhanchen in #4556
- Regroup chat settings sidebar into focused sections by @Shine1i in #4551
- Add GRPO resume vLLM cleanup guard by @MagellaX in #4411
- fix: prevent UnicodeEncodeError on Windows CP1252 consoles in studio setup by @Krishnachaitanyakc in #4563
- studio: windows desktop shortcut launcher by @Imagineer99 in #4558
- Remove duplicate frontend assets from wheel (~31 MB savings) by @danielhanchen in #4567
- feat(studio): training history persistence and past runs viewer by @Shine1i in #4501
- fix: remove auto wandb.finish() after train() to allow post-training evaluate() by @Krishnachaitanyakc in #4564
- feat: Implement Q-GaLore optimizer and custom embedding learning rate… by @OnePunchMonk in #4511
- Bump Data Designer to 0.5.4 (removes litellm dependency) by @danielhanchen in #4569
- feat(chat): cleaner tool UI, inline LaTeX, clickable links by @Shine1i in #4561
- [Studio] Try installing causal-conv1d from prebuilt wheels if available by @Datta0 in #4547
- Feature/add dependabot and codeql security checks by @pkloehn1 in #4479
- build(deps): bump the actions group with 2 updates by @dependabot[bot] in #4570
- build(deps): bump oxc-parser from 0.116.0 to 0.121.0 in /studio/backend/core/data_recipe/oxc-validator in the npm-oxc-validator group by @dependabot[bot] in #4571
- Remove advanced CodeQL workflow (conflicts with default setup) by @danielhanchen in #4584
- Add macOS and Linux desktop shortcuts to install.sh by @danielhanchen in #4568
- perf(studio): upgrade to Vite 8 + auto-install bun for faster frontend builds by @Etherll in #4522
- feat(tokenizer): add get_tokenizer_info() diagnostic helper by @cz-03 in #4436
- Add ROCm (AMD GPU) support to studio setup by @danielhanchen in #4585
- Consolidate dual venvs and separate install from update by @rolandtannous in #4530
- studio: stabilize reasoning panel scroll behavior and prevent composer overlap by @Imagineer99 in #4587
- Use prebuilt llama.cpp for unsloth studio setup by @mmathew23 in #4562
- fix(studio): add -ngl flag for GPU offloading in llama-server by @danielhanchen in #4588
- fix(studio): add pip nvidia CUDA libs to LD_LIBRARY_PATH for llama-server by @danielhanchen in #4590
- fix(studio): validate bun install and retry from official source on failure by @danielhanchen in #4589
- fix(studio): clear bun cache on failure and retry before falling back to npm by @danielhanchen in #4594
- Pin torch>=2.4,<2.11.0 in Studio installers by @danielhanchen in #4595
- fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest by @danielhanchen in #4593
- fix(studio): add bun cache validation to Windows setup.ps1 by @danielhanchen in #4596
- feat: multi-source model discovery (HF default, legacy cache, LM Studio) by @rolandtannous in #4591
- Add unsloth to User PATH on Windows after install by @danielhanchen in #4597
- Add PID file tracking and `unsloth studio stop` command by @danielhanchen in #4598
- feat(studio): editable context length with Apply/Reset for GGUF settings by @danielhanchen in #4592
New Contributors
- @MagellaX made their first contribution in #4411
- @Krishnachaitanyakc made their first contribution in #4563
- @OnePunchMonk made their first contribution in #4511
- @pkloehn1 made their fir...
llama.cpp prebuilt b8475
Install-ready Unsloth Studio llama.cpp bundles for b8475.
Introducing Unsloth Studio (Beta)!
Hey guys, we're super excited to launch Unsloth Studio (Beta), a new open-source web UI to train and run LLMs.
Blog + everything you need to know: https://unsloth.ai/docs/new/studio
- Run models locally on Mac, Windows, Linux
- Compare and battle models side-by-side
- Train 500+ models 2x faster with 70% less VRAM
- Supports GGUF, vision, audio, embedding models
- Self-healing Tool calling / web search + code execution
- Auto-create datasets from PDF, CSV, DOCX
- Export models to GGUF, safetensor and more formats
MacOS, Linux, WSL:
For MacOS, ensure you have cmake installed. If not, run brew install cmake.
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
```

Then to launch every time:

```shell
source unsloth_studio/bin/activate
unsloth studio -H 0.0.0.0 -p 8888
```

Windows:
Run in Windows Powershell:
```shell
irm https://unsloth.ai/install.ps1 | iex
```

Then to launch every time:

```shell
.\unsloth_studio\Scripts\activate
unsloth studio -H 0.0.0.0 -p 8888
```

Docker
Use our `unsloth/unsloth` Docker image. Run:

```shell
docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 8000:8000 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth
```

unsloth.studio.video.mp4
What's Changed
- Update CODEOWNERS for studio and cli by @danielhanchen in #4266
- [Feature] Support Sequence Classification by @danielhanchen in #4264
- [Feature] VLMs support for GRPO by @danielhanchen in #4265
- [Fix] Respect llm_int8_skip_modules for VLM by @danielhanchen in #4249
- ROCM support by @danielhanchen in #4271
- Remove Blackwell flex attention disable workaround from studio by @danielhanchen in #4273
- ROCM support by @danielhanchen in #4272
- fix: prevent ai-assist model config RCE via untrusted Hugging Face repos by @danielhanchen in #4274
- fix(seed): disable remote code execution in seed inspect dataset loads by @danielhanchen in #4275
- Update CODEOWNERS by @danielhanchen in #4279
- fix: install data-designer plugin non-editable for Colab compatibility by @LeoBorcherding in #4268
- Arch/mixtral by @danielhanchen in #4283
- Improve documentation on how to export model from Colab by @danielhanchen in #4284
- feat: Add Mixtral model support by @danielhanchen in #4285
- Initial changes: Refactor Attention by @danielhanchen in #4286
- patch vlm trainer to resize images by @danielhanchen in #4287
- [WIP] add support for mixtral by @danielhanchen in #4288
- studio: speed up setup -- uv for installs (8x), Ninja for llama.cpp (1.7x) by @danielhanchen in #4289
- fix: remove old comments by @Shine1i in #4292
- PR: Windows Setup Improvements by @rolandtannous in #4299
- miscellaneous studio by @Shine1i in #4293
- Fix: Compare Mode Deadlock, Cancel Event Poisoning & IPC Optimization by @rolandtannous in #4303
- studio: fix GGUF inference -- reasoning tokens, max_tokens, server flags, GPU allocation by @danielhanchen in #4290
- chat only with gguf for mac devices by @Manan17 in #4300
- studio: add max steps and epochs toggle switch by @Imagineer99 in #4296
- Fix/colab plugin editable install by @LeoBorcherding in #4281
- Graceful shutdown on Windows (signal handlers for Ctrl+C) by @rolandtannous in #4306
- studio: simplify auth UX to password-only login by @Imagineer99 in #4305
- studio: preserve save_steps when toggling to epochs mode by @Imagineer99 in #4308
- Fix studio frontend build producing empty Tailwind CSS by @danielhanchen in #4311
- Fix setup.sh crash on Mac with empty gitignore array by @danielhanchen in #4313
- [Feature] studio: user can upload eval dataset by @Manan17 in #4307
- fix: Ctrl+C not terminating backend on Linux by @rolandtannous in #4316
- Add download progress bar for non-GGUF models in Chat by @danielhanchen in #4314
- Apply use_reentrant removal to all TRL trainer configs by @danielhanchen in #4321
- Fix VLM GRPO matmul shape mismatch in _get_per_token_logps_and_entropies by @danielhanchen in #4301
- Improve AI Assist: Update default model, model output parsing, logging, and dataset mapping UX by @rolandtannous in #4323
- studio: per-model inference defaults, GGUF slider fix, reasoning toggle by @danielhanchen in #4325
- fix: Resolve CUDA toolkit mismatch on multi-CUDA Windows systems by @rolandtannous in #4324
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #4332
- Fix/colab comment edits by @LeoBorcherding in #4317
- fix: add Qwen3.5 version gate in loader dispatch by @danielhanchen in #4335
- Fix xformers Blackwell guard: broader coverage and root cause docs by @danielhanchen in #4338
- studio: improve Colab notebook, redesign ready popup, and clean up install output by @LeoBorcherding in #4339
- Add check to disable xformers on newer GPUs by @pluesclues in #4342
- studio: training progress, CUDA lib path, dataset_num_proc fix by @danielhanchen in #4336
- studio: fix stale GGUF metadata, update helper model, auth improvements by @danielhanchen in #4346
- studio: show "Off" for repetition penalty = 1 by @danielhanchen in #4349
- studio: update Creative/Precise presets, show "Off" for disabled samplers by @danielhanchen in #4350
- studio: fix slow cancellation of GGUF generation by @danielhanchen in #4352
- Fix: Remove unused `warmupToastShown` variable (TS6133) by @rolandtannous in #4353
- Studio: SVG preview, fix streaming and model selector bugs by @danielhanchen in #4354
- fix: comment out debug print statements by @rolandtannous in #4357
- fix(llm_assist): disable thinking mode for helper model JSON output by @rolandtannous in #4358
- studio: improve onboarding UX, tooltips, and training defaults by @danielhanchen in #4355
New Contributors
- @LeoBorcherding made their first contribution in #4268
- @Shine1i made their first contribution in #4292
- @Manan17 made their first contribution in #4300
- @Imagineer99 made their first contribution in #4296
Full Changelog: https://github.com/unslothai/unsloth/commits/March-2026
llama.cpp prebuilt b8457
Install-ready Unsloth Studio llama.cpp bundles for b8457.
12x Faster MoE Training + Embedding support!
Our first release of 2026! This year we’ve got a lot of exciting things coming and to kick things off, we’re introducing faster MoE training, embedding model support, and ultra long context for Reinforcement Learning. We’ll also be launching our brand new UI very soon.
We’d like to thank all of you for 50K stars on GitHub! ⭐
We’ve also added support for many new models that you can now run and fine-tune locally, including DeepSeek-OCR 2, GLM-4.7-Flash, Kimi-2.5, and more.
🚀 Faster MoE training
You can now train MoE models 12× faster with 35% less VRAM and 6x longer context via our new Triton and math kernels (no accuracy loss). gpt-oss-20b works on 12.8GB VRAM. Qwen3-30B-A3B (16-bit LoRA) uses 63GB.
Unsloth supports fast training for gpt-oss, Qwen3 (30B, 235B, VL, Coder), DeepSeek R1/V3 arch and GLM (4.7, Flash) models.
🔎 Embedding models now train 2× faster
We collaborated with Hugging Face to enable 1.8-3.3x faster embedding, BERT and classifier model training with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.
💡 Ultra Long Context RL is here
We’re introducing new batching algorithms to enable ~7x longer context (can be more than 12x) RL training with no accuracy or speed degradation vs. other optimized setups that use FA3, kernels & chunked losses.
Unsloth now trains gpt-oss QLoRA with 380K context on a single 192GB NVIDIA B200 GPU
🔮 New models
- 🐳 DeepSeek-OCR 2 - Run and fine-tune the new OCR model.
- 🥝 Kimi 2.5 - Run the SOTA model locally with Unsloth GGUFs.
- ⚡ GLM-4.7-Flash - Run and fine-tune the best-in-class 30B LLM.
🎉 Extra Updates
- As part of our MoE release, we also made Gemma-3 use Flex-Attention by default, and this works in float16 settings as well (there were infinities, which we solved a while back). Gemma-3 now uses O(N) memory instead of O(N^2), and trains >3x faster (scaling even better with context length). Previous Unsloth versions would OOM.
- Vision fine-tuning now accepts mixed datasets containing both image examples and text-only examples!
- `trl==0.27.1` and `transformers==5.1.0` are supported well - previous coverage was 30% of all our 120 notebooks, but now we have >80% coverage - we plan to make it 100% over the next few days.
- And many, many other bug fixes and updates!
📖 New Guides
- </> How To Use Claude Code + Codex with local LLMs: Guide
- 👾 Train & deploy to LM Studio for local inference: Guide
- 🎨 Run Diffusion image models with Unsloth GGUFs: Guide
Tip
Update Unsloth via `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`
If you want PyTorch 2.9: `pip install --upgrade unsloth unsloth_zoo`
February is shaping up to be an amazing month for LLM releases, and we hope you’re just as excited as we are. 😊
What's Changed
- [FIX] [Transformers] VLM input embeds fix for gradients by @Datta0 in #3715
- [fbgemm] Silence tma fbgemm by @Datta0 in #3735
- [hf_hub] Token login by @Datta0 in #3739
- Do not overwrite slots by @Datta0 in #3752
- Fix VLM + DDP checkpointing by @djsaunde in #3751
- Enable 4-bit quantization on AMD Radeon GPUs by @sstamenk in #3748
- Nightly by @danielhanchen in #3753
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3760
- Nightly by @danielhanchen in #3767
- Add missing import of inspect by @sstamenk in #3778
- Clarify NotImplementedError for fast_inference with full_finetuning by @Fizza-Mukhtar in #3768
- Update FUNDING.yml by @danielhanchen in #3792
- fix(trainer): import psutil to prevent NameError in _prepare_dataset by @alkinun in #3780
- fastrope fix for zero strided tensors by @f14-bertolotti in #3782
- Fix crash when trl.experimental.openenv is unavailable by @Fizza-Mukhtar in #3787
- Fix Boolean value of Tensor ambiguity error in mistral.py by @yurekami in #3790
- fix: add support for init_lora_weights="corda" in get_peft_model by @majiayu000 in #3794
- Fix correctness bugs in rl.py, rl_replacements.py, and vision.py by @danielhanchen in #3811
- Fix correctness bugs across multiple model files by @danielhanchen in #3813
- Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass by @Fizza-Mukhtar in #3806
- FIX: weight tying for LoRA embeddings and lm_head by @oKatanaaa in #3711
- Fix Gemma3 QAT training instability with int8-int4 scheme by @danielhanchen in #3818
- Add helpful error messages for fast_generate when fast_inference=False by @danielhanchen in #3820
- Bug fixes by @danielhanchen in #3821
- Make llama.cpp CURL dependency optional when building from source by @Fizza-Mukhtar in #3822
- remove redundant code of has_block by @ykaitao in #3832
- rl.py fixes: buffer reset, safer attribute access, typo fix by @danielhanchen in #3834
- Respect user quantization_config by @danielhanchen in #3835
- Fix vLLM PDL bug on Blackwell GPUs (B200/B100) by @danielhanchen in #3841
- Sync chat_template from tokenizer to vLLM by @danielhanchen in #3842
- remove unused variable BlockDiagonalCausalMask by @ykaitao in #3836
- Replace GitHub API check with vLLM version check for PDL fix by @danielhanchen in #3849
- GRPO: restore model mode after generate (stacked on #3754) by @danielhanchen in #3851
- Fix model training state restoration in GRPO trainer by @numb3r33 in #3754
- Unify Version usage and fix TRL version handling by @danielhanchen in #3843
- [ModelScope] Disable stats when modelscope is being used by @Datta0 in #3857
- Fix FBGEMM/CUTLASS errors on SM100 (Blackwell) GPUs by @danielhanchen in #3863
- Feature/raw text dataprep by @Vangmay in #3612
- Fix Kaggle telemetry misclassification when COLAB_ keys exist by @hnxnq7 in #3869
- reduce code duplication by _offload_frozen_module_for_training by @ykaitao in #3865
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3881
- wrong number of dimensions by @f14-bertolotti in #3880
- Disable gradient checkpointing when explicitly off for vision by @ducviet00 in #3879
- [trl] use non lora model as base for RL by @Datta0 in #3895
- Chunk Across Batch and Context length for logprob calculations for grpo by @pluesclues in #3628
- add weight-only int8 QAT scheme and update tests for torchao 0.15.0 by @electroglyph in #3859
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3905
- Fix vllm ipykernel patch by @pluesclues in #3907
- Handle Transformers 5 vLLM import errors by @danielhanchen in #3908
- add FastSentenceTransformer for easily finetuning SentenceTransformer models by @electroglyph in #3719
- Guard torch.compile on ROCm when triton_key is missing by @hnxnq7 in #3923
- Grpo compile settings update by @pluesclues in #3927
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3937
- chore: Update outdated GitHub Actions version by @pgoslatara in #3936
- [trl] vllm trl topk fixup by @Datta0 in #3935
- [fix] qwen3-guard tokenizer by @Datta0 in #3959
- fix for intel devices torch compile configs by @l...
December Release + 3x Faster Training
Thanks for all the love and support this year! We're wishing you all a lovely Christmas. Please update Unsloth & our Docker to use the latest updates! 🦥

- Introducing 3x faster training & 30% less VRAM. New Triton kernels, padding-free & packing. Blog
- 500K Context training and reinforcement learning is now possible on a single 80GB GPU. Blog • Notebook
- Fine-tune then Deploy LLMs on your Phone with PyTorch and Unsloth. Tweet • Read Guide
- 🤗 Transformers v5 is now supported! It's not enabled by default due to possible instability issues.
- Preliminary multi-GPU support: DDP Guide (not representative of the official release early next year)
- More: Sudoku RL nb • Paddle-OCR nb • New NVIDIA blog
- Lots of bug fixes! See further below.
🔮 New Models + Guides
- ✨FunctionGemma: Google's new 270M tool-calling LLM. Guide • Notebook
- Nemotron 3: NVIDIA's new 30B reasoning model. Guide • GGUF
- Mistral: new coding & instruct VLMs. Ministral 3 • Devstral 2
- GLM-4.6V: new vision models. Guide • 4.6V • 4.6V-Flash
- More: Qwen3-Next • Mistral Large 3 • FLUX.2-dev
Tip
Update Unsloth via `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`
If you want PyTorch 2.9: `pip install --upgrade unsloth unsloth_zoo`
Bug Fixes and Enhancements
- Supports `rollout_func`, allowing multi-turn RL to work
- Supports `vllm>=0.12.0` and efficient GRPO for it
- Supports `transformers>=5.0.0`, first shown via our Ministral notebooks
- Fix HuggingFace token logins not working for private repos
- Fixes TorchAO and QAT not working during saving
- Fixed DeepSeek OCR finetuning not loading finetuned models
- Improved vision utilities for vision VLM finetuning
What's Changed
- Fix llama tokenizer padding_side when using model.generate in inference mode by @dmsuehir in #3644
- Fix indefinite article usage in comments and docstrings by @mk0walsk in #3648
- fix rope_theta -> rope_parameters['rope_theta'] by @mmathew23 in #3651
- Fix broken link for advanced pip installation in README by @gitpullpull in #3652
- Fix: prevent load_in_fp8 kwarg from reaching Qwen3MoeForCausalLM constructor (Fix #3649) by @bhuvanprakash in #3654
- make unsloth_tiled_mlp a from_pretrained arg by @mmathew23 in #3655
- FIX set default [128, 128] instead of none by @ved1beta in #3658
- Fix: Pass gradient_checkpointing parameter to model.for_training() by @sbhavani in #3659
- [FIX] Vllm guided decoding params by @Datta0 in #3662
- Vllm guided decoding by @Datta0 in #3663
- Nightly by @danielhanchen in #3664
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3666
- Update transformers version constraint in pyproject.toml by @noah1510 in #3689
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3694
- Remove reload_weights rpc call from grpo trainer by @Datta0 in #3673
- [Fix] [TRL] load_lora for multi line llm.chat/generate by @Datta0 in #3696
- Nightly by @danielhanchen in #3698
- SFT sample packing by @djsaunde in #3566
- Auto-enable padding-free SFT by @djsaunde in #3672
- [FIX] fbgemm version check by @Datta0 in #3704
- Nightly by @danielhanchen in #3706
- update TRL filter by @djsaunde in #3707
- [intel] skip xpu fbgemm fp8 by @leizhenyuan in #3625
- Mistral packing, train on completions only, simplifications by @djsaunde in #3709
- Update torchao save by @metascroy in #3679
- Nightly by @danielhanchen in #3720
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3731
- Bug fixes by @danielhanchen in #3734
- Update FUNDING.yml by @danielhanchen in #3736
- Nightly by @danielhanchen in #3737
- Fix Deepseek OCR Lora Model Load by @mmathew23 in #3738
Unsloth Zoo Changes
- updates for vLLM compatibility with lora by @danielhanchen in unslothai/unsloth-zoo#359
- Nightly by @danielhanchen in unslothai/unsloth-zoo#355
- Add logging to tiled mlp and fix target chunk size calculation by @mmathew23 in unslothai/unsloth-zoo#361
- Remove include_buffers from init_empty_weights by @pluesclues in unslothai/unsloth-zoo#363
- packed seq lengths token count correction by @djsaunde in unslothai/unsloth-zoo#348
- Configure ce target gb by @mmathew23 in unslothai/unsloth-zoo#365
- [FIX] vLLM LoRA extra vocab by @Datta0 in unslothai/unsloth-zoo#367
- Nightly by @danielhanchen in unslothai/unsloth-zoo#368
- [FIX] vLLM local lora tensor loading by @Datta0 in unslothai/unsloth-zoo#370
- vllm lora_dir rename and make embedding padding optional by @danielhanchen in unslothai/unsloth-zoo#373
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#375
- Update e to error by @ChetanKrishna07 in unslothai/unsloth-zoo#374
- Vision utils decode image improvement by @mmathew23 in unslothai/unsloth-zoo#372
- [FIX] [DDP] Fix compile for distributed training by @Datta0 in unslothai/unsloth-zoo#379
- Nightly by @danielhanchen in unslothai/unsloth-zoo#382
- update compiler for XLMRobertaModel by @electroglyph in unslothai/unsloth-zoo#383
- Fix Deepseek OCR Lora Model Load by @mmathew23 in unslothai/unsloth-zoo#386
- fix for non-generation models in transformers 5 by @electroglyph in unslothai/unsloth-zoo#388
New Contributors
- @dmsuehir made their first contribution in #3644
- @gitpullpull made their first contribution in #3652
- @bhuvanprakash made their first contribution in #3654
- @ved1beta made their first contribution in #3658
- @sbhavani made their first contribution in #3659
- @noah1510 made their first contribution in #3689
- @ChetanKrishna07 made their first contribution in unslothai/unsloth-zoo#374
- @electroglyph made their first contribution in unslothai/unsloth-zoo#383
Full Changelog: November-2025...December-2025
November Release + FP8 Training!
We’re getting close to our final release of 2025! Thanks so much for sticking with us this year. We’ve got lots of new features so please update Unsloth & our Docker to use the latest updates! 🦥

- Introducing FP8 Reinforcement Learning in Unsloth! Train on any FP8 supported GPU and get 1.4x faster with 60% less VRAM: Read our Blog/Guide • Notebooks: Qwen3-8B FP8 GRPO and Llama-3.2-1B FP8 GRPO
- You may notice Unsloth now uses much less VRAM than before, enabling even longer context. We’re also implementing faster training very soon and we’ll share all the details in an upcoming blog.
- DeepSeek-OCR fine-tuning is here! We fine-tuned DeepSeek-OCR, improving its language understanding by 89%. Read our Blog • Free notebook
- Qwen3-VL models supported including GGUFs to run locally: Blogpost + fixes • GGUFs
- We analyzed RL training-inference mismatch for FP16 vs. BF16 and concluded that Unsloth does not have this issue: Analysis and Results
- We’ve partnered with Docker to let you run LLMs locally with zero setup. Docker GGUFs are now powered by Unsloth Dynamic.
  Example: `docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16` Read guide
- Baidu ERNIE models are now supported. Notebooks coming soon.
- Unsloth now supports SGLang. Read our guide
- We wrote guides for LoRA Hot Swapping and vLLM Engine Arguments
- Run Kimi-K2-Thinking, the most powerful open model, locally. Kimi-K2 Guide
- Lots of bug fixes! See further below.
Tip
Update Unsloth via `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`
If you want PyTorch 2.9: `pip install --upgrade unsloth unsloth_zoo`
Bug Fixes and Enhancements
- Supports `trl>=0.25.0`, `vllm>=0.11.2`, and `transformers>=4.57.1`
- Fixed gpt-oss GRPO and RL excessive re-compilations on `torch>=2.9.0`
- Fixes Sleep mode and reduces memory usage by a further 5 to 15% for RL, GRPO
- Fix propagation of `trust_remote_code = True`
- Fix Unsloth offloaded gradient checkpointing not offloading on 1st step - reduces VRAM by >20%
- Add `logits.detach()` to GRPO to solve double backwards on some pathways
- Add `int64` kernels & fixed RoPE embeddings to allow ultra-long context training
- Fixed 📓 OpenEnv gpt-oss RL notebook
- DGX Spark docker image fixed
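The minimum-version requirements listed above can be verified before training starts. Below is a hypothetical stdlib-only sketch (not Unsloth's actual code) of such a check; the `MINIMUMS` table and helper names are illustrative:

```python
# Hypothetical sketch: verify installed packages meet the minimum versions
# listed above, using only the standard library.
from importlib.metadata import version, PackageNotFoundError

MINIMUMS = {"trl": "0.25.0", "vllm": "0.11.2", "transformers": "4.57.1"}

def parse(v):
    # Keep only the leading numeric dotted part, e.g. "4.57.1.dev0" -> (4, 57, 1)
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)

def check_requirements(minimums = MINIMUMS):
    # Return a list of human-readable problems; empty list means all good.
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if parse(installed) < parse(minimum):
            problems.append(f"{name} {installed} is too old (need >= {minimum})")
    return problems
```

Note the tuple comparison naturally handles pre-release suffixes by ignoring them, which is a simplification compared to a full PEP 440 parser.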
What's Changed
- Grpo gradient accumulation edits by @pluesclues in #3390
- Nightly by @danielhanchen in #3532
- Handle TRL version compatibility in rl_replacements.py by @pluesclues in #3540
- Bug fixes by @danielhanchen in #3546
- Sleep trl patch by @Datta0 in #3517
- Detach logits before returning from function by @pluesclues in #3554
- Fix typos in comment by @mk0walsk in #3557
- Formatting & bug fixes by @danielhanchen in #3563
- DeepseekOCR: add trust_remote_code kwarg by @mmathew23 in #3564
- pre-commit CI config by @djsaunde in #3565
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3576
- Resize rope embeddings for long sequence training by @mmathew23 in #3586
- Patch in tiled mlp by @mmathew23 in #3584
- Support for out-of-source quantizers by @Giuseppe5 in #3534
- Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse by @jarrycyx in #3578
- Extend TorchAOConfig to support mobile usecases by @metascroy in #3587
- fix qwen3 vl gradient accumulation by @mmathew23 in #3598
- Do not force set beta to 0 for DAPO by @Datta0 in #3604
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3606
- Fix broken links and typo in README by @mk0walsk in #3611
- remove pre-commit workflow (covered by pre-commit app) by @djsaunde in #3618
- Add an int64 path for mlp kernels by @mmathew23 in #3614
- Remove grpo requirement bs=num_generations by @mmathew23 in #3609
- Enable FP8 + RL training for bf16 models by @andrewor14 in #3440
- Fix/save torchao model loading logic by @rolandtannous in #3621
- Fix LlamaModel_fast_forward signature to match HF Transformers (Support inputs_embeds) by @MercuryYen in #3623
- Add 128x128 PerBlock FP8 + RL by @andrewor14 in #3629
- Add trust_remote_code parameter to tokenizer by @Etherll in #3631
- [intel] change windows to remove windows-triton for intel xpu by @leizhenyuan in #3168
Unsloth Zoo Changes
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#327
- Fix GRPO by @danielhanchen in unslothai/unsloth-zoo#328
- fix gpt oss memory calculation for intel device by @leizhenyuan in unslothai/unsloth-zoo#330
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#331
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#332
- fixed unbound local error tokenizer-model from cache by @rolandtannous in unslothai/unsloth-zoo#333
- Now it works on a uv venv by @kittawere in unslothai/unsloth-zoo#336
- Gemma3n fix by @mmathew23 in unslothai/unsloth-zoo#338
- [Intel] remove triton windows for intel by @leizhenyuan in unslothai/unsloth-zoo#243
- FP8 training enhancements by @Datta0 in unslothai/unsloth-zoo#337
- GRPO gradient accumulation steps update and DAPO support by @pluesclues in unslothai/unsloth-zoo#308
- Fix/video collate by @mmathew23 in unslothai/unsloth-zoo#342
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#344
- FP8, Standby and vLLM updates by @Datta0 in unslothai/unsloth-zoo#340
- Put importance sampling into no grad by @pluesclues in unslothai/unsloth-zoo#343
- Detach hidden states to avoid gradient carry by @pluesclues in unslothai/unsloth-zoo#345
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#347
- MoE: Cast routing_weights dtype correctly by @mmathew23 in unslothai/unsloth-zoo#349
- return local model in determine_base_model_source with any quantization by @noah1510 in unslothai/unsloth-zoo#334
- Enable FP8 + RL training by @andrewor14 in unslothai/unsloth-zoo#351
- Tiled MLP Implementation by @mmathew23 in unslothai/unsloth-zoo#350
- Fix gradient checkpointing layer caller kwargs by @mmathew23 in unslothai/unsloth-zoo#353
- vLLM weight scale FP8 and standby override by @Datta0 in unslothai/unsloth-zoo#354
- Fix docstring removing regex to support empty parentheses by @noisycat3 in unslothai/unsloth-zoo#360
Unsloth Notebooks Changes
- Feat/qwen3 vl by @Erland366 in unslothai/notebooks#119
- Feat/double footer fix by @Erland366 in unslothai/notebooks#121
- Add GGUF section for Qwen3-VL by @Etherll in unslothai/notebooks#123
- Fix TypeError in unsloth_push_to_hub_gguf() when pushing GGUF model to Hugging Face by @samanta-sc in unslothai/notebooks#125
- fix `'TorchAOConfig' object has no attribute 'base_config'` ...
October Release + Unsloth Docker!
Hey everyone, please update Unsloth to use the latest updates! 🦥
- Unsloth now has its own 🐋 Docker image! Start training with no setup: Read our Guide • Docker image
- We collabed with NVIDIA for Blackwell and DGX Spark support. Read our Blackwell guide and DGX guide.

New model updates
- Qwen3-VL models are all now supported: Blogpost • SFT 8B notebook • GRPO 8B notebook
- IBM Granite-4.0 models are now supported. Granite-4.0 guide • Notebook
- OpenAI showcased our new gpt-oss RL notebook for autonomously solving the 2048 game. Blogpost • Notebook
- Read about our GLM-4.6 chat template fixes and how to run the model here
New features
- Introducing Quantization-Aware Training: We collabed with PyTorch for QAT, recovering as much as 70% accuracy. Read blog

- Unsloth supports OpenEnv to allow for open RL environments. Blog coming soon • Notebook
- New customer support agent notebook to enable real-time analysis & solving of customer interactions. You'll also learn how to train models using data from Google Sheets.
- Python 3.13, PyTorch 2.9, and the latest Hugging Face TRL and Transformers releases are now supported.
- Save to TorchAO supported as well:

```python
from torchao.quantization import Int4WeightOnlyConfig
model.save_pretrained_torchao("model", tokenizer, torchao_config = Int4WeightOnlyConfig())
```

Tip
Update Unsloth via `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`
If you want PyTorch 2.9: `pip install --upgrade unsloth unsloth_zoo`
RL Improvements
- Fixed Standby consuming more VRAM than usual. Auto selects the maximum 80% to 95% of GPU utilization if `import os; os.environ["UNSLOTH_VLLM_STANDBY"] = "1"` is used.
- Fixed GRPO training hangs with better environment timers - works on DGX Spark and all other GPUs.
- Fixes GRPO `RuntimeError: shape '[1, 887, 1, 128]' is invalid for input of size 3633152` for all models
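The Standby flag above is an environment variable, so in script form it would look roughly like the sketch below. The assumption (hedged, not confirmed by this changelog) is that such flags are read at import time, so they should be set before `unsloth` is imported:

```python
# Minimal sketch: enable vLLM standby mode before importing Unsloth.
# Assumption: Unsloth reads this flag at import time, so set it first.
import os

os.environ["UNSLOTH_VLLM_STANDBY"] = "1"

# from unsloth import FastLanguageModel  # import only after setting the flag
```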
RL Environment functions
- New `execute_with_time_limit` function to force functions to execute within a time limit. E.g. with a 2 second time limit, use:

```python
from unsloth import execute_with_time_limit

@execute_with_time_limit(2)
def execute_strategy(strategy, game):
    return _execute_strategy(strategy, game)

try:
    execute_strategy(strategy, game)
except TimeoutError as e:
    print(f"Timed out with error = {str(e)}")
```

- To check if only Python standard modules are used in a function, use `check_python_modules`.
- Use `create_locked_down_function` to create a function without leakage of global variables.
- Use `Benchmarker` i.e. `from unsloth import Benchmarker` to benchmark functions accurately. It approximately wipes the L1 to L3 caches to reduce chances of benchmark cheating.
- Use `launch_openenv` to launch a continuously reloaded OpenEnv environment process (to stop it from shutting down) i.e. `from unsloth import launch_openenv`. It will auto find a port that is not used.
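The "auto find a port that is not used" behaviour can be illustrated with a standard trick: bind to port 0 and let the OS assign a free ephemeral port. This is a generic sketch of the technique, not Unsloth's actual `launch_openenv` implementation:

```python
# Illustrative sketch (not Unsloth's code): find an unused TCP port by
# binding to port 0, which asks the OS to pick a free ephemeral port.
import socket

def find_free_port(host = "127.0.0.1"):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))          # port 0 => OS assigns a free port
        return s.getsockname()[1]  # the port the OS chose

port = find_free_port()
```

One caveat with this pattern: the socket is closed before the server rebinds the port, so there is a small race window where another process could grab it first.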
Bug fixes
- GPT-OSS BF16: the GPT-OSS router now works with `load_in_4bit = True`, fixing `AttributeError: 'GptOssTopKRouter' object has no attribute 'weight'`
- Mistral training fixed - sentencepiece proto issue fixed (any protobuf version works)
- Fix evaluation i.e. `UNSLOTH_RETURN_LOGITS="1"` works. Fixes #3126 #3071
- Fixes `Output 0 of UnslothFusedLossBackward is a view and is being modified inplace.` for Gemma 3 and `transformers>=4.57.1`
- If you see `ImportError: cannot import name '_Ink' from 'PIL._typing' (/usr/local/lib/python3.12/dist-packages/PIL/_typing.py)` please update and use our new notebooks
Don't forget to also join our Reddit: r/unsloth 🥰
What's Changed
- Fix loading as 8bit by @Etherll in #3384
- Nightly by @danielhanchen in #3392
- Nightly by @danielhanchen in #3394
- Update int8-int4 QAT config to use Int8DynamicActivationIntxWeightConfig by @metascroy in #3391
- Gemma 3 bug fixes by @danielhanchen in #3410
- Transformers Fix v4.57 rename from PretrainedConfig to PreTrainedConfig by @mmathew23 in #3445
- improve qat by @Etherll in #3446
- Fix eval metric issue by @pluesclues in #3420
- [Part2] Reinstate llama.cpp Compatibility and GGUF Conversion with Multiple Quantizations and Automated Ollama Modelfile Creation by @rolandtannous in #3356
- vLLM FP8 quantized support for SFT/GRPO by @Datta0 in #3414
- Fix by @danielhanchen in #3466
- AMD fixes by @danielhanchen in #3467
- Fix transformers 4.57.1 by @danielhanchen in #3473
- GRPO bug fixes by @danielhanchen in #3474
- EOL LF (unix line endings) normalization by @djsaunde in #3478
- Fix out of resources issue for llama3.2 sft on amd gpu by @wangxunx in #3455
- Bug fixes by @danielhanchen in #3483
- Bug fixes by @danielhanchen in #3484
- Patch sleep mode properly for trl by @Datta0 in #3492
- Sleep trl patch by @Datta0 in #3494
- fix cross entropy loss issue for small vocab size on amd gpu by @wangxunx in #3503
- Gemma 3n fix by @mmathew23 in #3499
- enable intel for torch2.8 by @leizhenyuan in #3381
- add code for intel qlora by @leizhenyuan in #3370
- fix for intel memory calculation by @leizhenyuan in #3513
- [intel] enable support 2.9 for intel xpu by @leizhenyuan in #3514
- FP8 training enhancements by @Datta0 in #3496
New Contributors
- @metascroy made their first contribution in #3391
- @djsaunde made their first contribution in #3478
- @wangxunx made their first contribution in #3455
Full Changelog: September-2025-v3...October-2025
gpt-oss Reinforcement Learning + Auto Kernel Notebook
We’re introducing gpt-oss RL support, with the fastest RL inference and lowest VRAM use of any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning
- Unsloth now offers the fastest inference (~3x faster), lowest VRAM (50% less) and most context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy loss.
- Since RL on gpt-oss isn't yet vLLM compatible, we rewrote Transformers inference code to enable faster inference
- gpt-oss-20b GSPO free Colab notebook
- This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward-hacking which is one of RL's biggest challenges.
- We previously released Vision RL with GSPO support
⚠️ Reminder to NOT use Flash Attention 3 for gpt-oss as it'll make your training loss wrong.
- DeepSeek-V3.1-Terminus is here and you can run it locally via our GGUF. Read how our 3-bit GGUF beats Claude-4-Opus (thinking) on Aider Polyglot here
- Magistral 1.2 is here and you can run it locally here or fine-tune it for free by using our Kaggle notebook
- Fine-tuning the new Qwen3 models including Qwen3-VL, Qwen3-Omni and Qwen3-Next should work in Unsloth if you install the latest transformers. The models are big however so ensure you have enough VRAM.
- BERT is now fixed! Feel free to use our BERT fine-tuning notebook
Don't forget to also join our Reddit: r/unsloth 🥰
What's Changed
- Bug fixes by @danielhanchen in #3329
- Fix QAT + LoRA fast path, add tests by @andrewor14 in #3307
- Use gemma3n embedder patch + adjust FORCE_FLOAT32 match logic by @mmathew23 in #3332
- Synthetic Data updates by @mmathew23 in #3333
- Fix loading issues for BERT by @Etherll in #3339
- Bug fixes by @danielhanchen in #3335
- peft_config before model_config by @mmathew23 in #3342
- specify different tokenizer_path/name by @mmathew23 in #3343
- correct python support statement by @laz-001 in #3374
- GPT OSS RL by @danielhanchen in #3362
New Contributors
Full Changelog: September-2025-v2...September-2025-v3