Releases: unslothai/unsloth

New Important Updates!

27 Mar 15:09
9477e7c


Hey guys, it's only been 2 days since our last release, but we’ve got a lot more important updates:

  • Inference is now 20–30% faster. Previously, tool-calling and the repeat penalty could slow inference below normal speeds; inference tokens/s should now be on par with llama-server / llama.cpp.
  • Unsloth Studio now auto-detects older or pre-existing models downloaded from LM Studio, Hugging Face, and similar sources.
  • Inference tokens/s is now calculated correctly. Previously, the figure included startup time, which made the displayed speed look slower than it actually was; it now reflects true inference speed.
  • CPU usage no longer spikes. Previously, an inline querier identity changed on every render, causing useLiveQuery to resubscribe continuously.
  • Unsloth Studio now has a shutdown (x) button and shuts down properly. Previously, closing it after opening from the desktop icon would not fully exit. Launching from the shortcut now also opens a terminal, and closing that terminal fully exits Unsloth Studio. If a previous session is still running, restart your computer or run lsof -i :8888 and then kill -9 <PID>.
  • Even better tool-calling and websearch with reduced errors.
  • Updated documentation with lots of new info on deleting models, uninstalling, etc.
  • Cleaner, smarter install and setup logging across Windows and Linux. Output is now easier to read with consistent formatting, quieter by default for a smoother experience, and supports richer --verbose diagnostics when you want full technical detail.
  • You can now view your training history.
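The tokens/s fix above boils down to starting the timer at the first generated token rather than at request start, so model load and prompt processing no longer drag the average down. A minimal sketch of the idea (the function and names here are illustrative, not Unsloth Studio's actual code):

```python
import time

def measure_tokens_per_sec(generated_tokens):
    """Compute tokens/s excluding startup / prompt-processing time.

    `generated_tokens` is any iterator yielding tokens; the clock starts
    when the first token arrives, so startup cost is excluded.
    """
    first_token_at = None
    count = 0
    for _ in generated_tokens:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # timer starts here
        count += 1
    if first_token_at is None or count < 2:
        return 0.0
    elapsed = time.perf_counter() - first_token_at
    return (count - 1) / elapsed if elapsed > 0 else 0.0
```

Timing from the first token is also how llama-server reports its generation speed, which is why the two numbers now line up.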


Full Changelog: v0.1.2-beta...v0.1.25-beta

First Release post Unsloth Studio!

25 Mar 15:39
55d24d7


Hey guys, this is our first release since we launched Unsloth Studio last week. From now on you can directly access all our updates through our changelog here: https://unsloth.ai/docs/new/changelog

You can now update Unsloth Studio! Just use: unsloth studio update. Please update to use all the newest fixes and features.

  • Tool calling improved. Better llama.cpp parsing, no raw tool markup in chat, faster inference, a new Tool Outputs panel, timers.
  • Windows CPU or GPU now works seamlessly. Please reinstall!
  • App shortcuts. Once installed, you can now launch on Windows, macOS, and Linux via a shortcut icon in the Start menu / Launchpad and on the Desktop.
  • Pre-compiled llama.cpp binaries and mamba_ssm for fine-tuning - 6x faster installs! Binaries are also under 300MB in size.
  • 50% smaller installation sizes (7GB or more in savings), 2x faster installs, and faster dependency resolution. 50% smaller PyPI sizes.
  • Colab with free T4 GPUs now works with Unsloth Studio! Try it here. Thanks to pre-compiled binaries, it's also 20x faster!
  • You can now properly use old GGUFs from Hugging Face or LM Studio.
  • macOS and CPU now have Data Recipes enabled with multi-file uploading.
  • Preliminary AMD support for Linux-only machines - auto-detected.
  • Settings sidebar redesign. Settings are now grouped into Model, Sampling, Tools, and Preferences.
  • Context length is now adjustable. Keep in mind this is usually not needed, as llama.cpp smartly uses the exact context you need via --fit on.
  • Persistent system prompts and presets. Custom system prompts and chat presets now persist across reloads and page changes.
  • Multi-file upload. Data recipes now support multiple drag-and-drop uploads for PDF, DOCX, TXT, and MD, with backend extraction, saved uploads, and improved previews.
  • Better chat observability. Studio now shows llama-server timings and usage, a context-window usage bar, and richer source hover cards.
  • Better UX overall - clickable links, better LaTeX parsing, tool / code / web tooltips for default cards and much more!
  • LiteLLM - Unsloth Studio and Unsloth were NOT affected by the recent LiteLLM compromise. Nemo Data Designer used LiteLLM only up to 1.80, not the affected 1.82.7 or 1.82.8, and has since removed it entirely.
  • We now have a new one-line install command, just run: curl -fsSL https://unsloth.ai/install.sh | sh
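Persistence like the system-prompt and preset behavior above usually comes down to serializing state to disk and reloading it on startup. A minimal, hypothetical sketch (the file name and schema are illustrative, not Unsloth Studio's actual storage):

```python
import json
from pathlib import Path

PRESETS_FILE = Path("chat_presets.json")  # hypothetical location

def save_presets(presets: dict) -> None:
    # Write to a temp file, then rename: avoids a half-written file
    # if the app is killed mid-save.
    tmp = PRESETS_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(presets, indent=2))
    tmp.replace(PRESETS_FILE)

def load_presets() -> dict:
    # Survive reloads and page changes: fall back to defaults if the
    # file is missing or corrupt.
    try:
        return json.loads(PRESETS_FILE.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {"system_prompt": "", "presets": {}}
```

Because state lives on disk rather than in component memory, a reload or navigation simply calls load_presets() and picks up where it left off.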

Fixes:

  • Windows/setup improvements. Fixed silent Windows exits, Anaconda/conda-forge startup crashes, broken non-NVIDIA Windows installs, and missing early CUDA/stale-venv setup checks.
  • System prompts fixed. They work again for non-GGUF text and vision inference.
  • GGUF export expanded. Full fine-tunes, not just LoRA/PEFT, can now export to GGUF. Base model resolution is more reliable, and unsupported export options are disabled in the UI.
  • Chat scroll/layout fixes. Fixed scroll-position issues during generation, thinking-panel layout shift, and viewport jumps when collapsing reasoning panels.
  • Smarter port conflict detection. Studio now detects loopback conflicts, can identify the blocking process when possible, and gives clearer fallback-port messages.
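Loopback port-conflict detection of the kind described above can be sketched with the standard library alone (an illustration of the technique, not Studio's actual implementation):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 on success (i.e. a listener answered).
        return s.connect_ex((host, port)) == 0

def pick_port(preferred: int, fallbacks: range) -> int:
    """Use the preferred port, else pick a fallback with a clear message."""
    if not port_in_use(preferred):
        return preferred
    for p in fallbacks:
        if not port_in_use(p):
            print(f"Port {preferred} is busy; falling back to {p}")
            return p
    raise RuntimeError("No free port found")
```

Identifying the blocking process requires OS-specific tooling on top of this (e.g. the lsof -i :8888 approach mentioned in the shutdown note above).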

Example of automatic parameter settings for context length etc. is shown in the attached video (super.final.mp4).


llama.cpp prebuilt b8475

22 Mar 20:34


Install-ready Unsloth Studio llama.cpp bundles for b8475.

Introducing Unsloth Studio (Beta)!

17 Mar 15:21
239ca98


Hey guys, we're super excited to launch Unsloth Studio (Beta), a new open-source web UI to train and run LLMs.

Blog + everything you need to know: https://unsloth.ai/docs/new/studio

  • Run models locally on Mac, Windows, Linux
  • Compare and battle models side-by-side
  • Train 500+ models 2x faster with 70% less VRAM
  • Supports GGUF, vision, audio, embedding models
  • Self-healing tool calling / web search + code execution
  • Auto-create datasets from PDF, CSV, DOCX
  • Export models to GGUF, safetensors, and more formats

macOS, Linux, WSL:

For macOS, ensure you have cmake installed. If not, run brew install cmake.

curl -fsSL https://unsloth.ai/install.sh | sh

Then to launch every time:

source unsloth_studio/bin/activate
unsloth studio -H 0.0.0.0 -p 8888

Windows:

Run in Windows Powershell:

irm https://unsloth.ai/install.ps1 | iex

Then to launch every time:

.\unsloth_studio\Scripts\activate
unsloth studio -H 0.0.0.0 -p 8888

Docker

Use our unsloth/unsloth Docker image. Run:

docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 8000:8000 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth


Full Changelog: https://github.com/unslothai/unsloth/commits/March-2026

llama.cpp prebuilt b8457

20 Mar 21:49
dd283b0


Install-ready Unsloth Studio llama.cpp bundles for b8457.

12x Faster MoE Training + Embedding support!

10 Feb 15:25


Our first release of 2026! This year we’ve got a lot of exciting things coming and to kick things off, we’re introducing faster MoE training, embedding model support, and ultra long context for Reinforcement Learning. We’ll also be launching our brand new UI very soon.

We’d like to thank all of you for 50K stars on GitHub! ⭐


We’ve also added support for many new models that you can now run and fine-tune locally, including DeepSeek-OCR 2, GLM-4.7-Flash, Kimi-2.5, and more.

🚀 Faster MoE training

You can now train MoE models 12x faster with 35% less VRAM and 6x longer context via our new Triton and math kernels (no accuracy loss). gpt-oss-20b works on 12.8GB VRAM; Qwen3-30B-A3B (16-bit LoRA) uses 63GB.

Unsloth supports fast training for gpt-oss, Qwen3 (30B, 235B, VL, Coder), DeepSeek R1/V3 arch and GLM (4.7, Flash) models.

Faster MoE Blog

🔎 Embedding models now train 2× faster

We collaborated with Hugging Face to enable 1.8-3.3x faster embedding, BERT and classifier model training with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.

Embedding model Blog

💡 Ultra Long Context RL is here

We’re introducing new batching algorithms to enable ~7x longer context (can be more than 12x) RL training with no accuracy or speed degradation vs. other optimized setups that use FA3, kernels & chunked losses.

Unsloth now trains gpt-oss QLoRA with 380K context on a single 192GB NVIDIA B200 GPU.

Long Context RL Blog
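As a toy illustration of why batching algorithms matter at long context (this is generic length-sorted bucketing, not Unsloth's actual algorithm): when a batch pads every sequence to its longest member, the padding cost depends on the length variance inside each batch.

```python
def padded_tokens(lengths, batch_size):
    """Total token slots (real + padding) when each batch pads to its max."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i : i + batch_size]
        total += max(batch) * len(batch)
    return total

# Hypothetical sequence lengths mixing short and very long samples.
lengths = [10, 2000, 50, 1800, 30, 1900, 40, 2100]
naive_cost = padded_tokens(lengths, batch_size=2)           # arrival order
sorted_cost = padded_tokens(sorted(lengths), batch_size=2)  # length-sorted
# Sorting groups similar lengths together, so far fewer slots are padding.
```

Here the arrival-order batches waste nearly half their slots on padding, while length-sorted batches nearly halve the total cost; real long-context RL batching is far more sophisticated, but the memory lever is the same.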

🔮 New models

🎉 Extra Updates

  1. As part of our MoE release, Gemma-3 now uses Flex-Attention by default, and this works in float16 settings as well (we solved the infinities there a while back). Gemma-3 now uses O(N) memory instead of O(N^2) and trains >3x faster (scaling even better with context length); previous Unsloth versions would OOM.
  2. Vision fine-tuning now accepts mixed datasets containing both image-only and text-only samples!
  3. trl==0.27.1 and transformers==5.1.0 are now well supported - previous coverage was 30% of all our 120 notebooks, but now we have >80% coverage, and we plan to reach 100% over the next few days.
  4. And many, many other bug fixes and updates!

📖 New Guides

  • </> How To Use Claude Code + Codex with local LLMs: Guide
  • 👾 Train & deploy to LM Studio for local inference: Guide
  • 🎨 Run Diffusion image models with Unsloth GGUFs: Guide

Tip

Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

February is shaping up to be an amazing month for LLM releases, and we hope you’re just as excited as we are. 😊


December Release + 3x Faster Training

18 Dec 17:45


Thanks for all the love and support this year! We're wishing you all a lovely Christmas. Please update Unsloth & our Docker to use the latest updates! 🦥

Unsloth December Release

  • Introducing 3x faster training & 30% less VRAM. New Triton kernels, padding-free & packing. Blog
  • 500K context training and reinforcement learning is now possible on a single 80GB GPU. Blog • Notebook
  • Fine-tune then deploy LLMs on your phone with PyTorch and Unsloth. Tweet • Read Guide
  • 🤗 Transformers v5 is now supported! It's not enabled by default due to possible instability issues.
  • Preliminary multi-GPU support: DDP Guide (not representative of the official release early next year)
  • More: Sudoku RL notebook • Paddle-OCR notebook • New NVIDIA blog
  • Lots of bug fixes! See further below.
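The padding-free & packing idea above can be illustrated with a generic greedy first-fit packer (a sketch of the general technique, not Unsloth's kernels): instead of padding each short sample to the context length, several samples are concatenated into one fixed-size context so almost every position holds a real token.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit: concatenate samples into bins of at most max_len.

    Returns a list of bins, each a list of sample lengths. Packing replaces
    per-sample padding, so wasted positions shrink dramatically.
    """
    bins = []
    for n in sorted(lengths, reverse=True):  # place big samples first
        if n > max_len:
            raise ValueError(f"sample of length {n} exceeds max_len")
        for b in bins:
            if sum(b) + n <= max_len:
                b.append(n)
                break
        else:
            bins.append([n])
    return bins
```

With lengths [512, 256, 128, 900, 100, 60] and a 1024-token context, six padded rows collapse into two packed rows; real implementations also mask attention so packed samples cannot attend to each other.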

🔮 New Models + Guides

Tip

Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

Bug Fixes and Enhancements

  1. Supports rollout_func, allowing multi-turn RL to work
  2. Supports vllm>=0.12.0 and efficient GRPO for it
  3. Supports transformers>=5.0.0, first shown via our Ministral notebooks
  4. Fixed Hugging Face token logins not working for private repos
  5. Fixed TorchAO and QAT not working during saving
  6. Fixed DeepSeek OCR fine-tuning not loading fine-tuned models
  7. Improved vision utilities for VLM fine-tuning


Full Changelog: November-2025...December-2025

November Release + FP8 Training!

25 Nov 16:24


We’re getting close to our final release of 2025! Thanks so much for sticking with us this year. We’ve got lots of new features so please update Unsloth & our Docker to use the latest updates! 🦥

Unsloth November Release

  • You may notice Unsloth now uses much less VRAM than before, enabling even longer context. We’re also implementing faster training very soon and we’ll share all the details in an upcoming blog.
  • DeepSeek-OCR fine-tuning is here! We fine-tuned DeepSeek-OCR, improving its language understanding by 89%. Read our Blog • Free notebook
  • Qwen3-VL models supported including GGUFs to run locally: Blogpost + fixes • GGUFs
  • We analyzed RL training-inference mismatch for FP16 vs. BF16 and concluded that Unsloth does not have this issue: Analysis and Results
  • We’ve partnered with Docker to let you run LLMs locally with zero setup. Docker GGUFs are now powered by Unsloth Dynamic.
    Example: docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16 Read guide
  • Baidu ERNIE models are now supported. Notebooks coming soon.
  • Unsloth now supports SGLang. Read our guide
  • We wrote guides for LoRA Hot Swapping and vLLM Engine Arguments
  • Run Kimi-K2-Thinking, the most powerful open model, locally. Kimi-K2 Guide
  • Lots of bug fixes! See further below.

Tip

Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

Bug Fixes and Enhancements

  1. Supports trl>=0.25.0, vllm>=0.11.2, and transformers>=4.57.1
  2. Fixed gpt-oss GRPO and RL excessive re-compilations on torch>=2.9.0
  3. Fixed Sleep mode and reduced memory usage by a further 5 to 15% for RL / GRPO
  4. Fixed propagation of trust_remote_code = True
  5. Fixed Unsloth offloaded gradient checkpointing not offloading on the 1st step - reduces VRAM by >20%
  6. Added logits.detach() to GRPO to solve double backwards on some pathways
  7. Added int64 kernels & fixed RoPE embeddings to allow ultra-long context training
  8. Fixed 📓 OpenEnv gpt-oss RL notebook
  9. Fixed DGX Spark docker image
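As background for item 7: rotary position embeddings (RoPE) encode position by rotating pairs of features through a position-dependent angle, so very long contexts mean very large position values, which is where int64 indexing comes in. A simplified pure-Python sketch of the rotation (not Unsloth's kernel):

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply rotary position embedding to one head vector (even length).

    Each pair (vec[2i], vec[2i+1]) is rotated by position * base**(-2i/d).
    Rotation preserves the vector's norm: position is encoded purely in
    the phase of each feature pair.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out
```

At position 0 the rotation is the identity, and at any position the norm is unchanged, which is why RoPE can be fixed or extended without retraining the rest of the layer.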


October Release + Unsloth Docker!

27 Oct 11:25


Hey everyone, please update Unsloth to use the latest updates! 🦥

New model updates

New features

  • Introducing Quantization-Aware Training: We collaborated with PyTorch for QAT, recovering as much as 70% of lost accuracy. Read blog
  • Unsloth supports OpenEnv to allow for open RL environments. Blog coming soon • Notebook
  • New customer support agent notebook to enable real-time analysis & solving of customer interactions. You'll also learn how to train models using data from Google Sheets.
  • Python 3.13, PyTorch 2.9, and the latest Hugging Face TRL and transformers are now supported and fixed.
  • Save to TorchAO supported as well:
from torchao.quantization import Int4WeightOnlyConfig
model.save_pretrained_torchao("model", tokenizer, torchao_config = Int4WeightOnlyConfig())

Tip

Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

RL Improvements

  1. Fixed Standby consuming more VRAM than usual. Unsloth now auto-selects 80% to 95% of maximum GPU utilization when import os; os.environ["UNSLOTH_VLLM_STANDBY"] = "1" is used.
  2. Fixed GRPO training hangs with better environment timers - works on DGX Spark and all other GPUs.
  3. Fixed GRPO RuntimeError: shape '[1, 887, 1, 128]' is invalid for input of size 3633152 for all models

RL Environment functions

  1. New execute_with_time_limit function to force functions to execute within a time limit. E.g. with a 2 second time limit, use:
from unsloth import execute_with_time_limit

@execute_with_time_limit(2)
def execute_strategy(strategy, game):
    return _execute_strategy(strategy, game)

try:
    execute_strategy(strategy, game)
except TimeoutError as e:
    print(f"Timed out with error = {str(e)}")
  2. To check if only Python standard modules are used in a function, use check_python_modules.
  3. Use create_locked_down_function to create a function without leakage of global variables.
  4. Use Benchmarker, i.e. from unsloth import Benchmarker, to benchmark functions accurately. It approximately wipes the L1 to L3 caches to reduce the chance of benchmark cheating.
  5. Use launch_openenv to launch a continuously reloaded OpenEnv environment process (to stop it from closing down), i.e. from unsloth import launch_openenv. It will auto-find an unused port.
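For background on how a time limit like execute_with_time_limit can be enforced, here is a minimal Unix-only sketch using signal.setitimer (an illustration of the general technique, not Unsloth's implementation):

```python
import functools
import signal

def time_limit(seconds):
    """Decorator: raise TimeoutError if the wrapped call runs past `seconds`.

    Unix-only sketch: relies on SIGALRM, so it must run in the main thread.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                raise TimeoutError(f"{func.__name__} exceeded {seconds}s")
            old_handler = signal.signal(signal.SIGALRM, on_alarm)
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)     # cancel pending alarm
                signal.signal(signal.SIGALRM, old_handler)  # restore handler
        return wrapper
    return decorator
```

A call that finishes in time returns normally; one that overruns is interrupted mid-execution with TimeoutError, which is exactly the behavior an RL reward function needs to punish runaway strategies.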

Bug fixes

  1. GPT-OSS BF16: the GPT-OSS router now works with load_in_4bit = True, fixing AttributeError: 'GptOssTopKRouter' object has no attribute 'weight'
  2. Mistral training fixed - sentencepiece proto issue resolved (any protobuf version works)
  3. Fixed evaluation, i.e. UNSLOTH_RETURN_LOGITS="1" now works. Fixes #3126 #3071
  4. Fixed Output 0 of UnslothFusedLossBackward is a view and is being modified inplace for Gemma 3 and transformers>=4.57.1
  5. If you see ImportError: cannot import name '_Ink' from 'PIL._typing' (/usr/local/lib/python3.12/dist-packages/PIL/_typing.py), please update and use our new notebooks

Don't forget to also join our Reddit: r/unsloth 🥰


Full Changelog: September-2025-v3...October-2025

gpt-oss Reinforcement Learning + Auto Kernel Notebook

26 Sep 15:24


We’re introducing gpt-oss RL support and the fastest RL inference and lowest VRAM use vs. any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning

  • Unsloth now offers the fastest inference (~3x faster), lowest VRAM (50% less) and most context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy loss.
  • Since RL on gpt-oss isn't yet vLLM compatible, we rewrote Transformers inference code to enable faster inference.
  • gpt-oss-20b GSPO free Colab notebook
  • This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward-hacking which is one of RL's biggest challenges.
  • We previously released Vision RL with GSPO support
  • ⚠️ Reminder to NOT use Flash Attention 3 for gpt-oss, as it'll make your training loss wrong.
  • DeepSeek-V3.1-Terminus is here and you can run locally via our GGUF
    Read how our 3-bit GGUF beats Claude-4-Opus (thinking) on Aider Polyglot here
  • Magistral 1.2 is here and you can run it locally here or fine-tune it for free by using our Kaggle notebook
  • Fine-tuning the new Qwen3 models, including Qwen3-VL, Qwen3-Omni, and Qwen3-Next, should work in Unsloth if you install the latest transformers. The models are big, however, so ensure you have enough VRAM.
  • BERT is now fixed! Feel free to use our BERT fine-tuning notebook

Don't forget to also join our Reddit: r/unsloth 🥰


Full Changelog: September-2025-v2...September-2025-v3