v0.6.0: New Attention Operators, Cosine Similarity Loss, Llama 4, and VLM Patching Updates
## Highlights
This release brings significant improvements to Liger-Kernel, including new attention operators, a cosine similarity loss for distillation, support for Llama 4 models, more robust benchmarking automation, and key fixes to vision-language model (VLM) patching necessitated by recent refactoring in transformers.
## Key Changes
### New Features & Improvements
- Multi-Token Attention by @AndreSlavescu (#689)
- Fused Neighborhood Attention by @AndreSlavescu (#732)
- Cosine Similarity Loss for Distillation by @Dexterai (#780)
- Support for Llama 4 by @Manan17 (#740)
- Option to choose fused LCE/CE loss by @connermanuel (#704)
- Add block_rms_norm for QK norm by @mdy666 (#731)
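The new cosine similarity loss for distillation penalizes angular disagreement between student and teacher representations: the loss is `1 - cos(student, teacher)`, so perfectly aligned vectors incur zero loss. A minimal pure-Python sketch of the idea (the actual Liger kernel is a fused GPU implementation; the function name and signature here are illustrative, not the library's API):

```python
import math

def cosine_similarity_loss(student: list[float], teacher: list[float], eps: float = 1e-8) -> float:
    """Distillation loss: 1 - cosine similarity of the two vectors.

    Aligned vectors give ~0, orthogonal ones give 1, opposed ones give 2.
    `eps` guards against division by zero for degenerate (all-zero) inputs.
    """
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    return 1.0 - dot / max(norm_s * norm_t, eps)

# Parallel hidden states are "free"; orthogonal ones pay full price.
print(cosine_similarity_loss([1.0, 2.0], [2.0, 4.0]))   # ~0 (same direction)
print(cosine_similarity_loss([1.0, 0.0], [0.0, 1.0]))   # ~1 (orthogonal)
```

Because the loss depends only on direction, not magnitude, it lets a student match the teacher's representational geometry without being forced to reproduce its activation scale.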
### Bug Fixes
- Vision-language model patching in recent transformers versions (>= 4.52.0):
  - RMS Norm patching by @vaibhavjindal, @BenasdTW (#741, #765)
- Hugging Face forward kwargs fix by @llllvvuu (#708)
- Fix tanh import by @jue-jue-zi (#762)
- Apply monkey patch to instances by @YangKai0616 (#772)
### Documentation & CI Fixes
- Deploy MkDocs to GitHub Pages by @ParagEkbote (#724)
- Robust doc updates by @ParagEkbote (#726, #727)
- Add .idea to .gitignore by @Tcc0403 (#784)
- README, multi-token attention (MTA) + softmax docs by @AndreSlavescu (#730)
- Relax DyT tolerance; skip MTA tests on XPU by @Tcc0403 (#778)
- Paligemma test fixes by @vvvdwbvvv (#785)
- Style & test fixes by @Tcc0403, @vaibhavjindal (#736, #794)
- Add torchvision for multimodal test by @Tcc0403 (#755)
### Benchmarking & Automation
- Automated benchmarking and visualization UI on GitHub Pages by @Manan17 (#744, #747, #749, #752, #753, #756, #759, #760, #770, #779)
## New Contributors
- @connermanuel made their first contribution in #704
- @llllvvuu made their first contribution in #708
- @jue-jue-zi made their first contribution in #762
- @YangKai0616 made their first contribution in #772
- @Dexterai made their first contribution in #780
- @vvvdwbvvv made their first contribution in #785
Full Changelog: v0.5.10...v0.6.0