Releases: Blaizzy/mlx-audio
v0.3.1
What's Changed
- Update uv.lock to reflect dependency version changes by @Blaizzy in #432
- v0.3.1: Update STT API docs and fix default output path by @Blaizzy in #433
- Qwen3-TTS: Add streaming and optimise peak usage by @Blaizzy in #435
- Fix: Use single quotes in README examples to avoid Bash history expansion. by @reinexworldc in #440
- Fix: improve import error handling by @reinexworldc in #443
- [Qwen3-TTS] Fix some Custom Voices producing silence with 0.6B by @Blaizzy in #444
- Refactor audio load by @Blaizzy in #445
- Update pyproject.toml for poetry support by @lucasnewman in #446
- Add Qwen3-ASR by @Blaizzy in #454
- Fix chatterbox load by @Blaizzy in #455
- Update README to remove basic usage section by @Blaizzy in #456
- Update README with output path for ASR commands by @rahimnathwani in #458
- Update package dependencies in uv.lock to include new extras by @Blaizzy in #457
- Fix server (STT, TTS) by @Blaizzy in #460
New Contributors
- @reinexworldc made their first contribution in #440
Full Changelog: v0.3.0...v0.3.1
v0.3.0
What's Changed
- Fix speaker embedding extraction in Qwen3-TTS model by @Blaizzy in #390
- Fix Qwen3-TTS tail artifacts by @Blaizzy in #391
- Fix Qwen3-TTS Base Voice Cloning by @Blaizzy in #394
- Add Vibevoice ASR by @Blaizzy in #389
- Qwen3 speaker embedding tests by @Blaizzy in #396
- Update TTS commands in README to include language code option by @rudolfolah in #401
- Unify Mimi implementation for Pocket TTS by @lucasnewman in #403
- Fix issue of ref_audio not loading prior to inference with server. by @BuffMcBigHuge in #406
- Enhance README with installation and usage examples by @rahimnathwani in #404
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #418
- Upgrade GitHub Actions to latest versions by @salmanmkc in #419
- [VibeVoice-ASR] Fix Metal kernel crash and optimize memory for long audio by @Blaizzy in #417
- fix: Allowing quantization of Qwen3-TTS! Adding model_quant_predicate to Qwen3-TTS to exclude embedding layers by @kyr0 in #398
- Fix qwen3 tts quants (silence in VC and word precision) by @Blaizzy in #407
- Fix stt array io by @Blaizzy in #426
- Update MANIFEST.in to remove leading dot from requirements.txt path by @Blaizzy in #428
- Move audio path/format prints under verbose flag by @wladpaiva in #429
- Update pyproject.toml and GitHub Actions workflow for package publishing by @Blaizzy in #431
New Contributors
- @rudolfolah made their first contribution in #401
- @BuffMcBigHuge made their first contribution in #406
- @rahimnathwani made their first contribution in #404
- @salmanmkc made their first contribution in #418
- @kyr0 made their first contribution in #398
- @wladpaiva made their first contribution in #429
Full Changelog: v0.2.10...v0.3.0
v0.3.0rc1
What's Changed
- Remove extra deps by @Blaizzy in #373
- Refactor load by @Blaizzy in #374
- Add lfm2 audio by @Blaizzy in #370
- update lfm readme by @Blaizzy in #377
- Fix lang codes kokoro by @Blaizzy in #380
- Replace soundfile with miniaudio + ffmpeg by @Blaizzy in #379
- Add Pocket TTS model by @lucasnewman in #381
- Fix STT stream by @Blaizzy in #382
- Migrate swift to https://github.com/Blaizzy/mlx-audio-swift by @Blaizzy in #363
- Refactor model path retrieval in get_model_path function by @Blaizzy in #383
- Add streaming decoding to snac and orpheus by @Blaizzy in #384
- Update generate output path by @Blaizzy in #385
- Add Qwen3-TTS by @Blaizzy in #388
Full Changelog: v0.2.10...v0.3.0rc1
v0.2.10
What's Changed
- Refactor GLMASR and improve LM style ASR logging by @Blaizzy in #332
- Remove actual issue ID reference from PR template by @mootari in #334
- Add maya1 fixes to Llama by @Blaizzy in #340
- Fix marvis, chatterbox and args by @Blaizzy in #342
- Add Sam Audio by @Blaizzy in #338
- fix: Add missing mlx-lm dependency by @joshwhiton in #344
- feat(swift): add Kokoro-82M-v1.1-zh MLX Support by @Alex-Wengg in #341
- Remove loguru reconfiguration on Kokoro import by @joshwhiton in #348
- feat(stt): add AlignAtt streaming transcription for Whisper by @beshkenadze in #321
- Fix stft args by @Blaizzy in #354
- Allow using DACVAE as a codec independent of SAM Audio model by @lucasnewman in #357
- chore: update Python version requirement and dependencies by @Blaizzy in #355
- Add MossFormer2 SE (Speech Enhancement) by @starkdmi in #351
- use chatterbox MTLTokenizer for multilingual. by @litmudoc in #362
- Add streaming and refactor Sam Audio API by @Blaizzy in #360
- Add Soprano by @Blaizzy in #359
- Fix model type, refactor orpheus style models by @Blaizzy in #358
- revert default response format to mp3 by @Blaizzy in #356
- Refactor voice loading in KokoroPipeline to support .safetensors files by @Blaizzy in #364
- Add uv.lock and pin all deps as core by @Blaizzy in #366
New Contributors
- @mootari made their first contribution in #334
- @Alex-Wengg made their first contribution in #341
- @starkdmi made their first contribution in #351
- @litmudoc made their first contribution in #362
Full Changelog: v0.2.9...v0.2.10
v0.2.9
What's Changed
- Add GLM ASR by @Blaizzy in #320
- Simplify convert API for TTS and STT by @Blaizzy in #324
- [Chatterbox-turbo] Add speaker embedding by @Blaizzy in #322
- [Chatterbox-turbo] Add in-place cache by @Blaizzy in #322
- [Chatterbox-turbo] Add audio streaming by @Blaizzy in #322
- [Chatterbox-turbo] Add audio chunking by @Blaizzy in #322
Full Changelog: v0.2.6...v0.2.9
v0.2.8
What's Changed
- fix(server): use lowercase for default response_format by @beshkenadze in #301
- Add Chatterbox and Chatterbox Turbo by @Blaizzy in #302
- Add Chatterbox [VC only] by @DePasqualeOrg in #282
- feat: add lazy imports for TTS/STT modules by @beshkenadze in #290
- Pin tfms dep <5.0.0 by @Blaizzy in #303
- feat: migrate from setup.py to pyproject.toml with optional deps by @beshkenadze in #291
- fix(test): use case-insensitive content-type comparison by @beshkenadze in #300
- ci: add modular installation tests for pyproject.toml extras by @beshkenadze in #298
- Fix build by @Blaizzy in #304
Full Changelog: v0.2.7...v0.2.8
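The case-insensitive content-type comparison mentioned for #300 can be illustrated with a minimal sketch. This is not the project's test code; the helper name `content_type_matches` is hypothetical and only shows the general technique of comparing media types without regard to case or trailing parameters.

```python
def content_type_matches(header_value: str, expected: str) -> bool:
    """Compare a Content-Type header to an expected media type, ignoring case.

    Hypothetical helper for illustration; not part of mlx-audio.
    """
    # Drop parameters such as "; charset=utf-8", then normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == expected.strip().lower()


if __name__ == "__main__":
    print(content_type_matches("Audio/MP3", "audio/mp3"))
    print(content_type_matches("audio/wav; charset=utf-8", "audio/mp3"))
```

Lowercasing both sides before comparing keeps a check stable when a server or framework changes header casing.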
v0.2.7
What's Changed
- Refactor Marvis TTS API: Make public methods accessible by @rudrankriyam in #259
- Add Marvis model selection to TTS UI by @rudrankriyam in #261
- Add Marvis quant selection to TTS Web UI by @adrgrondin in #264
- Fix security vulnerabilities in Next.js and brace-expansion dependencies by @Copilot in #265
- make the methods more useful by @pritamsoni-hsr in #246
- Update to mlx-swift-lm and remove redundant mlx-swift dependency by @rudrankriyam in #269
- Fix voxtral segments by @Blaizzy in #273
- Add UI startup option to Server by @Blaizzy in #274
- Add preemphasis preprocessing support for Parakeet models to match NeMo training config by @joshwhiton in #286
- feat: Add support for VoxCPM (w/ voice cloning) by @voxmenthe in #293
- Fix spark decoding by @Blaizzy in #296
- feat: extract DSP utilities to dedicated module by @beshkenadze in #289
- Feat: add response format option to SpeechRequest by @Blaizzy in #297
- Add VibeVoice by @Blaizzy in #295
New Contributors
- @pritamsoni-hsr made their first contribution in #246
- @joshwhiton made their first contribution in #286
- @voxmenthe made their first contribution in #293
- @beshkenadze made their first contribution in #289
Full Changelog: v0.2.6...v0.2.7
v0.2.6
What's Changed
- fix wav2vec by @josharian in #222
- Fix RTF calculation in kokoro model by @davidxifeng in #227
- Fix Unnecessary Audio Transcription for the IndexTTS Model by @bytefer in #231
- Add Sesame TTS Integration for Swift Audio Package by @rudrankriyam in #223
- Use batched vocoding to reduce peak memory usage with Sesame arch models by @lucasnewman in #236
- Cache RoPE by dtype for Sesame arch models for improved generation performance by @lucasnewman in #232
- Install Metal toolchain for Swift tests by @lucasnewman in #233
- Adopt interface changes from mlx-lm to fix Sesame-arch models by @lucasnewman in #242
- Update swift-transformers dependency to 1.1.0 by @Liam1506 in #247
- Improve Swift TTS app UX by @rudrankriyam in #248
- Add quality selection and streaming controls to Marvis with UI support for macOS & iOS by @rudrankriyam in #249
- Fix Swift compiler warnings by @rudrankriyam in #250
- Refactor MarvisModel to handle optional backbone and decoder flavors by @rudrankriyam in #251
- Fix iOS 16 compatibility and ESpeakNG framework linking for iOS app by @rudrankriyam in #252
- Add memory increase limit for iOS by @rudrankriyam in #253
- Update audio playback management in Marvis TTS by @rudrankriyam in #254
- Bump version and add new copy files by @Blaizzy in #255
- Add UI v2 by @Blaizzy in #154
New Contributors
- @josharian made their first contribution in #222
- @davidxifeng made their first contribution in #227
- @bytefer made their first contribution in #231
- @Liam1506 made their first contribution in #247
Full Changelog: v0.2.5...v0.2.6
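For readers unfamiliar with the RTF (real-time factor) referenced in #227, here is a minimal sketch under one common convention: processing time divided by the duration of the generated audio, so values below 1 mean faster than real time. The function name and signature are assumptions for illustration, and the project may define or report the metric differently.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    Hypothetical helper; assumes the "lower is faster" convention.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # 2 s of compute for 8 s of audio is faster than real time.
    print(real_time_factor(2.0, 8.0))
```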
v0.2.5
What's Changed
- Use indeterminate progress for CSM models by @lucasnewman in #216
- Bump version to 0.2.5 by @Blaizzy in #219
Full Changelog: v0.2.4...v0.2.5
v0.2.4
What's Changed
- move sentence splitting into a separate utility class and add unit tests by @smdesai in #183
- Add BigVGAN neural audio codec by @senstella in #186
- fix: (outetts)loading model: Speaker file not found. by @zysam in #189
- Fix deprecated save in MLX-LM by @Blaizzy in #194
- Implementation of Misaki G2P tokenizer by @smdesai in #193
- Add IndexTTS by @senstella in #187
- Load both lexicon files us_gold and us_silver with words in us_gold taking precedence by @smdesai in #195
- Add S3 semantic tokenizer / neural audio codec by @lucasnewman in #204
- add lexicon files for British sounds, gb_gold and gb_silver by @smdesai in #197
- Fix Mimi codec by @lucasnewman in #209
- Add ability to use a custom URL to load Kokoro safetensors by @adrgrondin in #185
- Handle transformers-style config for Sesame CSM models by @lucasnewman in #211
- Add Xcode build troubleshooting documentation by @kinkadius in #210
- Multi model support by @ivanfioravanti in #213
- Add voxtral by @Blaizzy in #214
New Contributors
- @zysam made their first contribution in #189
- @adrgrondin made their first contribution in #185
- @kinkadius made their first contribution in #210
Full Changelog: v0.2.3...v0.2.4