Releases · Blaizzy/mlx-audio

29 Jan 22:39

Blaizzy

v0.3.1

f7328a4

v0.3.1 Latest

What's Changed

Update uv.lock to reflect dependency version changes by @Blaizzy in #432
v0.3.1: Update STT API docs and fix default output path by @Blaizzy in #433
Qwen3-TTS: Add streaming and optimise peak usage by @Blaizzy in #435
Fix: Use single quotes in README examples to avoid Bash history expansion. by @reinexworldc in #440
Fix: improve import error handling by @reinexworldc in #443
[Qwen3-TTS] Fix some Custom Voices producing silence with 0.6B by @Blaizzy in #444
Refactor audio load by @Blaizzy in #445
Update pyproject.toml for poetry support by @lucasnewman in #446
Add Qwen3-ASR by @Blaizzy in #454
Fix chatterbox load by @Blaizzy in #455
Update README to remove basic usage section by @Blaizzy in #456
Update README with output path for ASR commands by @rahimnathwani in #458
Update package dependencies in uv.lock to include new extras by @Blaizzy in #457
Fix server (STT, TTS) by @Blaizzy in #460

New Contributors

@reinexworldc made their first contribution in #440

Full Changelog: v0.3.0...v0.3.1

Contributors

rahimnathwani, Blaizzy, and 2 other contributors

25 Jan 21:43

Blaizzy

v0.3.0

02ada37

v0.3.0

What's Changed

Fix speaker embedding extraction in Qwen3-TTS model by @Blaizzy in #390
Fix Qwen3-TTS tail artifacts by @Blaizzy in #391
Fix Qwen3-TTS Base Voice Cloning by @Blaizzy in #394
Add Vibevoice ASR by @Blaizzy in #389
Qwen3 speaker embedding tests by @Blaizzy in #396
Update TTS commands in README to include language code option by @rudolfolah in #401
Unify Mimi implementation for Pocket TTS by @lucasnewman in #403
Fix issue of ref_audio not loading prior to inference with server. by @BuffMcBigHuge in #406
Enhance README with installation and usage examples by @rahimnathwani in #404
Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #418
Upgrade GitHub Actions to latest versions by @salmanmkc in #419
[VibeVoice-ASR] Fix Metal kernel crash and optimize memory for long audio by @Blaizzy in #417
fix: Allowing quantization of Qwen3-TTS! Adding model_quant_predicate to Qwen3-TTS to exclude embedding layers by @kyr0 in #398
Fix qwen3 tts quants (silence in VC and word precision) by @Blaizzy in #407
Fix stt array io by @Blaizzy in #426
Update MANIFEST.in to remove leading dot from requirements.txt path by @Blaizzy in #428
Move audio path/format prints under verbose flag by @wladpaiva in #429
Update pyproject.toml and GitHub Actions workflow for package publishing by @Blaizzy in #431

New Contributors

@rudolfolah made their first contribution in #401
@BuffMcBigHuge made their first contribution in #406
@rahimnathwani made their first contribution in #404
@salmanmkc made their first contribution in #418
@kyr0 made their first contribution in #398
@wladpaiva made their first contribution in #429

Full Changelog: v0.2.10...v0.3.0

Contributors

kyr0, rahimnathwani, and 6 other contributors

22 Jan 21:39

Blaizzy

v0.3.0rc1

185a0d7

v0.3.0rc1 Pre-release

What's Changed

Remove extra deps by @Blaizzy in #373
Refactor load by @Blaizzy in #374
Add lfm2 audio by @Blaizzy in #370
update lfm readme by @Blaizzy in #377
Fix lang codes kokoro by @Blaizzy in #380
Replace soundfile with miniaudio + ffmpeg by @Blaizzy in #379
Add Pocket TTS model by @lucasnewman in #381
Fix STT stream by @Blaizzy in #382
Migrate swift to https://github.com/Blaizzy/mlx-audio-swift by @Blaizzy in #363
Refactor model path retrieval in get_model_path function by @Blaizzy in #383
Add streaming decoding to snac and orpheus by @Blaizzy in #384
Update generate output path by @Blaizzy in #385
Add Qwen3-TTS by @Blaizzy in #388

Full Changelog: v0.2.10...v0.3.0rc1

Contributors

Blaizzy and lucasnewman

06 Jan 20:44

Blaizzy

v0.2.10

67326f3

v0.2.10

What's Changed

Refactor GLMASR and improve LM style ASR logging by @Blaizzy in #332
Remove actual issue ID reference from PR template by @mootari in #334
Add maya1 fixes to Llama by @Blaizzy in #340
Fix marvis, chatterbox and args by @Blaizzy in #342
Add Sam Audio by @Blaizzy in #338
fix: Add missing mlx-lm dependency by @joshwhiton in #344
feat(swift): add Kokoro-82M-v1.1-zh MLX Support by @Alex-Wengg in #341
Remove loguru reconfiguration on Kokoro import by @joshwhiton in #348
feat(stt): add AlignAtt streaming transcription for Whisper by @beshkenadze in #321
Fix stft args by @Blaizzy in #354
Allow using DACVAE as a codec independent of SAM Audio model by @lucasnewman in #357
chore: update Python version requirement and dependencies by @Blaizzy in #355
Add MossFormer2 SE (Speech Enhancement) by @starkdmi in #351
use chatterbox MTLTokenizer for multilingual. by @litmudoc in #362
Add streaming and refactor Sam Audio API by @Blaizzy in #360
Add Soprano by @Blaizzy in #359
Fix model type, refactor orpheus style models by @Blaizzy in #358
revert default response format to mp3 by @Blaizzy in #356
Refactor voice loading in KokoroPipeline to support .safetensors files by @Blaizzy in #364
Add uv.lock and pin all deps as core by @Blaizzy in #366

New Contributors

@mootari made their first contribution in #334
@Alex-Wengg made their first contribution in #341
@starkdmi made their first contribution in #351
@litmudoc made their first contribution in #362

Full Changelog: v0.2.9...v0.2.10

Contributors

beshkenadze, mootari, and 6 other contributors

20 Dec 20:16

Blaizzy

v0.2.9

4bc1d0c

v0.2.9

What's Changed

Add GLM ASR by @Blaizzy in #320
Simplify convert API for TTS and STT by @Blaizzy in #324
[Chatterbox-turbo] Add speaker embedding by @Blaizzy in #322
[Chatterbox-turbo] Add in-place cache by @Blaizzy in #322
[Chatterbox-turbo] Add audio streaming by @Blaizzy in #322
[Chatterbox-turbo] Add audio chunking by @Blaizzy in #322

Full Changelog: v0.2.6...v0.2.9

Contributors

Blaizzy

17 Dec 19:30

Blaizzy

v0.2.8

22a2bb9

v0.2.8

What's Changed

fix(server): use lowercase for default response_format by @beshkenadze in #301
Add Chatterbox and Chatterbox Turbo by @Blaizzy in #302
Add Chatterbox [VC only] by @DePasqualeOrg in #282
feat: add lazy imports for TTS/STT modules by @beshkenadze in #290
Pin tfms dep <5.0.0 by @Blaizzy in #303
feat: migrate from setup.py to pyproject.toml with optional deps by @beshkenadze in #291
fix(test): use case-insensitive content-type comparison by @beshkenadze in #300
ci: add modular installation tests for pyproject.toml extras by @beshkenadze in #298
Fix build by @Blaizzy in #304

Full Changelog: v0.2.7...v0.2.8

Contributors

beshkenadze, Blaizzy, and DePasqualeOrg

16 Dec 20:35

Blaizzy

v0.2.7

f3cd320

v0.2.7

What's Changed

Refactor Marvis TTS API: Make public methods accessible by @rudrankriyam in #259
Add Marvis model selection to TTS UI by @rudrankriyam in #261
Add Marvis quant selection to TTS Web UI by @adrgrondin in #264
Fix security vulnerabilities in Next.js and brace-expansion dependencies by @Copilot in #265
make the methods more useful by @pritamsoni-hsr in #246
Update to mlx-swift-lm and remove redundant mlx-swift dependency by @rudrankriyam in #269
Fix voxtral segments by @Blaizzy in #273
Add UI startup option to Server by @Blaizzy in #274
Add preemphasis preprocessing support for Parakeet models to match NeMo training config by @joshwhiton in #286
feat: Add support for VoxCPM (w/ voice cloning) by @voxmenthe in #293
Fix spark decoding by @Blaizzy in #296
feat: extract DSP utilities to dedicated module by @beshkenadze in #289
Feat: add response format option to SpeechRequest by @Blaizzy in #297
Add VibeVoice by @Blaizzy in #295

New Contributors

@pritamsoni-hsr made their first contribution in #246
@joshwhiton made their first contribution in #286
@voxmenthe made their first contribution in #293
@beshkenadze made their first contribution in #289

Full Changelog: v0.2.6...v0.2.7

Contributors

beshkenadze, voxmenthe, and 5 other contributors

07 Nov 17:08

Blaizzy

v0.2.6

bcd5ccf

v0.2.6

What's Changed

fix wav2vec by @josharian in #222
Fix RTF calculation in kokoro model by @davidxifeng in #227
Fix Unnecessary Audio Transcription for the IndexTTS Model by @bytefer in #231
Add Sesame TTS Integration for Swift Audio Package by @rudrankriyam in #223
Use batched vocoding to reduce peak memory usage with Sesame arch models by @lucasnewman in #236
Cache RoPE by dtype for Sesame arch models for improved generation performance by @lucasnewman in #232
Install Metal toolchain for Swift tests by @lucasnewman in #233
Adopt interface changes from mlx-lm to fix Sesame-arch models by @lucasnewman in #242
Update swift-transformers dependency to 1.1.0 by @Liam1506 in #247
Improve Swift TTS app UX by @rudrankriyam in #248
Add quality selection and streaming controls to Marvis with UI support for macOS & iOS by @rudrankriyam in #249
Fix Swift compiler warnings by @rudrankriyam in #250
Refactor MarvisModel to handle optional backbone and decoder flavors by @rudrankriyam in #251
Fix iOS 16 compatibility and ESpeakNG framework linking for iOS app by @rudrankriyam in #252
Add memory increase limit for iOS by @rudrankriyam in #253
Update audio playback management in Marvis TTS by @rudrankriyam in #254
Bump version and add new copy files by @Blaizzy in #255
Add UI v2 by @Blaizzy in #154

New Contributors

@josharian made their first contribution in #222
@davidxifeng made their first contribution in #227
@bytefer made their first contribution in #231
@Liam1506 made their first contribution in #247

Full Changelog: v0.2.5...v0.2.6

Contributors

josharian, davidxifeng, and 5 other contributors

26 Aug 18:06

Blaizzy

v0.2.5

cc6bdb4

v0.2.5

What's Changed

Use indeterminate progress for CSM models by @lucasnewman in #216
Bump version to 0.2.5 by @Blaizzy in #219

Full Changelog: v0.2.4...v0.2.5

Contributors

Blaizzy and lucasnewman

18 Aug 13:29

Blaizzy

v0.2.4

d494987

v0.2.4

What's Changed

move sentence splitting into a separate utility class and add unit tests by @smdesai in #183
Add BigVGAN neural audio codec by @senstella in #186
fix: (outetts)loading model: Speaker file not found. by @zysam in #189
Fix deprecated save in MLX-LM by @Blaizzy in #194
Implementation of Misaki G2P tokenizer by @smdesai in #193
Add IndexTTS by @senstella in #187
Load both lexicon files us_gold and us_silver with words in us_gold taking precedence by @smdesai in #195
Add S3 semantic tokenizer / neural audio codec by @lucasnewman in #204
add lexicon files for British sounds, gb_gold and gb_silver by @smdesai in #197
Fix Mimi codec by @lucasnewman in #209
Add ability to use a custom URL to load Kokoro safetensors by @adrgrondin in #185
Handle transformers-style config for Sesame CSM models by @lucasnewman in #211
Add Xcode build troubleshooting documentation by @kinkadius in #210
Multi model support by @ivanfioravanti in #213
Add voxtral by @Blaizzy in #214

New Contributors

@zysam made their first contribution in #189
@adrgrondin made their first contribution in #185
@kinkadius made their first contribution in #210

Full Changelog: v0.2.3...v0.2.4

Contributors

ivanfioravanti, smdesai, and 6 other contributors
