Changes from all commits
32 commits
a9d83d1
reverted due to inability to get reliable timestamps
IgorAherne Jul 20, 2025
d5e4549
batching works
IgorAherne Jul 20, 2025
73188c5
batch: preventing the noise at the end of shorter sequences
IgorAherne Jul 21, 2025
e1f4fcc
batching: implemented variations + improved the example_tts
IgorAherne Jul 21, 2025
b1e226d
restored watermarking
IgorAherne Jul 21, 2025
217de76
Delete .vs/ProjectSettings.json (meta file)
IgorAherne Jul 21, 2025
8cd5414
Delete .vs/VSWorkspaceState.json (meta file)
IgorAherne Jul 21, 2025
16c9203
Delete .gitignore (meta file)
IgorAherne Jul 21, 2025
3e8e555
Delete .vs/chatterbox/v17/DocumentLayout.json (meta file)
IgorAherne Jul 21, 2025
f638b42
recovered original gitignore
IgorAherne Jul 21, 2025
c1c9546
removed .vs folder (meta files of visual studio)
IgorAherne Jul 21, 2025
e97127e
fix(t3): Resolve memory leak in generation loop by pre-allocating tok…
IgorAherne Jul 21, 2025
7bfc681
Update README.md with citation (#213)
fatchord Aug 1, 2025
a9742ee
Multilingual implementation (#229)
ZihanJin Sep 4, 2025
214fbd1
bump the version to 0.1.3 (#230)
manmay-nakhashi Sep 4, 2025
a8d8051
Update gradio links (#232)
TediPapajorgji Sep 4, 2025
1798729
remove src (#234)
manmay-nakhashi Sep 4, 2025
a0434ff
Update README.md (#235)
ZihanJin Sep 4, 2025
1b5ae50
Update README.md with fixed image link (#236)
ZohaibAhmed Sep 4, 2025
bf169fe
Multilingual v2 update (#295)
ZihanJin Sep 25, 2025
84b4c1e
Update tts.py
IgorAherne Oct 20, 2025
a9e8a88
added visual studio project
IgorAherne Oct 20, 2025
c913266
added filename comments at the top of each file
IgorAherne Oct 20, 2025
b9be41e
merged from master
IgorAherne Oct 20, 2025
f100e83
Update chatterbox.pyproj
IgorAherne Oct 20, 2025
a1bc5b4
Update s3tokenizer.py
IgorAherne Oct 20, 2025
ee1c90b
DT optimization 1: correction
IgorAherne Oct 20, 2025
32758e3
second patch
IgorAherne Oct 20, 2025
10a5856
patch 2.1
IgorAherne Oct 20, 2025
c08872c
bf16 works
IgorAherne Oct 21, 2025
239bb5a
removed short prompts
IgorAherne Oct 21, 2025
f1e8912
support for flash attention
IgorAherne Oct 21, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -45,4 +45,4 @@ checkpoints/
.gradio

# Ignore generated sample .wav files
**/*.wav
**/*.wav
Binary file added Chatterbox-Multilingual.png
46 changes: 35 additions & 11 deletions README.md
@@ -1,5 +1,5 @@

<img width="1200" alt="cb-big2" src="https://github.com/user-attachments/assets/bd8c5f03-e91d-4ee5-b680-57355da204d1" />
<img width="1200" height="600" alt="Chatterbox-Multilingual" src="https://www.resemble.ai/wp-content/uploads/2025/09/Chatterbox-Multilingual-1.png" />

# Chatterbox TTS

@@ -10,14 +10,15 @@

_Made with ♥️ by <a href="https://resemble.ai" target="_blank"><img width="100" alt="resemble-logo-horizontal" src="https://github.com/user-attachments/assets/35cf756b-3506-4943-9c72-c05ddfa4e525" /></a>

We're excited to introduce Chatterbox, [Resemble AI's](https://resemble.ai) first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
We're excited to introduce **Chatterbox Multilingual**, [Resemble AI's](https://resemble.ai) first production-grade open source TTS model supporting **23 languages** out of the box. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support **emotion exaggeration control**, a powerful feature that makes your voices stand out. Try it now on our [Hugging Face Gradio app.](https://huggingface.co/spaces/ResembleAI/Chatterbox)
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life across languages. It's also the first open source TTS model to support **emotion exaggeration control** with robust **multilingual zero-shot voice cloning**. Try the English-only version on our [English Hugging Face Gradio app](https://huggingface.co/spaces/ResembleAI/Chatterbox), or try the multilingual version on our [Multilingual Hugging Face Gradio app](https://huggingface.co/spaces/ResembleAI/Chatterbox-Multilingual-TTS).

If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (<a href="https://resemble.ai">link</a>). It delivers reliable performance with ultra-low latency of sub 200ms—ideal for production use in agents, applications, or interactive media.

# Key Details
- SoTA zeroshot TTS
- Multilingual, zero-shot TTS supporting 23 languages
- SoTA zeroshot English TTS
- 0.5B Llama backbone
- Unique exaggeration/intensity control
- Ultra-stable with alignment-informed inference
@@ -26,9 +27,12 @@ If you like the model but need to scale or tune it for higher accuracy, check ou
- Easy voice conversion script
- [Outperforms ElevenLabs](https://podonos.com/resembleai/chatterbox)

# Supported Languages
Arabic (ar) • Danish (da) • German (de) • Greek (el) • English (en) • Spanish (es) • Finnish (fi) • French (fr) • Hebrew (he) • Hindi (hi) • Italian (it) • Japanese (ja) • Korean (ko) • Malay (ms) • Dutch (nl) • Norwegian (no) • Polish (pl) • Portuguese (pt) • Russian (ru) • Swedish (sv) • Swahili (sw) • Turkish (tr) • Chinese (zh)
# Tips
- **General Use (TTS and Voice Agents):**
- The default settings (`exaggeration=0.5`, `cfg_weight=0.5`) work well for most prompts.
- Ensure that the reference clip matches the specified language tag. Otherwise, language transfer outputs may inherit the accent of the reference clip’s language. To mitigate this, set `cfg_weight` to `0`.
- The default settings (`exaggeration=0.5`, `cfg_weight=0.5`) work well for most prompts across all languages.
- If the reference speaker has a fast speaking style, lowering `cfg_weight` to around `0.3` can improve pacing (see the sketch after these tips).

- **Expressive or Dramatic Speech:**
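A minimal sketch of the general-use tips above, assuming `generate()` accepts `exaggeration` and `cfg_weight` as keyword arguments (parameter names follow the settings quoted in the tips; the values are starting points, not a definitive recipe):

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")
text = "Thanks for calling, how can I help you today?"

# Default, balanced settings that work for most prompts and voice agents.
wav = model.generate(text, exaggeration=0.5, cfg_weight=0.5)
ta.save("agent-default.wav", wav, model.sr)

# Hypothetical adjustment for a fast-speaking reference clip:
# lowering cfg_weight improves pacing, per the tip above.
wav_paced = model.generate(
    text,
    audio_prompt_path="fast_speaker.wav",  # placeholder reference clip
    exaggeration=0.5,
    cfg_weight=0.3,
)
ta.save("agent-paced.wav", wav_paced, model.sr)
```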
@@ -50,19 +54,31 @@ git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .
```
We developed and tested Chatterbox on Python 3.11 on Debain 11 OS; the versions of the dependencies are pinned in `pyproject.toml` to ensure consistency. You can modify the code or dependencies in this installation mode.

We developed and tested Chatterbox on Python 3.11 on Debian 11 OS; the versions of the dependencies are pinned in `pyproject.toml` to ensure consistency. You can modify the code or dependencies in this installation mode.

# Usage
```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# English example
model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)
ta.save("test-english.wav", wav, model.sr)

# Multilingual examples
multilingual_model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")

french_text = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox, il prend en charge 23 langues."
wav_french = multilingual_model.generate(french_text, language_id="fr")
ta.save("test-french.wav", wav_french, model.sr)

chinese_text = "你好,今天天气真不错,希望你有一个愉快的周末。"
wav_chinese = multilingual_model.generate(chinese_text, language_id="zh")
ta.save("test-chinese.wav", wav_chinese, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
@@ -71,9 +87,6 @@ ta.save("test-2.wav", wav, model.sr)
```
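The snippet above clones a voice with the English model only. A short sketch of the multilingual analogue follows, under the assumption that `ChatterboxMultilingualTTS.generate()` also accepts `audio_prompt_path` alongside `language_id` (reusing `multilingual_model` and `french_text` from the example above):

```python
# Hypothetical multilingual zero-shot cloning: pair a reference clip with a
# language tag. Per the tips above, the clip should match the target language,
# or cfg_weight can be set to 0 to reduce accent transfer.
REF_CLIP = "YOUR_FILE.wav"  # placeholder reference recording
wav_cloned = multilingual_model.generate(
    french_text,
    language_id="fr",
    audio_prompt_path=REF_CLIP,
)
ta.save("test-french-cloned.wav", wav_cloned, multilingual_model.sr)
```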
See `example_tts.py` and `example_vc.py` for more examples.

# Supported Lanugage
Currenlty only English.

# Acknowledgements
- [Cosyvoice](https://github.com/FunAudioLLM/CosyVoice)
- [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)
@@ -113,5 +126,16 @@ print(f"Extracted watermark: {watermark}")

👋 Join us on [Discord](https://discord.gg/rJq9cRJBJ6) and let's build something awesome together!

# Citation
If you find this model useful, please consider citing.
```
@misc{chatterboxtts2025,
author = {{Resemble AI}},
title = {{Chatterbox-TTS}},
year = {2025},
howpublished = {\url{https://github.com/resemble-ai/chatterbox}},
note = {GitHub repository}
}
```
# Disclaimer
Don't use this model to do bad things. Prompts are sourced from freely available data on the internet.
180 changes: 180 additions & 0 deletions chatterbox.pyproj
@@ -0,0 +1,180 @@
<Project DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="4.0">
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<SchemaVersion>2.0</SchemaVersion>
<ProjectGuid>958ac2fa-ebae-40f7-bf36-e239ba9b0efd</ProjectGuid>
<ProjectHome>.</ProjectHome>
<StartupFile>run_tts_test.py</StartupFile>
<SearchPath>
</SearchPath>
<WorkingDirectory>.</WorkingDirectory>
<OutputPath>.</OutputPath>
<Name>chatterbox</Name>
<RootNamespace>chatterbox</RootNamespace>
<SuppressConfigureTestFrameworkPrompt>true</SuppressConfigureTestFrameworkPrompt>
<InterpreterId>MSBuild|venv|C:\_myDrive\repos\auto-vlog\AutoVlogProj\AutoVlogProj.pyproj</InterpreterId>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)' == 'Debug' ">
<DebugSymbols>true</DebugSymbols>
<EnableUnmanagedDebugging>false</EnableUnmanagedDebugging>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)' == 'Release' ">
<DebugSymbols>true</DebugSymbols>
<EnableUnmanagedDebugging>false</EnableUnmanagedDebugging>
</PropertyGroup>
<ItemGroup>
<Compile Include="example_for_mac.py" />
<Compile Include="example_tts.py" />
<Compile Include="example_vc.py" />
<Compile Include="example_vc_batching.py" />
<Compile Include="gradio_tts_app.py" />
<Compile Include="gradio_vc_app.py" />
<Compile Include="run_tts_test.py" />
<Compile Include="src\chatterbox\models\s3gen\configs.py" />
<Compile Include="src\chatterbox\models\s3gen\const.py" />
<Compile Include="src\chatterbox\models\s3gen\decoder.py" />
<Compile Include="src\chatterbox\models\s3gen\f0_predictor.py" />
<Compile Include="src\chatterbox\models\s3gen\flow.py" />
<Compile Include="src\chatterbox\models\s3gen\flow_matching.py" />
<Compile Include="src\chatterbox\models\s3gen\hifigan.py" />
<Compile Include="src\chatterbox\models\s3gen\matcha\decoder.py" />
<Compile Include="src\chatterbox\models\s3gen\matcha\flow_matching.py" />
<Compile Include="src\chatterbox\models\s3gen\matcha\text_encoder.py" />
<Compile Include="src\chatterbox\models\s3gen\matcha\transformer.py" />
<Compile Include="src\chatterbox\models\s3gen\s3gen.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\activation.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\attention.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\convolution.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\embedding.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\encoder_layer.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\positionwise_feed_forward.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\subsampling.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\upsample_encoder.py" />
<Compile Include="src\chatterbox\models\s3gen\transformer\__init__.py" />
<Compile Include="src\chatterbox\models\s3gen\utils\class_utils.py" />
<Compile Include="src\chatterbox\models\s3gen\utils\mask.py" />
<Compile Include="src\chatterbox\models\s3gen\utils\mel.py" />
<Compile Include="src\chatterbox\models\s3gen\xvector.py" />
<Compile Include="src\chatterbox\models\s3gen\__init__.py" />
<Compile Include="src\chatterbox\models\s3tokenizer\s3tokenizer.py" />
<Compile Include="src\chatterbox\models\s3tokenizer\__init__.py" />
<Compile Include="src\chatterbox\models\t3\inference\alignment_stream_analyzer.py" />
<Compile Include="src\chatterbox\models\t3\inference\t3_hf_backend.py" />
<Compile Include="src\chatterbox\models\t3\llama_configs.py" />
<Compile Include="src\chatterbox\models\t3\modules\cond_enc.py" />
<Compile Include="src\chatterbox\models\t3\modules\learned_pos_emb.py" />
<Compile Include="src\chatterbox\models\t3\modules\perceiver.py" />
<Compile Include="src\chatterbox\models\t3\modules\t3_config.py" />
<Compile Include="src\chatterbox\models\t3\t3.py" />
<Compile Include="src\chatterbox\models\t3\__init__.py" />
<Compile Include="src\chatterbox\models\tokenizers\tokenizer.py" />
<Compile Include="src\chatterbox\models\tokenizers\__init__.py" />
<Compile Include="src\chatterbox\models\utils.py" />
<Compile Include="src\chatterbox\models\voice_encoder\config.py" />
<Compile Include="src\chatterbox\models\voice_encoder\melspec.py" />
<Compile Include="src\chatterbox\models\voice_encoder\voice_encoder.py" />
<Compile Include="src\chatterbox\models\voice_encoder\__init__.py" />
<Compile Include="src\chatterbox\models\__init__.py" />
<Compile Include="src\chatterbox\tts.py" />
<Compile Include="src\chatterbox\vc.py" />
<Compile Include="src\chatterbox\__init__.py" />
</ItemGroup>
<ItemGroup>
<Folder Include="src\" />
<Folder Include="src\chatterbox\" />
<Folder Include="src\chatterbox\models\" />
<Folder Include="src\chatterbox\models\s3gen\" />
<Folder Include="src\chatterbox\models\s3gen\matcha\" />
<Folder Include="src\chatterbox\models\s3gen\matcha\__pycache__\" />
<Folder Include="src\chatterbox\models\s3gen\transformer\" />
<Folder Include="src\chatterbox\models\s3gen\transformer\__pycache__\" />
<Folder Include="src\chatterbox\models\s3gen\utils\" />
<Folder Include="src\chatterbox\models\s3gen\utils\__pycache__\" />
<Folder Include="src\chatterbox\models\s3gen\__pycache__\" />
<Folder Include="src\chatterbox\models\s3tokenizer\" />
<Folder Include="src\chatterbox\models\s3tokenizer\__pycache__\" />
<Folder Include="src\chatterbox\models\t3\" />
<Folder Include="src\chatterbox\models\t3\inference\" />
<Folder Include="src\chatterbox\models\t3\inference\__pycache__\" />
<Folder Include="src\chatterbox\models\t3\modules\" />
<Folder Include="src\chatterbox\models\t3\modules\__pycache__\" />
<Folder Include="src\chatterbox\models\t3\__pycache__\" />
<Folder Include="src\chatterbox\models\tokenizers\" />
<Folder Include="src\chatterbox\models\tokenizers\__pycache__\" />
<Folder Include="src\chatterbox\models\voice_encoder\" />
<Folder Include="src\chatterbox\models\voice_encoder\__pycache__\" />
<Folder Include="src\chatterbox\models\__pycache__\" />
<Folder Include="src\chatterbox\__pycache__\" />
<Folder Include="src\chatterbox_tts.egg-info\" />
<Folder Include="tts_test_outputs\" />
</ItemGroup>
<ItemGroup>
<Content Include="src\chatterbox\models\s3gen\matcha\__pycache__\decoder.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\matcha\__pycache__\flow_matching.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\matcha\__pycache__\transformer.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\activation.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\attention.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\convolution.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\embedding.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\encoder_layer.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\positionwise_feed_forward.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\subsampling.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\upsample_encoder.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\transformer\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\utils\__pycache__\class_utils.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\utils\__pycache__\mask.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\utils\__pycache__\mel.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\configs.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\const.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\decoder.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\f0_predictor.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\flow.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\flow_matching.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\hifigan.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\s3gen.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\xvector.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3gen\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3tokenizer\__pycache__\s3tokenizer.cpython-311.pyc" />
<Content Include="src\chatterbox\models\s3tokenizer\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\inference\__pycache__\t3_hf_backend.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\modules\__pycache__\cond_enc.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\modules\__pycache__\learned_pos_emb.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\modules\__pycache__\perceiver.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\modules\__pycache__\t3_config.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\__pycache__\llama_configs.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\__pycache__\t3.cpython-311.pyc" />
<Content Include="src\chatterbox\models\t3\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox\models\tokenizers\__pycache__\tokenizer.cpython-311.pyc" />
<Content Include="src\chatterbox\models\tokenizers\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox\models\voice_encoder\__pycache__\config.cpython-311.pyc" />
<Content Include="src\chatterbox\models\voice_encoder\__pycache__\melspec.cpython-311.pyc" />
<Content Include="src\chatterbox\models\voice_encoder\__pycache__\voice_encoder.cpython-311.pyc" />
<Content Include="src\chatterbox\models\voice_encoder\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox\models\__pycache__\utils.cpython-311.pyc" />
<Content Include="src\chatterbox\models\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox\__pycache__\tts.cpython-311.pyc" />
<Content Include="src\chatterbox\__pycache__\vc.cpython-311.pyc" />
<Content Include="src\chatterbox\__pycache__\__init__.cpython-311.pyc" />
<Content Include="src\chatterbox_tts.egg-info\dependency_links.txt" />
<Content Include="src\chatterbox_tts.egg-info\PKG-INFO" />
<Content Include="src\chatterbox_tts.egg-info\requires.txt" />
<Content Include="src\chatterbox_tts.egg-info\SOURCES.txt" />
<Content Include="src\chatterbox_tts.egg-info\top_level.txt" />
<Content Include="tts_test_outputs\output_batch_1.wav" />
<Content Include="tts_test_outputs\output_batch_2.wav" />
<Content Include="tts_test_outputs\output_batch_3.wav" />
<Content Include="tts_test_outputs\output_batch_4.wav" />
</ItemGroup>
<ItemGroup>
<InterpreterReference Include="MSBuild|venv|C:\_myDrive\repos\auto-vlog\AutoVlogProj\AutoVlogProj.pyproj" />
</ItemGroup>
<Import Project="$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)\Python Tools\Microsoft.PythonTools.targets" />
<!-- Uncomment the CoreCompile target to enable the Build command in
Visual Studio and specify your pre- and post-build commands in
the BeforeBuild and AfterBuild targets below. -->
<!--<Target Name="CoreCompile" />-->
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
</Project>
15 changes: 12 additions & 3 deletions example_tts.py
@@ -1,6 +1,9 @@
# example_tts.py

import torchaudio as ta
import torch
from chatterbox.tts import ChatterboxTTS
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Automatically detect the best available device
if torch.cuda.is_available():
@@ -14,11 +17,17 @@

model = ChatterboxTTS.from_pretrained(device=device)

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)
ta.save("test-default-voice.wav", wav, model.sr)

multilingual_model = ChatterboxMultilingualTTS.from_pretrained(device=device)
text = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox, il prend en charge 23 langues."
wav = multilingual_model.generate(text, language_id="fr")
ta.save("test-2.wav", wav, multilingual_model.sr)


# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
ta.save("test-3.wav", wav, model.sr)
2 changes: 2 additions & 0 deletions example_vc.py
@@ -1,3 +1,5 @@
# example_vc.py

import torch
import torchaudio as ta

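The body of `example_vc.py` is collapsed in this diff. For orientation, a hedged sketch of a typical voice-conversion call is shown below; the `ChatterboxVC` class name and the `generate(audio, target_voice_path=...)` signature are assumptions modelled on the TTS API, so check the actual file.

```python
import torchaudio as ta
from chatterbox.vc import ChatterboxVC  # assumed module/class, mirroring chatterbox.tts

device = "cuda"
model = ChatterboxVC.from_pretrained(device)

AUDIO_PATH = "input_speech.wav"         # placeholder: speech to convert
TARGET_VOICE_PATH = "target_voice.wav"  # placeholder: reference voice

# Assumed signature: convert the input recording to the target speaker's voice.
wav = model.generate(AUDIO_PATH, target_voice_path=TARGET_VOICE_PATH)
ta.save("test-vc.wav", wav, model.sr)
```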