A demonstration of the Maya1 voice AI model, which generates realistic voice audio from text input with emotional and descriptive control.
This project showcases the Maya1 model, an open-source voice AI that synthesizes speech with specified voice characteristics and emotions. The demo script takes a text prompt and a natural-language voice description, generates the corresponding speech, and saves it as a WAV file.
- Text-to-speech synthesis with voice descriptions
- Emotional and stylistic voice control
- High-quality audio output at 24kHz
- Built on the Maya1 language model and the SNAC neural audio codec
Ensure you have Python 3.8 or higher installed.
Install the required dependencies:
pip install torch transformers snac soundfile
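Before running the demo, you can confirm that the interpreter meets the Python 3.8 requirement and that the packages from the pip command above are importable. This is a minimal sketch using only the standard library; the helper names `missing_modules` and `check_environment` are hypothetical, and it assumes the import names match the pip package names (`torch`, `transformers`, `snac`, `soundfile`).

```python
import importlib.util
import sys

# Import names of the packages installed by the pip command above (assumed to
# match the pip package names).
REQUIRED_MODULES = ["torch", "transformers", "snac", "soundfile"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Print a short readiness report and return True if everything is present."""
    ok = sys.version_info >= (3, 8)
    if not ok:
        print(f"Python 3.8+ required, found {sys.version.split()[0]}")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
        ok = False
    if ok:
        print("Environment looks ready for maya1_demo.py")
    return ok


if __name__ == "__main__":
    check_environment()
```

`find_spec` only locates a module without importing it, so the check stays fast even though `torch` itself is slow to import.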
The models are downloaded automatically from the Hugging Face Hub the first time you run the script:
- Maya1 model: maya-research/maya1
- SNAC codec: hubertsiuzdak/snac_24khz
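If you want to fetch the weights ahead of time (for example, before working offline), a sketch like the following could pre-populate the local Hugging Face cache using the repository IDs listed above. It assumes the `huggingface_hub` package is available, which `transformers` normally installs as a dependency; the helper name `prefetch_models` is hypothetical.

```python
# Hugging Face Hub repository IDs used by the demo.
MODEL_REPOS = {
    "maya1": "maya-research/maya1",
    "snac": "hubertsiuzdak/snac_24khz",
}


def prefetch_models(repos=None):
    """Download each repository into the local Hub cache; return local paths."""
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    repos = MODEL_REPOS if repos is None else repos
    return {name: snapshot_download(repo_id=repo_id) for name, repo_id in repos.items()}
```

Calling `prefetch_models()` once while online stores the files in the Hugging Face cache (by default under `~/.cache/huggingface`), so later runs of `maya1_demo.py` can load them from disk; setting the `HF_HUB_OFFLINE=1` environment variable then prevents further network lookups.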
Run the demo script to generate voice audio:
python maya1_demo.py
The script will:
- Load the Maya1 model and SNAC codec
- Generate voice based on the predefined description and text
- Save the output as output.wav
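Between generation and saving, the model's output tokens have to be turned back into SNAC codes. The 24 kHz SNAC model uses three codebook levels whose frame rates are in a 1:2:4 ratio, so each coarse time step carries 1 + 2 + 4 = 7 codes, and language-model TTS systems commonly flatten each step into 7 consecutive tokens. The sketch below assumes a per-step order of (level 1, level 2, level 3, level 3, level 2, level 3, level 3) and assumes the token IDs have already been mapped to codebook indices; both are assumptions, so check `maya1_demo.py` for the exact layout and offsets Maya1 uses. The function names are hypothetical.

```python
from typing import List, Tuple

FRAME_SIZE = 7  # 1 coarse + 2 medium + 4 fine codes per time step


def unpack_frames(codes: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a flat list of codebook indices into SNAC's three levels.

    Assumed (hypothetical) per-step layout: [L1, L2a, L3a, L3b, L2b, L3c, L3d]
    """
    if len(codes) % FRAME_SIZE:
        raise ValueError(f"expected a multiple of {FRAME_SIZE} codes, got {len(codes)}")
    level1, level2, level3 = [], [], []
    for start in range(0, len(codes), FRAME_SIZE):
        f = codes[start:start + FRAME_SIZE]
        level1.append(f[0])
        level2.extend([f[1], f[4]])
        level3.extend([f[2], f[3], f[5], f[6]])
    return level1, level2, level3


def pack_frames(level1: List[int], level2: List[int], level3: List[int]) -> List[int]:
    """Inverse of unpack_frames, handy for round-trip checks."""
    codes = []
    for i, c1 in enumerate(level1):
        codes.extend([
            c1, level2[2 * i], level3[4 * i], level3[4 * i + 1],
            level2[2 * i + 1], level3[4 * i + 2], level3[4 * i + 3],
        ])
    return codes
```

With PyTorch installed, each level would then be wrapped as a tensor of shape (1, T), for example `torch.tensor([level1])`, and the list of three tensors passed to the SNAC model's `decode` method to obtain the 24 kHz waveform.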
You can modify the description and text variables in maya1_demo.py to customize the voice generation.
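The description and the text end up combined into a single prompt for the model. The exact template is defined in `maya1_demo.py`; the sketch below assumes a `<description="..."> text` form purely for illustration, and the helper `build_prompt` is hypothetical. It shows the kind of light validation that helps when editing the two variables, such as rejecting a description containing a double quote, which would end the quoted attribute early under this assumed template.

```python
def build_prompt(description: str, text: str) -> str:
    """Combine a voice description and the text to speak into one prompt.

    Hypothetical template: <description="..."> text
    """
    description = description.strip()
    text = text.strip()
    if not description or not text:
        raise ValueError("description and text must both be non-empty")
    if '"' in description:
        # A literal quote would terminate the assumed description attribute.
        raise ValueError("description must not contain double quotes")
    return f'<description="{description}"> {text}'
```

For example, `build_prompt("Calm female voice, warm tone", "Welcome back!")` yields `<description="Calm female voice, warm tone"> Welcome back!`.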
Requirements:
- Python 3.8+
- PyTorch
- Transformers library
- SNAC audio codec
- soundfile for audio I/O
The generated audio is saved as output.wav in the project directory, with a 24 kHz sample rate.
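To confirm the sample rate and length of a generated file, a small standard-library check is enough. This sketch assumes output.wav is written as integer PCM, which is soundfile's default subtype for WAV files; Python's `wave` module cannot read floating-point WAV data. The helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path: str):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration
```

After running the demo, `describe_wav("output.wav")` should report a sample rate of 24000 along with the clip's duration in seconds.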
This project uses open-source models and libraries. Please refer to the individual model licenses for usage terms.