AI Voice Agents

AI Voice Agents - Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧

Project List

Full Stack

Source	Description	Model
Bland AI	Bland AI is a platform for AI phone calling. Its API lets you send or receive phone calls with a programmable voice agent, automating inbound and outbound enterprise calls with AI that sounds human.	API
GPT-4o	GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs.	API
Retell AI	Retell AI: build advanced voice AI powered by LLMs.	API


Text To Speech

Source	Description	Code	Paper	Model
ChatTTS	ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants.	GitHub		Hugging Face
CosyVoice	A multilingual large voice generation model with full-stack support for inference, training, and deployment.	GitHub
ElevenLabs	ElevenLabs: Text to Speech & AI Voice Generator.			API
Matcha-TTS	Matcha-TTS: A fast TTS architecture with conditional flow matching.	GitHub	arXiv
StyleTTS 2	Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models.	GitHub	arXiv
XTTS	🐸TTS is a library for advanced Text-to-Speech generation.	GitHub


Automatic Speech Recognition

Source	Description	Code	Paper	Model
SenseVoice	SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED).	GitHub		Hugging Face
TeleSpeech-ASR	A large speech model for multi-dialect ASR.	GitHub		Hugging Face
Whisper	Whisper is a general-purpose speech recognition model.	GitHub	arXiv	Hugging Face


Audio Generation

Source	Description	Code	Paper	Model
Make-An-Audio 3	Transforming Text into Audio via Flow-based Large Diffusion Transformers.	GitHub	arXiv	Hugging Face


Yuan-ManX/ai-voice-agents