ALLBOTSIO/awesome-voice-ai: A curated list of 150+ voice AI tools (agents, TTS, STT, cloning, telephony, frameworks). The definitive voice AI directory.

A curated list of voice AI tools, frameworks, platforms, and models.
Voice agents, text-to-speech, speech-to-text, voice cloning, telephony, and more.

Voice Agents · TTS · STT · Cloning · Frameworks · Telephony · Contribute

Why This List Exists

Voice AI is one of the fastest-growing segments of the AI stack in 2026. The global Voice Chat API market is projected to reach $3.5B by 2033. Platforms like Retell AI, Vapi, ElevenLabs, and LiveKit are processing billions of minutes monthly. Gartner predicts that 40% of enterprise apps will feature task-specific AI agents by the end of 2026.

But there's no single place to discover all the tools. This list fixes that.

Contents

Voice Agent Platforms — Build and deploy voice AI agents
Text-to-Speech — AI speech synthesis and voice generation
Speech-to-Text — Transcription, recognition, and understanding
Voice Cloning — Clone and generate custom voices
Frameworks & SDKs — Open-source voice AI frameworks
Telephony Infrastructure — SIP, phone numbers, call routing
Voice Analytics — Call analysis, sentiment, and QA
Voice Commerce — Voice-activated shopping and payments
Voice Platforms & Assistants — Alexa, Siri, Google ecosystem
MCP Servers for Voice — Model Context Protocol voice integrations

Voice Agent Platforms

Platforms for building, deploying, and managing AI voice agents.

Tool	Type	Description	Highlights
Retell AI	Commercial	Developer-friendly voice agent platform with drag-and-drop builder	MCP support, Twilio integration, multilingual, real-time workflows
Vapi	Commercial	Provider-agnostic voice AI orchestration layer	62M monthly calls, 14+ providers, $0.05/min, 99.99% SLA
ElevenLabs Conversational AI	Commercial	Voice agent platform with industry-leading voice quality	Sub-100ms latency, 11,000+ voices, 70+ languages
Bland AI	Commercial	High-volume outbound voice agent platform	Purpose-built for sales campaigns, enterprise telephony
LiveKit Agents	Open Source	Real-time voice agent framework with WebRTC	Fully open source, plugin architecture, MCP support
Synthflow	Commercial	No-code voice agent builder	White-label, 200+ integrations, appointment booking
Cognigy	Enterprise	Enterprise conversational AI platform	Omnichannel, contact center integration, 100+ languages
Lindy AI	Commercial	AI assistant with voice agent capabilities	Multi-step workflows, triggers, CRM integration
Air AI	Commercial	Autonomous voice agent for sales and customer service	40+ minute conversations, calendar booking
Voiceflow	Commercial	Collaborative voice & chat AI agent builder	Visual builder, team collaboration, multi-channel
Hamming AI	Commercial	Voice agent testing and evaluation platform	Automated QA, regression testing, performance scoring
Inworld AI	Commercial	Character-driven voice AI for games and enterprise	Real-time animation, emotional intelligence, gaming SDK
PlayAI	Commercial	Voice agent platform with ultra-realistic voices	Sub-200ms latency, voice cloning, custom models
Thoughtly	Commercial	Enterprise voice agent with human-like conversations	No-code builder, CRM sync, call analytics

Detailed comparison →

Text-to-Speech

AI-powered speech synthesis — from real-time voices to studio-quality narration.

Commercial

Tool	Latency	Languages	Highlights
ElevenLabs	<100ms	70+	Industry-leading quality, voice cloning, 11k+ voices
Play.ht	<200ms	60+	Ultra-realistic, voice cloning, API + widget
Deepgram Aura	<100ms	10+	Optimized for voice agents, streaming, low cost
Amazon Polly	<200ms	30+	AWS integration, SSML, neural voices
Google Cloud TTS	<200ms	40+	WaveNet and Neural2 voices, Studio voices
Azure Speech	<200ms	100+	Custom Neural Voice, SSML, avatar support
Resemble AI	<200ms	25+	Real-time cloning, emotion control, API
Cartesia	<80ms	10+	Sonic model, ultra-low latency streaming
Fish Audio	<150ms	10+	Open-weight models, voice cloning, multilingual
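The latency column above is best read as time to first audio: how long until the first chunk of synthesized speech arrives, not how long the whole utterance takes. Below is a minimal sketch of measuring both. It assumes a streaming API that yields audio chunks. `fake_tts_stream` is a hypothetical stand-in, not any vendor's SDK, and vendors may define their published latency differently.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_tts_stream(text: str, chunk_delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical streaming TTS stand-in: one placeholder 'audio' chunk per word."""
    for word in text.split():
        time.sleep(chunk_delay_s)  # simulate per-chunk synthesis + network time
        yield word.encode("utf-8")  # placeholder bytes, not real audio


def measure_latency(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Return (seconds to first chunk, seconds to last chunk, total bytes)."""
    start = time.perf_counter()
    first: Optional[float] = None
    total_bytes = 0
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start  # time to first audio
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    return (total if first is None else first), total, total_bytes


ttfa, total, size = measure_latency(fake_tts_stream("your table is booked"))
print(f"first audio after {ttfa * 1000:.0f} ms, full reply after {total * 1000:.0f} ms")
```

Generators are lazy, so in this stub no synthesis happens until `measure_latency` starts iterating, and the timer covers the whole request. For a voice agent, the first number is usually the one callers notice.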

Open Source

Tool	Stars	License	Highlights
Coqui XTTS	36k+	MPL 2.0 (code), CPML (XTTS-v2 weights)	Voice cloning with 6s sample, multilingual, most downloaded on HF
Bark	37k+	MIT	Non-verbal sounds, laughter, music, multi-speaker
Piper	7k+	MIT	Lightweight, runs on Raspberry Pi, 20+ languages
StyleTTS2	5k+	MIT	Studio-quality, style diffusion, human-level naturalness
GPT-SoVITS	40k+	MIT	1-min voice data training, few-shot cloning
OpenVoice	30k+	MIT	Instant voice cloning, emotion/accent control
MeloTTS	5k+	MIT	High-quality multi-language, real-time CPU inference
Parler-TTS	4k+	Apache 2.0	Text-described voice control, Hugging Face native
MetaVoice	3k+	Apache 2.0	Fine-tuning from 1 min of data, emotional speech
OpenEdAI Speech	2k+	AGPL	OpenAI API-compatible TTS server, drop-in replacement
AllTalk TTS	2k+	AGPL	Multi-engine TTS server, text-generation-webui support

Full TTS comparison →

Speech-to-Text

Transcription, real-time recognition, and audio understanding.

Commercial APIs

Tool	WER	Languages	Highlights
Deepgram Nova-3	<8%	40+	Fastest streaming, $0.0043/min, voice agent optimized
AssemblyAI (Slam-1)	<10%	20+	Speaker diarization, sentiment, content moderation
Gladia	<10%	100+	Real-time code-switching, zero-shot language detection
Speechmatics	<10%	50+	On-prem option, real-time, batch, translation
Rev AI	<5%	30+	Human-verified option, 99% accuracy, legal/medical
Google Cloud STT	<12%	125+	Chirp model, medical dictation, streaming
Azure Speech	<10%	100+	Custom speech models, real-time, batch
Amazon Transcribe	<12%	100+	Medical, call analytics, volume discounts to 67%
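WER (word error rate) is the number of word substitutions S, deletions D, and insertions I needed to turn a transcript into the reference, divided by the number of reference words N: WER = (S + D + I) / N. Below is a minimal sketch using word-level edit distance. It only lowercases and splits on whitespace. Published benchmarks normalize punctuation and numbers in their own ways, so the vendor figures above are not directly comparable to its output.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("book a table for two", "book the table for two"))  # 0.2
```

Because insertions count against the reference length, WER can exceed 100% on very noisy output.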

Open Source

Tool	Stars	License	Highlights
Whisper	75k+	MIT	The standard, 99+ languages, robust in noise
Faster-Whisper	13k+	MIT	4x faster than Whisper, CTranslate2 backend
Whisper.cpp	37k+	MIT	C/C++ port, runs on CPU/phone/RPi, WASM support
Vosk	8k+	Apache 2.0	Offline, lightweight, mobile/embedded, 20+ languages
NeMo (NVIDIA)	12k+	Apache 2.0	Enterprise-grade, Conformer/Canary models, GPU optimized
Whisper-Streaming	2k+	MIT	Real-time streaming Whisper with local agreement
Insanely-Fast-Whisper	7k+	MIT	Transcribes 150 min of audio in <98 s (large-v3, fp16 + batching + Flash Attention 2)
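Whisper-Streaming's "local agreement" policy re-transcribes a growing audio buffer and only emits words that consecutive hypotheses agree on, which keeps partial results from flickering. The sketch below shows the idea for two consecutive hypotheses on plain word lists. It assumes each hypothesis covers the buffer from its start and that already-committed words line up by position. The real project works with word timestamps and trims its buffer, and this sketch omits both. `LocalAgreement2` is a hypothetical name.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Longest run of words that two hypotheses share from the start."""
    prefix: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement2:
    """Commit a word once two consecutive hypotheses agree on it."""

    def __init__(self) -> None:
        self.committed: List[str] = []
        self.previous: List[str] = []

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed the newest full-buffer hypothesis; return words committed by this update."""
        done = len(self.committed)
        # Compare only the not-yet-committed tails (positional alignment assumed).
        newly = common_prefix(self.previous[done:], hypothesis[done:])
        self.committed.extend(newly)
        self.previous = hypothesis
        return newly

    def flush(self) -> List[str]:
        """At end of stream, commit whatever remains in the last hypothesis."""
        rest = self.previous[len(self.committed):]
        self.committed.extend(rest)
        return rest


agreement = LocalAgreement2()
for hyp in ["the", "the cat", "the cat sat", "the cat sat down"]:
    print(agreement.update(hyp.split()))
print(agreement.flush())
```

Each word is emitted one update later than it first appears. That is the latency cost this policy pays for stable output.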

Full STT comparison →

Voice Cloning

Clone voices from short audio samples for custom TTS.

Tool	Stars	Type	Highlights
GPT-SoVITS	40k+	Open Source	1-min data training, few-shot, singing + speech
OpenVoice	30k+	Open Source	Instant cloning, tone/emotion/accent control
RVC (Retrieval-based VC)	25k+	Open Source	Real-time conversion, music community favorite
Coqui XTTS	36k+	Open Source	6-second sample cloning, 17 languages
Bark Voice Clone	3k+	Open Source	Bark + voice cloning, easy speaker prompts
ElevenLabs	—	Commercial	Professional voice cloning, 29 languages (Multilingual v2)
Resemble AI	—	Commercial	Real-time cloning, emotion control, watermarking
Play.ht	—	Commercial	Instant cloning, cross-lingual, API access
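Several cloning tools above state a minimum reference length, for example a 6-second sample for Coqui XTTS. A small preflight check can reject clips that are too short before they are uploaded or fed to a model. The sketch below is a hypothetical helper built on Python's standard `wave` module. It assumes uncompressed WAV input and uses the 6-second figure from the table as its default. Other tools may need different lengths or formats.

```python
import io
import wave


def reference_clip_seconds(wav_bytes: bytes, min_seconds: float = 6.0) -> float:
    """Return the clip's duration; raise ValueError if it is shorter than min_seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration < min_seconds:
        raise ValueError(f"reference clip is {duration:.2f}s, need at least {min_seconds}s")
    return duration


def silent_wav(seconds: float, rate: int = 16_000) -> bytes:
    """Build a silent 16-bit mono WAV in memory (for trying the check out)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(reference_clip_seconds(silent_wav(6.5)))  # 6.5
```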

Full voice cloning comparison →

Frameworks & SDKs

Open-source frameworks for building voice AI applications.

Framework	Stars	Language	Description
Pipecat	8k+	Python	Voice + multimodal conversational AI by Daily. STT→LLM→TTS pipelines
LiveKit Agents	5k+	Python	Real-time voice agents with WebRTC, MCP support, plugin architecture
Vocode	3k+	Python	Low-level modular voice agent toolkit, telephony integration
Bolna	2k+	Python	Production voice agents with <1s latency, Twilio/Plivo ready
OpenAI Realtime API	—	Multi	Native multimodal streaming, voice-to-voice, function calling
Nimble Pipecat	500+	Python	Lightweight voice agent framework by Daily
NVIDIA Voice Agent	500+	Python	Pipecat-based examples for real-time voice agents
MCP Voice Assistant	300+	Python	Voice assistant powered by MCP, Whisper + ElevenLabs
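Frameworks like Pipecat organize an agent as a pipeline of STT→LLM→TTS stages, with each stage passing data downstream as soon as it is available. The sketch below shows that chaining with plain `asyncio` async generators. It is not Pipecat's API or the API of any other listed framework. Every stage is a hypothetical stub that just passes text through.

```python
import asyncio
from typing import AsyncIterator, List


async def microphone(utterances: List[str]) -> AsyncIterator[bytes]:
    """Simulated audio source: one 'chunk' per utterance."""
    for text in utterances:
        await asyncio.sleep(0)  # yield to the event loop, as a capture loop would
        yield text.encode("utf-8")


async def stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stub STT stage: decodes placeholder bytes back to text."""
    async for chunk in audio:
        yield chunk.decode("utf-8")


async def llm(utterances: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stub LLM stage: one canned reply per utterance."""
    async for text in utterances:
        yield f"You said: {text}"


async def tts(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stub TTS stage: encodes reply text as placeholder 'audio'."""
    async for reply in replies:
        yield reply.encode("utf-8")


async def run_pipeline(utterances: List[str]) -> List[bytes]:
    """Chain source -> STT -> LLM -> TTS and collect the output."""
    return [audio async for audio in tts(llm(stt(microphone(utterances))))]


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["hello", "book a table"])))
```

Each stage pulls lazily from the one before it, so the reply to the first utterance is produced before later audio is read. Real frameworks add voice activity detection, interruption handling, and transport on top of this shape.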

Full framework comparison →

Telephony Infrastructure

SIP trunking, phone numbers, and call routing for voice AI agents.

Provider	Type	Highlights
Twilio	Commercial	Largest ecosystem, best docs, global coverage, $0.013/min
Telnyx	Commercial	Private IP network, lowest latency, owns infrastructure
Plivo	Commercial	Single-stack (owns telco + AI), 99.99% uptime, HIPAA
Vonage	Commercial	Enterprise-grade, video + voice, conversation API
SignalWire	Commercial	FreeSWITCH creators, AI-native, programmable telco
Bandwidth	Commercial	Direct carrier, 911/E911, enterprise focus
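Per-minute prices in these tables add up across components. Below is a back-of-the-envelope sketch that sums the rates quoted in this list for orchestration (Vapi), telephony (Twilio), and STT (Deepgram Nova-3). LLM and TTS charges are left out because the tables don't list them. Prices change often, and whether a platform fee already bundles provider costs depends on the vendor, so treat these numbers as placeholders.

```python
from typing import Mapping

# Per-minute USD rates quoted in this list's tables; placeholders, not current pricing.
RATES_PER_MIN: Mapping[str, float] = {
    "orchestration (Vapi)": 0.05,
    "telephony (Twilio)": 0.013,
    "STT (Deepgram Nova-3)": 0.0043,
}


def call_cost(minutes: float, rates: Mapping[str, float] = RATES_PER_MIN) -> float:
    """Estimated USD cost of one call: minutes times the sum of per-minute rates."""
    return round(minutes * sum(rates.values()), 4)


print(call_cost(10))  # 0.673
```

For a 10-minute call this gives 10 × (0.05 + 0.013 + 0.0043) = $0.673, before LLM and TTS usage.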

Full telephony comparison →

Voice Analytics

Call analysis, conversation intelligence, and quality assurance.

Tool	Type	Description
Gong	Commercial	Revenue intelligence, call recording, deal insights
Chorus.ai	Commercial	Conversation intelligence for sales teams (ZoomInfo)
Observe.AI	Commercial	Contact center AI, real-time agent assist, QA automation
CallRail	Commercial	Call tracking, conversation analytics, lead attribution
Symbl.ai	Commercial	Real-time conversation intelligence API, sentiment, topics
Hamming AI	Commercial	Voice agent testing platform, automated regression testing

Full analytics comparison →

Voice Commerce

Voice-activated shopping, ordering, and payments.

Tool	Type	Description
Amazon Alexa Shopping	Platform	Voice ordering through Alexa ecosystem
Google Shopping Actions	Platform	Voice commerce through Google Assistant
Ringly.io	Commercial	AI phone agent for e-commerce, abandoned cart recovery
PolyAI	Commercial	Enterprise voice assistants for ordering and booking
SoundHound	Commercial	Voice AI for restaurants, automotive, hospitality

Full voice commerce comparison →

Voice Platforms & Assistants

Major voice assistant ecosystems and their developer tools.

Platform	Type	Developer Resources
Amazon Alexa	Platform	Skills Kit, Voice Service, Smart Home API
Google Assistant	Platform	Actions, Conversational Actions (sunset June 2023), Home APIs
Apple Siri / SiriKit	Platform	Intents, Shortcuts, App Intents framework
Samsung Bixby	Platform	Capsules, voice actions, SmartThings
OpenClaw	Open Source	Personal AI assistant, 50+ integrations, 210k+ stars
Open WebUI	Open Source	Self-hosted AI with voice, 124k+ stars, offline capable
Home Assistant Voice	Open Source	Privacy-focused voice control for smart home

Full platform comparison →

MCP Servers for Voice

Model Context Protocol (MCP) servers relevant to voice AI applications.

Server	Description	Voice Use Case
Retell AI MCP	Manage Retell AI voice agents via MCP	Build and deploy voice agents from any MCP client
ElevenLabs MCP	Text-to-speech and voice cloning via MCP	Generate speech from any AI assistant
Voice Call MCP	Initiate voice calls via Twilio + OpenAI	AI-initiated phone calls
Whisper MCP	Speech-to-text transcription via MCP	Voice input for any MCP client
Home Assistant MCP	Smart home control via MCP	Voice-controlled smart home
Spotify MCP	Music playback and search via MCP	"Play my playlist" from any voice agent

See Alexa-MCPs for the full 200-server directory of MCP servers optimized for voice assistants.
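MCP clients talk to servers such as those above using JSON-RPC 2.0 messages. A client sends `tools/list` to discover what a server offers, then sends `tools/call` with a tool `name` and `arguments` to invoke one. Below is a minimal sketch of building those request messages. The tool name `text_to_speech` and its arguments are hypothetical. The transport (stdio or HTTP) and the `initialize` handshake that precedes these calls are omitted.

```python
import json
from itertools import count
from typing import Any, Dict

_next_id = count(1)  # request ids only need to be unique within a session


def mcp_request(method: str, params: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind MCP clients send."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": method,
        "params": params,
    })


list_msg = mcp_request("tools/list", {})
call_msg = mcp_request("tools/call", {
    "name": "text_to_speech",  # hypothetical; real names come from the tools/list result
    "arguments": {"text": "Your table is booked for 7pm."},
})
print(call_msg)
```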

How Voice AI Architecture Works

```mermaid
graph TD
    A[🗣️ User Speech] --> B[Speech-to-Text<br/>Whisper / Deepgram / AssemblyAI]
    B --> C[AI Agent / LLM<br/>GPT-4o / Claude / Gemini]
    C --> D{Tool Calls}
    D --> E[MCP Servers<br/>Calendar, CRM, Search]
    D --> F[APIs<br/>Twilio, Stripe, etc.]
    E --> C
    F --> C
    C --> G[Text-to-Speech<br/>ElevenLabs / Cartesia / Bark]
    G --> H[🔊 Audio Response]

    style B fill:#4A90D9,stroke:#333,color:#fff
    style C fill:#FF9900,stroke:#333,color:#fff
    style G fill:#28a745,stroke:#333,color:#fff
```
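The diagram above boils down to a loop: transcribe the speech, let the LLM either answer or request a tool, feed any tool result back to it, and synthesize the final text. Below is a minimal, non-streaming sketch of one conversational turn. `run_turn`, `LLMReply`, the `calendar.book` tool, and all the lambdas are hypothetical stand-ins for real STT, LLM, MCP/API, and TTS clients.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LLMReply:
    """What the (hypothetical) LLM returns: text to speak, or a tool request."""
    text: Optional[str] = None
    tool: Optional[str] = None
    args: Dict[str, str] = field(default_factory=dict)


def fake_llm(history: List[str]) -> LLMReply:
    """Stub LLM: requests the calendar tool once, then answers with its result."""
    last = history[-1]
    if last.startswith("tool:"):
        return LLMReply(text=f"You're booked: {last[len('tool:'):]}")
    return LLMReply(tool="calendar.book", args={"when": "7pm"})


def run_turn(audio: bytes,
             stt: Callable[[bytes], str],
             llm: Callable[[List[str]], LLMReply],
             tools: Dict[str, Callable[..., str]],
             tts: Callable[[str], bytes],
             max_tool_calls: int = 3) -> bytes:
    history = [stt(audio)]                          # user speech -> text
    for _ in range(max_tool_calls + 1):
        reply = llm(history)                        # LLM: speak, or call a tool
        if reply.tool is None:
            return tts(reply.text or "")            # text -> speech
        result = tools[reply.tool](**reply.args)    # MCP server / API call
        history.append(f"tool:{result}")            # tool result goes back to the LLM
    return tts("Sorry, I couldn't complete that.")  # give up after too many tool calls


audio_out = run_turn(
    b"book me a table at 7pm",
    stt=lambda a: a.decode("utf-8"),
    llm=fake_llm,
    tools={"calendar.book": lambda when: f"table at {when}"},
    tts=lambda t: t.encode("utf-8"),
)
print(audio_out)  # b"You're booked: table at 7pm"
```

The `max_tool_calls` cap keeps a misbehaving model from looping on tools forever. Production agents also stream every stage and handle barge-in, which this sketch leaves out.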

Market Landscape (2026)

```mermaid
pie title Voice AI Market Segments
    "Voice Agent Platforms" : 30
    "Text-to-Speech" : 20
    "Speech-to-Text" : 18
    "Telephony/SIP" : 12
    "Voice Cloning" : 8
    "Analytics" : 5
    "Commerce" : 4
    "Frameworks" : 3
```

Contributing

See CONTRIBUTING.md for guidelines. TL;DR:

Fork → Add tool to appropriate section → Submit PR
Or open an issue to suggest a tool

Criteria

Must be a real, working product or actively maintained open-source project
Must be relevant to voice AI (not general AI/ML)
Include a brief, factual description — no marketing copy

Star History

If this list helps you, star it so others can find it too.

Related Lists

Alexa-MCPs — 200 MCP servers for voice assistants
awesome-mcp-servers — 84k+ star MCP directory
awesome-ai-agents-2026 — 300+ AI agent resources

License

MIT — AI Venture Holdings LLC

Built by ALLBOTS.io · A portfolio company of AI Venture Holdings LLC
⭐ Star this repo to stay updated as new voice AI tools launch.
