A curated list of voice AI tools, frameworks, platforms, and models.
Voice agents, text-to-speech, speech-to-text, voice cloning, telephony, and more.
Voice AI is among the fastest-growing segments of the AI stack in 2026. The global Voice Chat API market is projected to reach $3.5B by 2033. Platforms like Retell AI, Vapi, ElevenLabs, and LiveKit are processing billions of minutes monthly. Gartner predicts that 40% of enterprise apps will feature AI agents by the end of 2026.
But there's no single place to discover all the tools. This list fixes that.
- Voice Agent Platforms — Build and deploy voice AI agents
- Text-to-Speech — AI speech synthesis and voice generation
- Speech-to-Text — Transcription, recognition, and understanding
- Voice Cloning — Clone and generate custom voices
- Frameworks & SDKs — Open-source voice AI frameworks
- Telephony Infrastructure — SIP, phone numbers, call routing
- Voice Analytics — Call analysis, sentiment, and QA
- Voice Commerce — Voice-activated shopping and payments
- Voice Platforms & Assistants — Alexa, Siri, Google ecosystem
- MCP Servers for Voice — Model Context Protocol voice integrations
Platforms for building, deploying, and managing AI voice agents.
| Tool | Type | Description | Highlights |
|---|---|---|---|
| Retell AI | Commercial | Developer-friendly voice agent platform with drag-and-drop builder | MCP support, Twilio integration, multilingual, real-time workflows |
| Vapi | Commercial | Provider-agnostic voice AI orchestration layer | 62M monthly calls, 14+ providers, $0.05/min, 99.99% SLA |
| ElevenLabs Conversational AI | Commercial | Voice agent platform with industry-leading voice quality | Sub-100ms latency, 11,000+ voices, 70+ languages |
| Bland AI | Commercial | High-volume outbound voice agent platform | Purpose-built for sales campaigns, enterprise telephony |
| LiveKit Agents | Open Source | Real-time voice agent framework with WebRTC | Fully open source, plugin architecture, MCP support |
| Synthflow | Commercial | No-code voice agent builder | White-label, 200+ integrations, appointment booking |
| Cognigy | Enterprise | Enterprise conversational AI platform | Omnichannel, contact center integration, 100+ languages |
| Lindy AI | Commercial | AI assistant with voice agent capabilities | Multi-step workflows, triggers, CRM integration |
| Air AI | Commercial | Autonomous voice agent for sales and customer service | 40+ minute conversations, calendar booking |
| Voiceflow | Commercial | Collaborative voice & chat AI agent builder | Visual builder, team collaboration, multi-channel |
| Hamming AI | Commercial | Voice agent testing and evaluation platform | Automated QA, regression testing, performance scoring |
| Inworld AI | Commercial | Character-driven voice AI for games and enterprise | Real-time animation, emotional intelligence, gaming SDK |
| PlayAI | Commercial | Voice agent platform with ultra-realistic voices | Sub-200ms latency, voice cloning, custom models |
| Thoughtly | Commercial | Enterprise voice agent with human-like conversations | No-code builder, CRM sync, call analytics |
AI-powered speech synthesis — from real-time voices to studio-quality narration.
| Tool | Latency | Languages | Highlights |
|---|---|---|---|
| ElevenLabs | <100ms | 70+ | Industry-leading quality, voice cloning, 11k+ voices |
| Play.ht | <200ms | 60+ | Ultra-realistic, voice cloning, API + widget |
| Deepgram Aura | <100ms | 10+ | Optimized for voice agents, streaming, low cost |
| Amazon Polly | <200ms | 30+ | AWS integration, SSML, neural voices |
| Google Cloud TTS | <200ms | 40+ | WaveNet and Neural2 voices, Studio voices |
| Azure Speech | <200ms | 100+ | Custom Neural Voice, SSML, avatar support |
| Resemble AI | <200ms | 25+ | Real-time cloning, emotion control, API |
| Cartesia | <80ms | 10+ | Sonic model, ultra-low latency streaming |
| Fish Audio | <150ms | 10+ | Open-weight models, voice cloning, multilingual |
Open-source text-to-speech models and servers.
| Tool | Stars | License | Highlights |
|---|---|---|---|
| Coqui XTTS | 36k+ | MPL 2.0 (code) | Voice cloning with 6s sample, multilingual, most downloaded on HF; model weights use the Coqui Public Model License |
| Bark | 37k+ | MIT | Non-verbal sounds, laughter, music, multi-speaker |
| Piper | 7k+ | MIT | Lightweight, runs on Raspberry Pi, 20+ languages |
| StyleTTS2 | 5k+ | MIT | Studio-quality, style diffusion, human-level naturalness |
| GPT-SoVITS | 40k+ | MIT | 1-min voice data training, few-shot cloning |
| OpenVoice | 30k+ | MIT | Instant voice cloning, emotion/accent control |
| MeloTTS | 5k+ | MIT | High-quality multi-language, real-time CPU inference |
| Parler-TTS | 4k+ | Apache 2.0 | Text-described voice control, Hugging Face native |
| MetaVoice | 3k+ | Apache 2.0 | Finetuning in 1 minute, emotional speech |
| OpenEdAI Speech | 2k+ | AGPL | OpenAI API-compatible TTS server, drop-in replacement |
| AllTalk TTS | 2k+ | AGPL | Multi-engine TTS server, text-generation-webui support |
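As a rough illustration of what "OpenAI API-compatible" means for a TTS server, the sketch below builds a request in the shape of OpenAI's `/v1/audio/speech` endpoint using only the standard library. The host, port, model name, and voice name are assumptions for illustration; check the server's own documentation for the values it actually accepts.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible TTS server is running locally.
# The host and port below are placeholders, not a documented default.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text, voice="alloy", model="tts-1", base_url=BASE_URL):
    """Build (but do not send) a POST request shaped like OpenAI's audio/speech call."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3"):
    """Send the request and save the returned audio bytes. Needs a running server."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```

Because only the base URL changes, the same client code can in principle point at a hosted API or at a self-hosted server, which is the practical meaning of "drop-in replacement" here.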
Transcription, real-time recognition, and audio understanding.
| Tool | WER | Languages | Highlights |
|---|---|---|---|
| Deepgram Nova-3 | <8% | 40+ | Fastest streaming, $0.0043/min, voice agent optimized |
| AssemblyAI (Slam-1) | <10% | 20+ | Speaker diarization, sentiment, content moderation |
| Gladia | <10% | 100+ | Real-time code-switching, zero-shot language detection |
| Speechmatics | <10% | 50+ | On-prem option, real-time, batch, translation |
| Rev AI | <5% | 30+ | Human-verified option, 99% accuracy, legal/medical |
| Google Cloud STT | <12% | 125+ | Chirp model, medical dictation, streaming |
| Azure Speech | <10% | 100+ | Custom speech models, real-time, batch |
| Amazon Transcribe | <12% | 100+ | Medical, call analytics, volume discounts to 67% |
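The WER column above refers to word error rate: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below is a minimal way to compute it with a word-level edit distance. The lowercase-and-split normalization is an assumption; vendors normalize text differently, so published WER figures are not directly comparable.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, transcribing "book a table for two" as "book the table for two" is one substitution over five reference words, so the function returns 0.2.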
Open-source speech-to-text models and runtimes.
| Tool | Stars | License | Highlights |
|---|---|---|---|
| Whisper | 75k+ | MIT | The standard, 99+ languages, robust in noise |
| Faster-Whisper | 13k+ | MIT | Up to 4x faster than openai/whisper at the same accuracy, CTranslate2 backend |
| Whisper.cpp | 37k+ | MIT | C/C++ port, runs on CPU/phone/RPi, WASM support |
| Vosk | 8k+ | Apache 2.0 | Offline, lightweight, mobile/embedded, 20+ languages |
| NeMo (NVIDIA) | 12k+ | Apache 2.0 | Enterprise-grade, Conformer/Canary models, GPU optimized |
| Whisper-Streaming | 2k+ | MIT | Real-time streaming Whisper with local agreement |
| Insanely-Fast-Whisper | 7k+ | MIT | Transcribes 150 min of audio in under 98 s (project benchmark), batching + Flash Attention 2 |
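The "local agreement" idea mentioned for Whisper-Streaming can be sketched as follows: the recognizer is re-run on a growing audio buffer, and a word is only committed once two consecutive hypotheses agree on it. This is a simplified illustration over plain word lists; the class and method names are hypothetical, and a real implementation also tracks timestamps and trims the audio buffer.

```python
class LocalAgreement2:
    """Commit the longest common prefix of two consecutive hypotheses."""

    def __init__(self):
        self.committed = []  # words already emitted to the caller
        self.previous = []   # hypothesis from the previous update

    def update(self, hypothesis):
        """Take the latest full hypothesis (list of words); return newly committed words."""
        agreed = 0
        for old, new in zip(self.previous, hypothesis):
            if old != new:
                break
            agreed += 1
        self.previous = list(hypothesis)

        # Only words beyond what was already committed are new; never un-commit.
        newly_committed = hypothesis[len(self.committed):agreed]
        self.committed.extend(newly_committed)
        return newly_committed
```

The trade-off is latency for stability: each word is delayed by one extra recognition pass, but words that the model is still revising at the end of the buffer are not shown to the user prematurely.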
Clone voices from short audio samples for custom TTS.
| Tool | Stars | Type | Highlights |
|---|---|---|---|
| GPT-SoVITS | 40k+ | Open Source | 1-min data training, few-shot, singing + speech |
| OpenVoice | 30k+ | Open Source | Instant cloning, tone/emotion/accent control |
| RVC (Retrieval-based VC) | 25k+ | Open Source | Real-time conversion, music community favorite |
| Coqui XTTS | 36k+ | Open Source | 6-second sample cloning, 17 languages |
| Bark Voice Clone | 3k+ | Open Source | Bark + voice cloning, easy speaker prompts |
| ElevenLabs | — | Commercial | Professional voice cloning, 29 languages |
| Resemble AI | — | Commercial | Real-time cloning, emotion control, watermarking |
| Play.ht | — | Commercial | Instant cloning, cross-lingual, API access |
Open-source frameworks for building voice AI applications.
| Framework | Stars | Language | Description |
|---|---|---|---|
| Pipecat | 8k+ | Python | Voice + multimodal conversational AI by Daily. STT→LLM→TTS pipelines |
| LiveKit Agents | 5k+ | Python | Real-time voice agents with WebRTC, MCP support, plugin architecture |
| Vocode | 3k+ | Python | Low-level modular voice agent toolkit, telephony integration |
| Bolna | 2k+ | Python | Production voice agents with <1s latency, Twilio/Plivo ready |
| OpenAI Realtime API | — | Multi | Native multimodal streaming, voice-to-voice, function calling |
| Nimble Pipecat | 500+ | Python | Lightweight voice agent framework by Daily |
| NVIDIA Voice Agent | 500+ | Python | Pipecat-based examples for real-time voice agents |
| MCP Voice Assistant | 300+ | Python | Voice assistant powered by MCP, Whisper + ElevenLabs |
SIP trunking, phone numbers, and call routing for voice AI agents.
| Provider | Type | Highlights |
|---|---|---|
| Twilio | Commercial | Largest ecosystem, best docs, global coverage, $0.013/min |
| Telnyx | Commercial | Private IP network, lowest latency, owns infrastructure |
| Plivo | Commercial | Single-stack (owns telco + AI), 99.99% uptime, HIPAA |
| Vonage | Commercial | Enterprise-grade, video + voice, conversation API |
| SignalWire | Commercial | FreeSWITCH creators, AI-native, programmable telco |
| Bandwidth | Commercial | Direct carrier, 911/E911, enterprise focus |
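Per-minute prices appear in several tables in this list, and they add up when components are stacked. The sketch below sums the three prices quoted in this list ($0.05/min for Vapi, $0.0043/min for Deepgram Nova-3, $0.013/min for Twilio) into a rough monthly estimate. Combining these particular vendors is an assumption for illustration, and the total leaves out LLM and TTS costs, which vary by provider.

```python
from decimal import Decimal

# Per-minute prices as quoted in this list; verify against current vendor pricing.
PER_MINUTE_USD = {
    "orchestration (Vapi)": Decimal("0.05"),
    "speech-to-text (Deepgram Nova-3)": Decimal("0.0043"),
    "telephony (Twilio)": Decimal("0.013"),
}


def estimate_monthly_cost(minutes, rates=PER_MINUTE_USD):
    """Return the summed per-minute rate multiplied by monthly call minutes."""
    per_minute = sum(rates.values(), Decimal("0"))
    return per_minute * minutes


if __name__ == "__main__":
    total = estimate_monthly_cost(10_000)
    print(f"10,000 minutes/month: ${total:.2f} before LLM and TTS costs")
```

Using `Decimal` avoids floating-point rounding noise when the per-minute rates are fractions of a cent.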
Call analysis, conversation intelligence, and quality assurance.
| Tool | Type | Description |
|---|---|---|
| Gong | Commercial | Revenue intelligence, call recording, deal insights |
| Chorus.ai | Commercial | Conversation intelligence for sales teams (ZoomInfo) |
| Observe.AI | Commercial | Contact center AI, real-time agent assist, QA automation |
| CallRail | Commercial | Call tracking, conversation analytics, lead attribution |
| Symbl.ai | Commercial | Real-time conversation intelligence API, sentiment, topics |
| Hamming AI | Commercial | Voice agent testing platform, automated regression testing |
Voice-activated shopping, ordering, and payments.
| Tool | Type | Description |
|---|---|---|
| Amazon Alexa Shopping | Platform | Voice ordering through Alexa ecosystem |
| Google Shopping Actions | Platform | Voice commerce through Google Assistant |
| Ringly.io | Commercial | AI phone agent for e-commerce, abandoned cart recovery |
| PolyAI | Commercial | Enterprise voice assistants for ordering and booking |
| SoundHound | Commercial | Voice AI for restaurants, automotive, hospitality |
Major voice assistant ecosystems and their developer tools.
| Platform | Type | Developer Resources |
|---|---|---|
| Amazon Alexa | Platform | Skills Kit, Voice Service, Smart Home API |
| Google Assistant | Platform | Actions, Home APIs (Conversational Actions were sunset in June 2023) |
| Apple Siri / SiriKit | Platform | Intents, Shortcuts, App Intents framework |
| Samsung Bixby | Platform | Capsules, voice actions, SmartThings |
| OpenClaw | Open Source | Personal AI assistant, 50+ integrations, 210k+ stars |
| Open WebUI | Open Source | Self-hosted AI with voice, 124k+ stars, offline capable |
| Home Assistant Voice | Open Source | Privacy-focused voice control for smart home |
Model Context Protocol (MCP) servers relevant to voice AI applications.
| Server | Description | Voice Use Case |
|---|---|---|
| Retell AI MCP | Manage Retell AI voice agents via MCP | Build and deploy voice agents from any MCP client |
| ElevenLabs MCP | Text-to-speech and voice cloning via MCP | Generate speech from any AI assistant |
| Voice Call MCP | Initiate voice calls via Twilio + OpenAI | AI-initiated phone calls |
| Whisper MCP | Speech-to-text transcription via MCP | Voice input for any MCP client |
| Home Assistant MCP | Smart home control via MCP | Voice-controlled smart home |
| Spotify MCP | Music playback and search via MCP | "Play my playlist" from any voice agent |
See Alexa-MCPs for the full 200-server directory of MCP servers optimized for voice assistants.
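For readers new to MCP: clients talk to servers using JSON-RPC 2.0 messages, and invoking a server's tool is a `tools/call` request. The sketch below builds such a message. The tool name `text_to_speech` and its arguments are hypothetical, since each server defines its own tools, and the initialization handshake and transport (stdio or HTTP) are omitted.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per connection


def mcp_tool_call(name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    message = mcp_tool_call("text_to_speech", {"text": "Your table is booked."})
    print(json.dumps(message, indent=2))
```

In practice an MCP client library handles this framing, so a voice agent usually only chooses the tool name and arguments.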
graph TD
A[🗣️ User Speech] --> B[Speech-to-Text<br/>Whisper / Deepgram / AssemblyAI]
B --> C[AI Agent / LLM<br/>GPT-4o / Claude / Gemini]
C --> D{Tool Calls}
D --> E[MCP Servers<br/>Calendar, CRM, Search]
D --> F[APIs<br/>Twilio, Stripe, etc.]
E --> C
F --> C
C --> G[Text-to-Speech<br/>ElevenLabs / Cartesia / Bark]
G --> H[🔊 Audio Response]
style B fill:#4A90D9,stroke:#333,color:#fff
style C fill:#FF9900,stroke:#333,color:#fff
style G fill:#28a745,stroke:#333,color:#fff
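The flow in the diagram above can be sketched as one conversational turn: transcribe the audio, let the LLM either answer or request a tool, feed tool results back to the LLM, then synthesize the final text. Everything here is a stand-in: `AgentReply`, `run_turn`, and the fake STT, LLM, and TTS functions are hypothetical names, not the API of any framework in this list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentReply:
    text: Optional[str] = None       # final text to speak, or
    tool_name: Optional[str] = None  # the name of a tool the LLM wants to call
    tool_args: dict = field(default_factory=dict)


def run_turn(audio, stt, llm, tts, tools, max_tool_calls=3):
    """One turn: speech -> text -> LLM (with optional tool calls) -> speech."""
    history = [("user", stt(audio))]
    calls = 0
    while True:
        reply = llm(history)
        if reply.tool_name is None:
            return tts(reply.text)
        if calls >= max_tool_calls:
            return tts("Sorry, I could not complete that request.")
        result = tools[reply.tool_name](**reply.tool_args)
        history.append(("tool", f"{reply.tool_name} -> {result}"))
        calls += 1


# Stand-in components so the sketch runs without any external service.
def fake_stt(audio):
    return audio.decode("utf-8")


def fake_llm(history):
    if history[-1][0] == "user":
        return AgentReply(tool_name="check_calendar", tool_args={"day": "Friday"})
    return AgentReply(text=f"Here is what I found: {history[-1][1]}")


def fake_tts(text):
    return text.encode("utf-8")


FAKE_TOOLS = {"check_calendar": lambda day: f"{day} is free"}
```

The `max_tool_calls` cap is a design choice for voice use: a caller is waiting in real time, so a bounded loop with a spoken fallback is safer than letting the model chain tool calls indefinitely.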
pie title Voice AI Market Segments
"Voice Agent Platforms" : 30
"Text-to-Speech" : 20
"Speech-to-Text" : 18
"Telephony/SIP" : 12
"Voice Cloning" : 8
"Analytics" : 5
"Commerce" : 4
"Frameworks" : 3
See CONTRIBUTING.md for guidelines. TL;DR:
- Fork → Add tool to appropriate section → Submit PR
- Or open an issue to suggest a tool
Entries must meet these criteria:
- Must be a real, working product or actively maintained open-source project
- Must be relevant to voice AI (not general AI/ML)
- Include a brief, factual description — no marketing copy
If this list helps you, star it so others can find it too.
- Alexa-MCPs — 200 MCP servers for voice assistants
- awesome-mcp-servers — 84k+ star MCP directory
- awesome-ai-agents-2026 — 300+ AI agent resources
MIT — AI Venture Holdings LLC
Built by ALLBOTS.io · A portfolio company of AI Venture Holdings LLC