ALLBOTSIO/awesome-voice-ai

Awesome Voice AI

150+ Tools · 10 Categories · MIT License

A curated list of voice AI tools, frameworks, platforms, and models.
Voice agents, text-to-speech, speech-to-text, voice cloning, telephony, and more.

Voice Agents · TTS · STT · Cloning · Frameworks · Telephony · Contribute


Why This List Exists

Voice AI is among the fastest-growing segments of the AI stack in 2026. The global Voice Chat API market is projected to reach $3.5B by 2033. Platforms like Retell AI, Vapi, ElevenLabs, and LiveKit are processing billions of minutes monthly. Gartner predicts that 40% of enterprise apps will feature AI agents by the end of 2026.

But there's no single place to discover all the tools. This list fixes that.


Voice Agent Platforms

Platforms for building, deploying, and managing AI voice agents.

| Tool | Type | Description | Highlights |
| --- | --- | --- | --- |
| Retell AI | Commercial | Developer-friendly voice agent platform with drag-and-drop builder | MCP support, Twilio integration, multilingual, real-time workflows |
| Vapi | Commercial | Provider-agnostic voice AI orchestration layer | 62M monthly calls, 14+ providers, $0.05/min, 99.99% SLA |
| ElevenLabs Conversational AI | Commercial | Voice agent platform with industry-leading voice quality | Sub-100ms latency, 11,000+ voices, 70+ languages |
| Bland AI | Commercial | High-volume outbound voice agent platform | Purpose-built for sales campaigns, enterprise telephony |
| LiveKit Agents | Open Source | Real-time voice agent framework with WebRTC | Fully open source, plugin architecture, MCP support |
| Synthflow | Commercial | No-code voice agent builder | White-label, 200+ integrations, appointment booking |
| Cognigy | Enterprise | Enterprise conversational AI platform | Omnichannel, contact center integration, 100+ languages |
| Lindy AI | Commercial | AI assistant with voice agent capabilities | Multi-step workflows, triggers, CRM integration |
| Air AI | Commercial | Autonomous voice agent for sales and customer service | 40+ minute conversations, calendar booking |
| Voiceflow | Commercial | Collaborative voice & chat AI agent builder | Visual builder, team collaboration, multi-channel |
| Hamming AI | Commercial | Voice agent testing and evaluation platform | Automated QA, regression testing, performance scoring |
| Inworld AI | Commercial | Character-driven voice AI for games and enterprise | Real-time animation, emotional intelligence, gaming SDK |
| PlayAI | Commercial | Voice agent platform with ultra-realistic voices | Sub-200ms latency, voice cloning, custom models |
| Thoughtly | Commercial | Enterprise voice agent with human-like conversations | No-code builder, CRM sync, call analytics |

Detailed comparison →


Text-to-Speech

AI-powered speech synthesis — from real-time voices to studio-quality narration.

Commercial

| Tool | Latency | Languages | Highlights |
| --- | --- | --- | --- |
| ElevenLabs | <100ms | 70+ | Industry-leading quality, voice cloning, 11k+ voices |
| Play.ht | <200ms | 60+ | Ultra-realistic, voice cloning, API + widget |
| Deepgram Aura | <100ms | 10+ | Optimized for voice agents, streaming, low cost |
| Amazon Polly | <200ms | 30+ | AWS integration, SSML, neural voices |
| Google Cloud TTS | <200ms | 40+ | WaveNet and Neural2 voices, Studio voices |
| Azure Speech | <200ms | 100+ | Custom Neural Voice, SSML, avatar support |
| Resemble AI | <200ms | 25+ | Real-time cloning, emotion control, API |
| Cartesia | <80ms | 10+ | Sonic model, ultra-low latency streaming |
| Fish Audio | <150ms | 10+ | Open-weight models, voice cloning, multilingual |

Open Source

| Tool | Stars | License | Highlights |
| --- | --- | --- | --- |
| Coqui XTTS | 36k+ | MPL 2.0 | Voice cloning with 6s sample, multilingual, most downloaded on HF |
| Bark | 37k+ | MIT | Non-verbal sounds, laughter, music, multi-speaker |
| Piper | 7k+ | MIT | Lightweight, runs on Raspberry Pi, 20+ languages |
| StyleTTS2 | 5k+ | MIT | Studio-quality, style diffusion, human-level naturalness |
| GPT-SoVITS | 40k+ | MIT | 1-min voice data training, few-shot cloning |
| OpenVoice | 30k+ | MIT | Instant voice cloning, emotion/accent control |
| MeloTTS | 5k+ | MIT | High-quality multi-language, real-time CPU inference |
| Parler-TTS | 4k+ | Apache 2.0 | Text-described voice control, Hugging Face native |
| MetaVoice | 3k+ | Apache 2.0 | Finetuning in 1 minute, emotional speech |
| OpenEdAI Speech | 2k+ | AGPL | OpenAI API-compatible TTS server, drop-in replacement |
| AllTalk TTS | 2k+ | AGPL | Multi-engine TTS server, text-generation-webui support |

Full TTS comparison →


Speech-to-Text

Transcription, real-time recognition, and audio understanding.

Commercial APIs

| Tool | WER | Languages | Highlights |
| --- | --- | --- | --- |
| Deepgram Nova-3 | <8% | 40+ | Fastest streaming, $0.0043/min, voice agent optimized |
| AssemblyAI (Slam-1) | <10% | 20+ | Speaker diarization, sentiment, content moderation |
| Gladia | <10% | 100+ | Real-time code-switching, zero-shot language detection |
| Speechmatics | <10% | 50+ | On-prem option, real-time, batch, translation |
| Rev AI | <5% | 30+ | Human-verified option, 99% accuracy, legal/medical |
| Google Cloud STT | <12% | 125+ | Chirp model, medical dictation, streaming |
| Azure Speech | <10% | 100+ | Custom speech models, real-time, batch |
| Amazon Transcribe | <12% | 100+ | Medical, call analytics, volume discounts to 67% |
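
The WER column is word error rate, conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below shows that calculation with a word-level edit distance. It is an illustration only: the function name `word_error_rate` is hypothetical, lowercasing is the only text normalization applied (an assumption; published benchmarks usually normalize punctuation and numerals too), and it does not reproduce any vendor figure in the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words."""
    # Assumption: lowercase and whitespace-split is the only normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("book a table for two", "book the table for two")` returns 0.2: one substitution over five reference words.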

Open Source

| Tool | Stars | License | Highlights |
| --- | --- | --- | --- |
| Whisper | 75k+ | MIT | The standard, 99+ languages, robust in noise |
| Faster-Whisper | 13k+ | MIT | Up to 4x faster than Whisper, CTranslate2 backend |
| Whisper.cpp | 37k+ | MIT | C/C++ port, runs on CPU/phone/RPi, WASM support |
| Vosk | 8k+ | Apache 2.0 | Offline, lightweight, mobile/embedded, 20+ languages |
| NeMo (NVIDIA) | 12k+ | Apache 2.0 | Enterprise-grade, Conformer/Canary models, GPU optimized |
| Whisper-Streaming | 2k+ | MIT | Real-time streaming Whisper with local agreement |
| Insanely-Fast-Whisper | 7k+ | MIT | Transcribes 150 min of audio in under 98 s (per its README), batching + Flash Attention 2 |

Full STT comparison →


Voice Cloning

Clone voices from short audio samples for custom TTS.

| Tool | Stars | Type | Highlights |
| --- | --- | --- | --- |
| GPT-SoVITS | 40k+ | Open Source | 1-min data training, few-shot, singing + speech |
| OpenVoice | 30k+ | Open Source | Instant cloning, tone/emotion/accent control |
| RVC (Retrieval-based VC) | 25k+ | Open Source | Real-time conversion, music community favorite |
| Coqui XTTS | 36k+ | Open Source | 6-second sample cloning, 17 languages |
| Bark Voice Clone | 3k+ | Open Source | Bark + voice cloning, easy speaker prompts |
| ElevenLabs | | Commercial | Professional voice cloning, 29 languages |
| Resemble AI | | Commercial | Real-time cloning, emotion control, watermarking |
| Play.ht | | Commercial | Instant cloning, cross-lingual, API access |

Full voice cloning comparison →


Frameworks & SDKs

Open-source frameworks for building voice AI applications.

| Framework | Stars | Language | Description |
| --- | --- | --- | --- |
| Pipecat | 8k+ | Python | Voice + multimodal conversational AI by Daily. STT→LLM→TTS pipelines |
| LiveKit Agents | 5k+ | Python | Real-time voice agents with WebRTC, MCP support, plugin architecture |
| Vocode | 3k+ | Python | Low-level modular voice agent toolkit, telephony integration |
| Bolna | 2k+ | Python | Production voice agents with <1s latency, Twilio/Plivo ready |
| OpenAI Realtime API | | Multi | Native multimodal streaming, voice-to-voice, function calling |
| Nimble Pipecat | 500+ | Python | Lightweight voice agent framework by Daily |
| NVIDIA Voice Agent | 500+ | Python | Pipecat-based examples for real-time voice agents |
| MCP Voice Assistant | 300+ | Python | Voice assistant powered by MCP, Whisper + ElevenLabs |

Full framework comparison →


Telephony Infrastructure

SIP trunking, phone numbers, and call routing for voice AI agents.

| Provider | Type | Highlights |
| --- | --- | --- |
| Twilio | Commercial | Largest ecosystem, best docs, global coverage, $0.013/min |
| Telnyx | Commercial | Private IP network, lowest latency, owns infrastructure |
| Plivo | Commercial | Single-stack (owns telco + AI), 99.99% uptime, HIPAA |
| Vonage | Commercial | Enterprise-grade, video + voice, conversation API |
| SignalWire | Commercial | FreeSWITCH creators, AI-native, programmable telco |
| Bandwidth | Commercial | Direct carrier, 911/E911, enterprise focus |

Full telephony comparison →


Voice Analytics

Call analysis, conversation intelligence, and quality assurance.

| Tool | Type | Description |
| --- | --- | --- |
| Gong | Commercial | Revenue intelligence, call recording, deal insights |
| Chorus.ai | Commercial | Conversation intelligence for sales teams (ZoomInfo) |
| Observe.AI | Commercial | Contact center AI, real-time agent assist, QA automation |
| CallRail | Commercial | Call tracking, conversation analytics, lead attribution |
| Symbl.ai | Commercial | Real-time conversation intelligence API, sentiment, topics |
| Hamming AI | Commercial | Voice agent testing platform, automated regression testing |

Full analytics comparison →


Voice Commerce

Voice-activated shopping, ordering, and payments.

| Tool | Type | Description |
| --- | --- | --- |
| Amazon Alexa Shopping | Platform | Voice ordering through Alexa ecosystem |
| Google Shopping Actions | Platform | Voice commerce through Google Assistant |
| Ringly.io | Commercial | AI phone agent for e-commerce, abandoned cart recovery |
| PolyAI | Commercial | Enterprise voice assistants for ordering and booking |
| SoundHound | Commercial | Voice AI for restaurants, automotive, hospitality |

Full voice commerce comparison →


Voice Platforms & Assistants

Major voice assistant ecosystems and their developer tools.

| Platform | Type | Developer Resources |
| --- | --- | --- |
| Amazon Alexa | Platform | Skills Kit, Voice Service, Smart Home API |
| Google Assistant | Platform | Actions, Conversational Actions, Home APIs |
| Apple Siri / SiriKit | Platform | Intents, Shortcuts, App Intents framework |
| Samsung Bixby | Platform | Capsules, voice actions, SmartThings |
| OpenClaw | Open Source | Personal AI assistant, 50+ integrations, 210k+ stars |
| Open WebUI | Open Source | Self-hosted AI with voice, 124k+ stars, offline capable |
| Home Assistant Voice | Open Source | Privacy-focused voice control for smart home |

Full platform comparison →


MCP Servers for Voice

Model Context Protocol (MCP) servers relevant to voice AI applications.

| Server | Description | Voice Use Case |
| --- | --- | --- |
| Retell AI MCP | Manage Retell AI voice agents via MCP | Build and deploy voice agents from any MCP client |
| ElevenLabs MCP | Text-to-speech and voice cloning via MCP | Generate speech from any AI assistant |
| Voice Call MCP | Initiate voice calls via Twilio + OpenAI | AI-initiated phone calls |
| Whisper MCP | Speech-to-text transcription via MCP | Voice input for any MCP client |
| Home Assistant MCP | Smart home control via MCP | Voice-controlled smart home |
| Spotify MCP | Music playback and search via MCP | "Play my playlist" from any voice agent |

See Alexa-MCPs for the full 200-server directory of MCP servers optimized for voice assistants.
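
MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request. The sketch below only builds such a request as a JSON string, to show the shape of the message a voice agent would send. The helper `build_tool_call`, the tool name `text_to_speech`, and its arguments are hypothetical and are not taken from any server listed above; a real client would first discover the available tools and their schemas from the server.

```python
import json
from itertools import count

# Monotonic request ids so each JSON-RPC request can be matched to its response.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call("text_to_speech", {"text": "Your table is booked."})
```

Transport (stdio or HTTP) and response handling are left out; they depend on the MCP client library in use.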


How Voice AI Architecture Works

```mermaid
graph TD
    A[🗣️ User Speech] --> B[Speech-to-Text<br/>Whisper / Deepgram / AssemblyAI]
    B --> C[AI Agent / LLM<br/>GPT-4o / Claude / Gemini]
    C --> D{Tool Calls}
    D --> E[MCP Servers<br/>Calendar, CRM, Search]
    D --> F[APIs<br/>Twilio, Stripe, etc.]
    E --> C
    F --> C
    C --> G[Text-to-Speech<br/>ElevenLabs / Cartesia / Bark]
    G --> H[🔊 Audio Response]

    style B fill:#4A90D9,stroke:#333,color:#fff
    style C fill:#FF9900,stroke:#333,color:#fff
    style G fill:#28a745,stroke:#333,color:#fff
```
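
The flow in the diagram can be sketched as a single conversational turn: speech is transcribed, the LLM either answers or requests a tool, tool results are fed back to the LLM, and the final text is synthesized. This is a minimal illustration under stated assumptions, not any listed vendor's API: `transcribe`, `run_llm`, `synthesize`, `handle_turn`, and the `TOOLS` registry are hypothetical stubs that stand in for real STT, LLM, tool, and TTS clients.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tool registry: stands in for MCP servers and external APIs.
TOOLS: dict[str, Callable[[str], str]] = {
    "calendar": lambda query: f"Next free slot for '{query}': Tuesday 10:00",
}


@dataclass
class LLMReply:
    text: str = ""
    tool: Optional[str] = None  # set when the model requests a tool call
    tool_input: str = ""


def transcribe(audio: bytes) -> str:
    """Stub STT: pretends the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")


def run_llm(history: list[str]) -> LLMReply:
    """Stub LLM: requests the calendar tool for booking, then answers."""
    last = history[-1]
    if last.startswith("tool:"):
        return LLMReply(text=f"Sure. {last[len('tool:'):].strip()}")
    if "book" in last.lower():
        return LLMReply(tool="calendar", tool_input=last)
    return LLMReply(text=f"You said: {last}")


def synthesize(text: str) -> bytes:
    """Stub TTS: returns the text as bytes instead of audio samples."""
    return text.encode("utf-8")


def handle_turn(audio: bytes, max_tool_calls: int = 3) -> bytes:
    """Run one STT -> LLM -> (tool calls) -> TTS turn."""
    history = [transcribe(audio)]
    reply = run_llm(history)
    calls = 0
    # Feed each tool result back to the LLM until it produces a final answer.
    # If the cap is reached while a tool is still requested, the reply text
    # is empty and empty audio is returned.
    while reply.tool is not None and calls < max_tool_calls:
        result = TOOLS[reply.tool](reply.tool_input)
        history.append(f"tool: {result}")
        reply = run_llm(history)
        calls += 1
    return synthesize(reply.text)
```

A production agent would stream audio in both directions rather than process whole turns, which is where the frameworks listed above come in; the loop structure stays the same.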

Market Landscape (2026)

```mermaid
pie title Voice AI Market Segments
    "Voice Agent Platforms" : 30
    "Text-to-Speech" : 20
    "Speech-to-Text" : 18
    "Telephony/SIP" : 12
    "Voice Cloning" : 8
    "Analytics" : 5
    "Commerce" : 4
    "Frameworks" : 3
```

Contributing

See CONTRIBUTING.md for guidelines. TL;DR:

  1. Fork → Add tool to appropriate section → Submit PR
  2. Or open an issue to suggest a tool

Criteria

  • Must be a real, working product or actively maintained open-source project
  • Must be relevant to voice AI (not general AI/ML)
  • Include a brief, factual description — no marketing copy

Star History

If this list helps you, star it so others can find it too.

License

MIT — AI Venture Holdings LLC

Built by ALLBOTS.io · A portfolio company of AI Venture Holdings LLC
⭐ Star this repo to stay updated as new voice AI tools launch
