Set of 📝 with 🔗 to help those building Voice AI agents 🎙️🤖
Updated May 25, 2026
🎤💬 Full example of implementing ChatGPT's realtime voice from scratch with a VAD + STT + LLM + TTS technology stack, in almost a single file!
Real-time voice agents with parallel async background sub-agents — conversations continue naturally while tasks run • Join the builders → https://discord.gg/mqxKaN3UKC
Open-source realtime voice agent server in Go with WebRTC (WHIP), barge-in, streaming STT/LLM/TTS pipelines, plugin system, multi-language SDKs, SIP telephony, ESP32 support & fully local mode.
An AI-powered object detection system using YOLOv8 to identify and locate graffiti across various contexts including walls, buildings, over-bridges, vehicles, and other surfaces.
LiveKit voice app validation skill. Use when building, debugging, or declaring as working any LiveKit voice agent, Agents UI app, or React/Next.js LiveKit project. Enforces evidence-based validation before reporting a session, token endpoint, worker, transcript, or end-to-end voice interaction as complete.
Gemini Live API voice tutor for K-12 NCERT math — Hindi/English, hand-drawn whiteboard, open source
Developer-facing interface for discovering and calling the Livepeer network.
Bounded-latency browser edge inference pipeline for real-time voice interview summarization using ONNX Runtime Web + WASM. Features Web Worker isolation, semantic ring buffers, latest-only concurrency control, observability dashboard, offline-first architecture and production-ready whisper.cpp upgrade path.
Voice agent prototype for structured clinical interviewing, with VAD-based interruption handling, modular ASR/LLM/TTS backends, and dialogue workflow control.
Real-time hand sign recognition using LSTM-based models for sequence detection from video frames.
Real-time voice interface for OpenClaw. Stream speech-to-text, LLM reasoning, and text-to-speech into a low-latency conversational agent you can talk to—locally or in the cloud.
LiveKit Agents UI demo showing a voice AI assistant that schedules roof inspections using real-time voice interaction, visualizers, and booking workflow.
A real-time (<500ms) voice AI concierge built with Next.js, FastAPI, and Gemini 2.5 Flash Lite. Features local RAG (ChromaDB) for policy retrieval, Tool Calling for live booking, and event-driven CRM logging to Google Sheets.
howeverpipecat: engineering-focused Pipecat distribution
Realtime multimodal AI agent with voice streaming, RAG memory, and autonomous workflows
Traffyx-AI: Traffic Forecasting & Urban Mobility Intelligence System. Applied machine learning system for traffic prediction, congestion analysis, and real-world spatiotemporal data modeling.
High-performance async Python backend for real-time AI conversations with Quart, Supabase, and OpenAI.
Production-ready real-time voice AI pipeline integrating Twilio Media Streams, streaming ASR (Deepgram), LLM reasoning, and a live analytics dashboard. Designed for ultra-low latency conversational intelligence in call center and healthcare environments.
Realtime voice AI gateway with turn state, interruption handling, provider fallback, degraded state, audit events, runtime evals, Bun, and TypeScript.