SDK Reference documentation | Package (PyPI)
This folder contains Python samples demonstrating how to build real-time voice assistants using Azure AI Speech VoiceLive service. Each sample is self-contained for easy understanding and deployment.
Demonstrates the new Voice Live + Foundry Agent v2 workflow, including creating a Voice Live-configured agent and running an agent-connected voice assistant.
Key Features:
- Agent creation utility with Voice Live metadata chunking
- New SDK-based agent session configuration (`AgentSessionConfig`)
- Proactive greeting and barge-in handling
- Conversation logging
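The barge-in behavior listed above boils down to racing response playback against a "user started speaking" signal and cancelling playback when the user wins. Below is a minimal asyncio sketch of that pattern; the `play_response`/`assistant_turn` names and the simulated timing are hypothetical illustrations, not the SDK's actual event model.

```python
import asyncio

async def play_response(chunks: list[bytes], played: list[bytes]) -> None:
    """Simulate streaming assistant audio chunks to the speakers."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.01)  # pretend each chunk takes time to play

async def assistant_turn(chunks: list[bytes], user_spoke: asyncio.Event) -> list[bytes]:
    """Play a response, but cut it off the moment the user barges in."""
    played: list[bytes] = []
    playback = asyncio.create_task(play_response(chunks, played))
    interrupt = asyncio.create_task(user_spoke.wait())
    _, pending = await asyncio.wait(
        {playback, interrupt}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # barge-in: stop playback mid-response
    await asyncio.gather(*pending, return_exceptions=True)
    return played

async def main() -> list[bytes]:
    user_spoke = asyncio.Event()
    # Simulate speech being detected ~25 ms in, mid-playback.
    asyncio.get_running_loop().call_later(0.025, user_spoke.set)
    return await assistant_turn([b"chunk"] * 10, user_spoke)

played = asyncio.run(main())
print(f"played {len(played)} of 10 chunks before barge-in")
```

The same race structure applies whether the interrupt signal comes from local voice-activity detection or from a server-sent speech-started event.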
Demonstrates connecting to an Azure AI Foundry agent for voice conversations. The agent handles model selection, instructions, and tools, with support for proactive greetings.
Key Features:
- Azure AI Foundry agent integration
- Proactive greeting support
- Azure authentication (required)
- Agent-managed tools and instructions
Demonstrates direct integration with VoiceLive models for voice conversations without agent overhead.
Key Features:
- Direct model access
- Flexible authentication (API key or Azure credentials)
- Custom instructions support
- Model selection options
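The "flexible authentication" option amounts to a branch at startup: read an API key from the environment by default, or switch to Azure token-based auth when requested. A stand-in sketch of that decision (illustrative only; the real samples construct `azure.identity` / `AzureKeyCredential` objects at this point, and the `choose_credential` helper is hypothetical):

```python
import os

def choose_credential(use_token_credential: bool = False):
    """Pick the auth mode the model quickstarts support: API key from the
    environment by default, Azure token auth with --use-token-credential."""
    if use_token_credential:
        # Real samples would return azure.identity.DefaultAzureCredential() here.
        return "token-credential", None
    key = os.environ.get("AZURE_VOICELIVE_API_KEY")
    if not key:
        raise RuntimeError(
            "Set AZURE_VOICELIVE_API_KEY in .env or pass --use-token-credential"
        )
    return "api-key", key

os.environ["AZURE_VOICELIVE_API_KEY"] = "demo-key"  # stand-in value
mode, key = choose_credential()
print(mode, key)
```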
Demonstrates direct integration with VoiceLive using bring-your-own-models from Foundry.
Key Features:
- Bring-Your-Own-Model Integration: Connects directly to a self-hosted model
- Proactive Greeting: Agent initiates the conversation with a welcome message
- Custom Instructions: Define your own system instructions for the AI
- Flexible Authentication: Supports both API key and Azure credential authentication
Demonstrates how to implement function calling with VoiceLive models, enabling the AI to execute custom functions during conversations.
Key Features:
- Custom function definitions
- Real-time function execution
- Function result handling
- Advanced tool integration
- Proactive greeting support
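Function calling follows a simple loop: the model emits a function name plus JSON-encoded arguments, the client executes the matching Python function, and the serialized result is sent back so the model can speak an informed answer. A minimal sketch of the client side, with a hypothetical `get_weather` tool and placeholder data (not the sample's actual code):

```python
import json
from typing import Any, Callable

# Registry of callable tools; the model refers to them by name.
FUNCTIONS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a Python function so the assistant can call it."""
    FUNCTIONS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> dict:
    # Placeholder data; a real tool would call a weather API.
    return {"city": city, "forecast": "sunny", "temp_c": 22}

def handle_function_call(name: str, arguments: str) -> str:
    """Execute a model-requested call and serialize the result to send back."""
    fn = FUNCTIONS.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown function: {name}"})
    result = fn(**json.loads(arguments))
    return json.dumps(result)

reply = handle_function_call("get_weather", '{"city": "Seattle"}')
print(reply)
```

Keeping tools in a name-keyed registry means adding a capability is just decorating another function; the dispatch code never changes.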
Demonstrates how to build a real-time voice assistant with Retrieval-Augmented Generation (RAG) capabilities using Azure AI Voice Live API and Azure AI Search.
Key Features:
- Real-time speech-to-speech interaction powered by Voice Live
- RAG integration with Azure AI Search for document retrieval
- Full-stack architecture (React/TypeScript frontend + FastAPI backend)
- Azure AI Foundry Agent Service integration
- Production-ready `azd` deployment to Azure Container Apps
A Dockerized sample demonstrating Azure Voice Live API with avatar integration, with the Voice Live SDK running entirely on the server side (Python/FastAPI) while the browser handles UI, audio capture/playback, and avatar video rendering.
Key Features:
- Avatar-enabled voice conversations with server-side SDK
- Prebuilt, custom, and photo avatar character support
- WebRTC and WebSocket avatar output modes
- Live scene settings adjustment for photo avatars
- Proactive greeting support
- Barge-in support for natural conversation interruption
- Docker-based deployment
- Azure Container Apps deployment guide
- Developer mode for debugging
All samples require:
- Python 3.8+
- Audio input/output devices (microphone and speakers)
- Azure subscription - Create one for free
Depending on which sample you want to run:
For Agent Quickstart and Agents New Quickstart:
- Azure AI Foundry project with a deployed agent
- Azure CLI for authentication
For Model Quickstart, BYOM Quickstart, and Function Calling:
- AI Foundry resource
- API key or Azure CLI for authentication
1. Navigate to the quickstarts folder:

   ```bash
   cd python/voice-live-quickstarts
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv .venv
   # On Windows
   .venv\Scripts\activate
   # On Linux/macOS
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure environment variables:
   - Copy `.env_sample` to `.env`
   - Update `.env` with your Azure credentials

5. Run a sample:

   ```bash
   # New v2 agent quickstart
   python AgentsNewQuickstart/voice-live-with-agent-v2.py
   # or create an agent configured for Voice Live
   python AgentsNewQuickstart/create_agent_v2_with_voicelive.py
   # or classic agent quickstart
   python agents-quickstart.py
   # or
   python model-quickstart.py
   # or
   python bring-your-own-model-quickstart.py
   # or
   python function-calling-quickstart.py
   ```
Agent Quickstart and Agents New Quickstart require Azure authentication:

```bash
az login
python agents-quickstart.py
# or
python AgentsNewQuickstart/voice-live-with-agent-v2.py
```

Model Quickstart, BYOM Quickstart, and Function Calling support both methods:

```bash
# With API key (from .env file)
python model-quickstart.py
# or
python bring-your-own-model-quickstart.py

# With Azure credentials
az login
python model-quickstart.py --use-token-credential
# or
python bring-your-own-model-quickstart.py --use-token-credential
```

All samples use a `.env` file for configuration. Copy `.env_sample` to `.env` and update with your values:
For the agent samples:

```env
AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/
AZURE_VOICELIVE_PROJECT_NAME=your-project-name
AZURE_VOICELIVE_AGENT_ID=asst_your-agent-id
AZURE_VOICELIVE_API_VERSION=2025-10-01
# AZURE_VOICELIVE_API_KEY not needed for agents (Azure auth only)
```

For the model, BYOM, and function-calling samples:

```env
AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/
AZURE_VOICELIVE_API_KEY=your-api-key
AZURE_VOICELIVE_API_VERSION=2025-10-01
```
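The samples load these values with python-dotenv; conceptually that amounts to parsing `KEY=VALUE` lines into `os.environ` and then validating that the required settings are present. A stdlib-only sketch of that behavior (the `load_env` helper is a simplified stand-in, not python-dotenv itself):

```python
import os
import tempfile
from pathlib import Path

def load_env(path: str) -> None:
    """Minimal stand-in for python-dotenv's load_dotenv(): parse KEY=VALUE
    lines into os.environ, skipping blanks and # comments."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

# Write a demo .env so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write(
        "AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/\n"
        "AZURE_VOICELIVE_API_KEY=your-api-key\n"
        "AZURE_VOICELIVE_API_VERSION=2025-10-01\n"
    )

load_env(f.name)
missing = [k for k in ("AZURE_VOICELIVE_ENDPOINT", "AZURE_VOICELIVE_API_KEY")
           if not os.environ.get(k)]
assert not missing, f"missing settings: {missing}"
print(os.environ["AZURE_VOICELIVE_API_VERSION"])
```

Failing fast on missing settings at startup gives a clearer error than a 401 deep inside the WebSocket handshake.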
All samples demonstrate:
- Real-time Voice: Bidirectional audio streaming for natural conversations
- Audio Processing: Microphone capture and speaker playback using PyAudio
- Interruption Handling: Support for natural turn-taking in conversations
- Resource Management: Proper cleanup of connections and audio resources
- Async/Await: Modern Python async programming patterns
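The capture-and-send and receive-and-play loops above typically run as concurrent asyncio tasks joined by queues. The following self-contained sketch simulates that pipeline with byte strings instead of a real microphone, speaker, or service connection; the `capture`/`service`/`playback` names are illustrative, not SDK APIs.

```python
import asyncio

async def capture(mic_frames: list[bytes], outbound: asyncio.Queue) -> None:
    """Stand-in for a PyAudio capture loop: push mic frames toward the service."""
    for frame in mic_frames:
        await outbound.put(frame)
    await outbound.put(None)  # end of stream

async def service(outbound: asyncio.Queue, inbound: asyncio.Queue) -> None:
    """Stand-in for the VoiceLive connection: echo each frame back upper-cased."""
    while (frame := await outbound.get()) is not None:
        await inbound.put(frame.upper())
    await inbound.put(None)

async def playback(inbound: asyncio.Queue) -> list[bytes]:
    """Stand-in for a PyAudio playback loop: drain response audio."""
    played = []
    while (frame := await inbound.get()) is not None:
        played.append(frame)
    return played

async def main() -> list[bytes]:
    outbound, inbound = asyncio.Queue(), asyncio.Queue()
    frames = [b"hello", b"world"]
    _, _, played = await asyncio.gather(
        capture(frames, outbound), service(outbound, inbound), playback(inbound)
    )
    return played

played = asyncio.run(main())
print(played)
```

Because the three coroutines only communicate through queues, any stage can be swapped (e.g. real PyAudio streams for the stubs) without touching the others.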
Popular neural voice options include:
- `en-US-AvaNeural` - Female, conversational
- `en-US-AndrewNeural` - Male, conversational
- `en-US-JennyNeural` - Female, friendly
- `en-US-GuyNeural` - Male, professional
- `en-US-AriaNeural` - Female, cheerful
- `en-US-DavisNeural` - Male, calm
See the Azure Neural Voice Gallery for all available voices.
Agent flow:

```
User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Foundry Agent
                                                              ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Agent Response
```

Model flow:

```
User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Model (GPT-4o)
                                                              ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Model Response
```

Function-calling flow:

```
User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Model
                                                              ↓
                                                   Function Call Request
                                                              ↓
                                                  Execute Python Function
                                                              ↓
                                                   Function Result → Model
                                                              ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Enhanced Response
```
- No audio input/output: Verify your microphone and speakers are working and set as default devices
- PyAudio installation errors:
  - On Windows: `pip install pyaudio`
  - On Linux: `sudo apt-get install python3-pyaudio` or `pip install pyaudio`
  - On macOS: `brew install portaudio && pip install pyaudio`
- Audio device busy: Close other applications using your audio devices (e.g., Teams, Zoom)
- Poor audio quality: Update your audio drivers to the latest version
- 401 Unauthorized:
  - For API key: Verify `AZURE_VOICELIVE_API_KEY` in your `.env` file
  - For Azure auth: Run `az login` to authenticate with Azure CLI
- Agent not found (Agent sample): Check your agent ID format (should be `asst_xxxxx`) and project name
- Token credential fails: Ensure Azure CLI is installed and you're logged in
- Insufficient permissions (Agent sample): Verify your Azure account has access to the AI Foundry project
- Endpoint errors: Verify your endpoint URL format in `.env`: `https://your-endpoint.services.ai.azure.com/`
- WebSocket timeout: Check your network connection and firewall settings
- Certificate errors: Ensure your system certificates are up to date
- Model not available (Model/Function samples): Verify your Speech resource has VoiceLive enabled
- Module not found: Run `pip install -r requirements.txt` to install dependencies
- Python version: Verify Python 3.8 or later is installed: `python --version`
- Virtual environment: Use a virtual environment to avoid package conflicts
- Import errors: Ensure you're in the correct directory and virtual environment is activated
All samples support these options (use --help for full details):
- `--endpoint`: Azure VoiceLive endpoint URL
- `--voice`: Voice for the assistant (default varies by sample)
- `--verbose` or `-v`: Enable detailed logging
Agent-specific options:
- `--agent-id`: Azure AI Foundry agent ID
- `--project-name`: Azure AI Foundry project name
Model/Function-specific options:
- `--api-key`: Azure VoiceLive API key
- `--model`: VoiceLive model to use
- `--use-token-credential`: Use Azure authentication instead of API key
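Options like these are typically wired up with the standard library's `argparse`. The sketch below mirrors the flags listed above; the default voice shown is illustrative (defaults vary by sample), and `build_parser` is a hypothetical helper, not code from the samples.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Mirror the common CLI flags the quickstarts accept."""
    parser = argparse.ArgumentParser(description="VoiceLive quickstart options")
    parser.add_argument("--endpoint", help="Azure VoiceLive endpoint URL")
    parser.add_argument("--voice", default="en-US-AvaNeural",
                        help="Voice for the assistant")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable detailed logging")
    parser.add_argument("--api-key", help="Azure VoiceLive API key")
    parser.add_argument("--model", help="VoiceLive model to use")
    parser.add_argument("--use-token-credential", action="store_true",
                        help="Use Azure authentication instead of an API key")
    return parser

# Parse a sample command line instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--voice", "en-US-JennyNeural", "--use-token-credential", "-v"]
)
print(args.voice, args.use_token_credential, args.verbose)
```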
The samples use the following Python packages (defined in requirements.txt):
```
azure-ai-voicelive[aiohttp]
pyaudio
python-dotenv
azure-identity
```
Install all dependencies with:

```bash
pip install -r requirements.txt
```

We welcome contributions! Please see the Support Guide for details on how to contribute.
This project is licensed under the MIT License - see the LICENSE file for details.