Python Voice Assistant Samples

SDK Reference documentation | Package (PyPI)

This folder contains Python samples that demonstrate how to build real-time voice assistants with the Azure AI Speech VoiceLive service. Each sample is self-contained so that it can be read and deployed on its own.

Available Samples

Agents New Quickstart

Demonstrates the new Voice Live + Foundry Agent v2 workflow, including creating a Voice Live-configured agent and running an agent-connected voice assistant.

Key Features:

Agent creation utility with Voice Live metadata chunking
New SDK-based agent session configuration (AgentSessionConfig)
Proactive greeting and barge-in handling
Conversation logging

Agent Quickstart

Demonstrates connecting to an Azure AI Foundry agent for voice conversations. The agent handles model selection, instructions, and tools, with support for proactive greetings.

Key Features:

Azure AI Foundry agent integration
Proactive greeting support
Azure authentication (required)
Agent-managed tools and instructions

Model Quickstart

Demonstrates direct integration with VoiceLive models for voice conversations without agent overhead.

Key Features:

Direct model access
Flexible authentication (API key or Azure credentials)
Custom instructions support
Model selection options

Bring-Your-Own-Model (BYOM) Quickstart

Demonstrates direct integration with VoiceLive using bring-your-own models from Foundry.

Key Features:

Bring-Your-Own-Model Integration: Connects directly to a self-hosted model
Proactive Greeting: The assistant initiates the conversation with a welcome message
Custom Instructions: Define your own system instructions for the AI
Flexible Authentication: Supports both API key and Azure credential authentication

Function Calling Quickstart

Demonstrates how to implement function calling with VoiceLive models, enabling the AI to execute custom functions during conversations.

Key Features:

Custom function definitions
Real-time function execution
Function result handling
Advanced tool integration
Proactive greeting support

RAG-enabled Voice Assistant

Demonstrates how to build a real-time voice assistant with Retrieval-Augmented Generation (RAG) capabilities using Azure AI Voice Live API and Azure AI Search.

Key Features:

Real-time speech-to-speech interaction powered by Voice Live
RAG integration with Azure AI Search for document retrieval
Full-stack architecture (React/TypeScript frontend + FastAPI backend)
Azure AI Foundry Agent Service integration
Production-ready azd deployment to Azure Container Apps

Voice Live Avatar

A Dockerized sample demonstrating the Azure Voice Live API with avatar integration. The Voice Live SDK runs entirely on the server side (Python/FastAPI), while the browser handles the UI, audio capture/playback, and avatar video rendering.

Key Features:

Avatar-enabled voice conversations with server-side SDK
Prebuilt, custom, and photo avatar character support
WebRTC and WebSocket avatar output modes
Live scene settings adjustment for photo avatars
Proactive greeting support
Barge-in support for natural conversation interruption
Docker-based deployment
Azure Container Apps deployment guide
Developer mode for debugging

Prerequisites

All samples require:

Python 3.8+
Audio input/output devices (microphone and speakers)
Azure subscription - Create one for free

Azure Resources

Depending on which sample you want to run:

For Agent Quickstart and Agents New Quickstart:

Azure AI Foundry project with a deployed agent
Azure CLI for authentication

For Model Quickstart, BYOM Quickstart, and Function Calling:

AI Foundry resource
API key or Azure CLI for authentication

Getting Started

Quick Start

Navigate to the quickstarts folder:
```
cd python/voice-live-quickstarts
```

Create a virtual environment (recommended):

```
python -m venv .venv

# On Windows
.venv\Scripts\activate

# On Linux/macOS
source .venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Configure environment variables:
- Copy .env_sample to .env
- Update .env with your Azure credentials

Run a sample:

```
# New v2 agent quickstart
python AgentsNewQuickstart/voice-live-with-agent-v2.py
# or create an agent configured for Voice Live
python AgentsNewQuickstart/create_agent_v2_with_voicelive.py
# or classic agent quickstart
python agents-quickstart.py
# or
python model-quickstart.py
# or
python bring-your-own-model-quickstart.py
# or
python function-calling-quickstart.py
```

Authentication

Agent Quickstart and Agents New Quickstart require Azure authentication:

```
az login
python agents-quickstart.py
# or
python AgentsNewQuickstart/voice-live-with-agent-v2.py
```

Model Quickstart, BYOM Quickstart, and Function Calling support both methods:

```
# With API key (from .env file)
python model-quickstart.py
# or
python bring-your-own-model-quickstart.py

# With Azure credentials
az login
python model-quickstart.py --use-token-credential
# or
python bring-your-own-model-quickstart.py --use-token-credential
```
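
The choice between the two methods can be sketched in Python as follows. This is an illustrative sketch, not code taken from the samples: `select_auth_mode` and `build_credential` are hypothetical helper names, and the use of `AzureCliCredential` (rather than another `azure-identity` credential) is an assumption based on the `az login` step above.

```python
def select_auth_mode(api_key, use_token_credential=False, agent_sample=False):
    """Return "token" or "api_key" following the rules described above.

    Hypothetical helper; the samples may structure this differently.
    """
    if agent_sample or use_token_credential:
        # Agent samples accept Azure authentication only.
        return "token"
    if api_key:
        return "api_key"
    raise ValueError(
        "No API key found; set AZURE_VOICELIVE_API_KEY or pass --use-token-credential"
    )


def build_credential(mode, api_key=None):
    """Create a credential object for the chosen mode.

    Imports are deferred so the selection logic above can be used
    without the Azure packages installed.
    """
    if mode == "token":
        # Assumption: reuse the session created by `az login`.
        from azure.identity.aio import AzureCliCredential
        return AzureCliCredential()
    from azure.core.credentials import AzureKeyCredential
    return AzureKeyCredential(api_key)
```

Keeping the decision separate from credential construction makes the rule easy to test without network access or an Azure login.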

Configuration

All samples use a .env file for configuration. Copy .env_sample to .env and update with your values:

Agent Quickstart Configuration

```
AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/
AZURE_VOICELIVE_PROJECT_NAME=your-project-name
AZURE_VOICELIVE_AGENT_ID=asst_your-agent-id
AZURE_VOICELIVE_API_VERSION=2025-10-01
# AZURE_VOICELIVE_API_KEY not needed for agents (Azure auth only)
```

Model Quickstart Configuration

```
AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/
AZURE_VOICELIVE_API_KEY=your-api-key
AZURE_VOICELIVE_API_VERSION=2025-10-01
```

Function Calling Configuration

```
AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/
AZURE_VOICELIVE_API_KEY=your-api-key
AZURE_VOICELIVE_API_VERSION=2025-10-01
```
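
A sample can check these settings before it opens a connection. The following is a minimal sketch under stated assumptions, not code from the samples: the `REQUIRED_SETTINGS` table and the `missing_settings` helper are hypothetical, and `AZURE_VOICELIVE_API_VERSION` is treated as optional on the assumption that the SDK supplies a default.

```python
import os

# Variable names come from the configuration blocks above; the grouping
# by sample and the helper itself are illustrative assumptions.
REQUIRED_SETTINGS = {
    "agent": [
        "AZURE_VOICELIVE_ENDPOINT",
        "AZURE_VOICELIVE_PROJECT_NAME",
        "AZURE_VOICELIVE_AGENT_ID",
    ],
    "model": ["AZURE_VOICELIVE_ENDPOINT", "AZURE_VOICELIVE_API_KEY"],
    "function-calling": ["AZURE_VOICELIVE_ENDPOINT", "AZURE_VOICELIVE_API_KEY"],
}


def missing_settings(sample, env=None, use_token_credential=False):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    required = list(REQUIRED_SETTINGS[sample])
    if use_token_credential and "AZURE_VOICELIVE_API_KEY" in required:
        # With Azure credentials (az login) the API key is not needed.
        required.remove("AZURE_VOICELIVE_API_KEY")
    return [name for name in required if not env.get(name, "").strip()]
```

When the values live in a .env file, load it into the process environment first (for example with `load_dotenv()` from python-dotenv) and then call `missing_settings("model")`. An empty list means every required value is present.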

Common Features

All samples demonstrate:

Real-time Voice: Bidirectional audio streaming for natural conversations
Audio Processing: Microphone capture and speaker playback using PyAudio
Interruption Handling: Support for natural turn-taking in conversations
Resource Management: Proper cleanup of connections and audio resources
Async/Await: Modern Python async programming patterns
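
Interruption handling (barge-in) is often implemented by discarding assistant audio that has not been played yet as soon as the user starts speaking. The sketch below illustrates that idea with the standard library only. `PlaybackBuffer` is a hypothetical class, and the samples may implement interruption differently.

```python
import queue


class PlaybackBuffer:
    """Holds assistant audio chunks waiting to be written to the speakers."""

    def __init__(self):
        self._chunks = queue.Queue()

    def enqueue(self, chunk):
        """Add one chunk of assistant audio (bytes) to the playback queue."""
        self._chunks.put(chunk)

    def next_chunk(self):
        """Return the next chunk to play, or None when nothing is pending."""
        try:
            return self._chunks.get_nowait()
        except queue.Empty:
            return None

    def clear(self):
        """Drop all pending audio and return how many chunks were dropped.

        Intended to be called when the user starts speaking.
        """
        dropped = 0
        while True:
            try:
                self._chunks.get_nowait()
            except queue.Empty:
                return dropped
            dropped += 1
```

In such a design the playback loop calls `next_chunk()` repeatedly, and the handler for a "user started speaking" event calls `clear()` so that stale assistant audio is not played over the user.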

Available Voices

Popular neural voice options include:

en-US-AvaNeural - Female, conversational
en-US-AndrewNeural - Male, conversational
en-US-JennyNeural - Female, friendly
en-US-GuyNeural - Male, professional
en-US-AriaNeural - Female, cheerful
en-US-DavisNeural - Male, calm

See the Azure Neural Voice Gallery for all available voices.

Architecture

Agent Quickstart Flow

User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Foundry Agent
                                                              ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Agent Response

Model Quickstart Flow

User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Model (GPT-4o)
                                                              ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Model Response

Function Calling Flow

User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Model
                                                              ↓
                                                    Function Call Request
                                                              ↓
                                            Execute Python Function
                                                              ↓
                                            Function Result → Model
                                                              ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Enhanced Response
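
The "Execute Python Function" and "Function Result → Model" steps can be sketched as a small dispatcher. This is an assumption-laden illustration rather than the sample's code: `get_current_time`, `FUNCTIONS`, and `handle_function_call` are hypothetical names, and the sketch assumes the model supplies a function name plus a JSON string of arguments.

```python
import json


def get_current_time(timezone="UTC"):
    """Hypothetical tool; returns a fixed value to keep the sketch deterministic."""
    return {"timezone": timezone, "time": "12:00"}


# Registry mapping the function names the model may request to Python callables.
FUNCTIONS = {"get_current_time": get_current_time}


def handle_function_call(name, arguments_json):
    """Run the requested function and return its result as a JSON string."""
    func = FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        arguments = json.loads(arguments_json or "{}")
        result = func(**arguments)
    except (TypeError, ValueError) as exc:
        # Malformed JSON or unexpected arguments are reported back to the
        # model instead of crashing the conversation loop.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

Returning errors as JSON rather than raising lets the model explain the failure to the user and keeps the audio session alive.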

Troubleshooting

Audio Issues

No audio input/output: Verify your microphone and speakers are working and set as default devices
PyAudio installation errors:
- On Windows: Install via pip install pyaudio
- On Linux: sudo apt-get install python3-pyaudio or pip install pyaudio
- On macOS: brew install portaudio && pip install pyaudio
Audio device busy: Close other applications using your audio devices (e.g., Teams, Zoom)
Poor audio quality: Update your audio drivers to the latest version

Authentication Issues

401 Unauthorized:
- For API key: Verify AZURE_VOICELIVE_API_KEY in your .env file
- For Azure auth: Run az login to authenticate with Azure CLI
Agent not found (Agent sample): Check your agent ID format (should be asst_xxxxx) and project name
Token credential fails: Ensure Azure CLI is installed and you're logged in
Insufficient permissions (Agent sample): Verify your Azure account has access to the AI Foundry project

Connection Issues

Endpoint errors: Verify your endpoint URL format in .env: https://your-endpoint.services.ai.azure.com/
WebSocket timeout: Check your network connection and firewall settings
Certificate errors: Ensure your system certificates are up to date
Model not available (Model/Function samples): Verify your Speech resource has VoiceLive enabled
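
Endpoint format errors can be caught before connecting. The helper below is a hypothetical sketch that checks only the URL shape shown in this README (`https://your-endpoint.services.ai.azure.com/`). Other valid endpoint hosts may exist, so treat a reported problem as a hint, not a verdict.

```python
from urllib.parse import urlparse


def endpoint_problems(endpoint):
    """Return a list of human-readable problems; empty means the shape looks right."""
    problems = []
    parsed = urlparse((endpoint or "").strip())
    if parsed.scheme != "https":
        problems.append("endpoint should start with https://")
    if not parsed.hostname:
        problems.append("endpoint has no host name")
    elif not parsed.hostname.endswith(".services.ai.azure.com"):
        # Only the host pattern shown in this README is recognized here.
        problems.append("host does not match *.services.ai.azure.com")
    return problems
```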

Python Environment Issues

Module not found: Run pip install -r requirements.txt to install dependencies
Python version: Verify Python 3.8 or later is installed: python --version
Virtual environment: Use a virtual environment to avoid package conflicts
Import errors: Ensure you're in the correct directory and virtual environment is activated
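
The checks above can be automated with a short script run inside the activated virtual environment. This is a sketch: the import names in `REQUIRED_MODULES` are assumed to correspond to the packages in requirements.txt, and both helper functions are hypothetical.

```python
import importlib.util
import sys

# Import names assumed for the packages listed in requirements.txt.
REQUIRED_MODULES = ["azure.ai.voicelive", "pyaudio", "dotenv", "azure.identity"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError is raised when a parent package (for example
            # "azure") is absent; ValueError when a module has no spec.
            found = False
        if not found:
            missing.append(name)
    return missing


def python_version_ok(minimum=(3, 8)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum
```

For example, `print(missing_modules(REQUIRED_MODULES))` lists the modules that still need to be installed, and `python_version_ok()` confirms the interpreter version.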

Common Command Line Options

All samples support these options (use --help for full details):

--endpoint: Azure VoiceLive endpoint URL
--voice: Voice for the assistant (default varies by sample)
--verbose or -v: Enable detailed logging

Agent-specific options:

--agent-id: Azure AI Foundry agent ID
--project-name: Azure AI Foundry project name

Model/Function-specific options:

--api-key: Azure VoiceLive API key
--model: VoiceLive model to use
--use-token-credential: Use Azure authentication instead of API key
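
As an illustration, the common and model-specific options could be declared with `argparse` as shown below. This is a sketch, not the parser used by the samples: the default voice and the fallback to environment variables are assumptions, so run a sample with `--help` for its actual options.

```python
import argparse
import os


def build_parser():
    """Build a parser for the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Voice assistant sample (sketch)")
    # Options listed as common to all samples.
    parser.add_argument(
        "--endpoint",
        default=os.environ.get("AZURE_VOICELIVE_ENDPOINT"),  # assumed fallback
        help="Azure VoiceLive endpoint URL",
    )
    parser.add_argument(
        "--voice",
        default="en-US-AvaNeural",  # assumed default; it varies by sample
        help="Voice for the assistant",
    )
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable detailed logging")
    # Model/Function-specific options.
    parser.add_argument(
        "--api-key",
        default=os.environ.get("AZURE_VOICELIVE_API_KEY"),  # assumed fallback
        help="Azure VoiceLive API key",
    )
    parser.add_argument("--model", default=None, help="VoiceLive model to use")
    parser.add_argument("--use-token-credential", action="store_true",
                        help="Use Azure authentication instead of API key")
    return parser
```

Calling `build_parser().parse_args(["--voice", "en-US-GuyNeural", "-v"])` yields a namespace with `voice`, `verbose`, `api_key`, `model`, and `use_token_credential` attributes.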

Requirements

The samples use the following Python packages (defined in requirements.txt):

azure-ai-voicelive[aiohttp]
pyaudio
python-dotenv
azure-identity

Install all dependencies with:

```
pip install -r requirements.txt
```

Additional Resources

Contributing

We welcome contributions! Please see the Support Guide for details on how to contribute.

License

This project is licensed under the MIT License - see the LICENSE file for details.
