A comprehensive collection of Python demo scripts demonstrating various Amazon IVS (Interactive Video Service) capabilities across both Real-Time Stages and Channels (low-latency HLS). This project showcases publishing, subscribing, transcription, AI video analysis, AI-powered speech-to-speech, and timed metadata publishing functionality.
This project is intended for educational purposes only and not for production use.
- Overview
- Project Structure
- Prerequisites
- Installation
- Configuration
- Sub-Projects
- Usage Examples
- Troubleshooting
- Dependencies
- Contributing
- License
- Support
This project demonstrates how to integrate Amazon IVS services with various AI and media processing capabilities:
- WebRTC Publishing: Stream video/audio content to IVS stages
- WebRTC Subscribing: Receive and process streams from IVS stages
- AI Speech-to-Speech: Integrate Amazon Nova Sonic for conversational AI
- SEI Publishing: Embed metadata directly into H.264 video streams using SEI NAL units
- Event Handling: Process real-time stage events via WebSocket connections
- Audio Visualization: Generate dynamic audio visualizations
- Channel Subscription: Subscribe to and analyze IVS channel streams
- Frame Analysis: AI-powered video frame analysis using Amazon Bedrock Claude
- Video Analysis: Comprehensive video segment analysis using TwelveLabs Pegasus
- Real-time Transcription: Convert speech to text using OpenAI Whisper
- Timed Metadata Publishing: Publish analysis results back to IVS as timed metadata
- Rendition Selection: Automatic or manual selection of stream quality
Important
Using these demos with your AWS account will create and consume AWS resources, which will cost money.
amazon-ivs-python-demos/
├── README.md # This file
├── requirements.txt # Python dependencies
├── channels-subscribe/ # IVS Channel analysis tools
│ ├── README.md # Channel tools documentation
│ ├── ivs-channel-subscribe-analyze-frames.py # Frame analysis with Claude
│ ├── ivs-channel-subscribe-analyze-video.py # Video analysis with Pegasus
│ ├── ivs-channel-subscribe-analyze-audio-video.py # Combined audio/video analysis
│ ├── ivs-channel-subscribe-transcribe.py # Real-time transcription
│ └── ivs_metadata_publisher.py # Timed metadata publisher
├── stages-publish/ # Real-Time Stages publishing
│ ├── ivs-stage-publish.py # Basic media publishing
│ ├── ivs-stage-publish-events.py # Publishing with event handling
│ └── ivs-stage-pub-sub.py # Simultaneous publish/subscribe
├── stages-subscribe/ # Real-Time Stages subscribing
│ ├── ivs-stage-subscribe-transcribe.py # Subscribe with transcription
│ ├── ivs-stage-subscribe-analyze-frames.py # Subscribe with AI frame analysis
│ └── ivs-stage-subscribe-analyze-video.py # Subscribe with AI video analysis
├── stages-nova-s2s/ # AI Speech-to-Speech
│ └── ivs-stage-nova-s2s.py # Nova Sonic integration
├── stages-gpt-realtime/ # GPT RealTime API
│ └── ivs-stage-gpt-realtime.py # gpt-realtime integration
└── stages_sei/ # SEI Publishing System
├── SEI.md # SEI documentation and usage guide
├── sei_publisher.py # High-level SEI message publishing
└── h264_sei_patch.py # Low-level H.264 encoder patching
- Python 3.8 or higher
- AWS CLI configured with appropriate credentials
- Amazon IVS Real-Time Stage ARN and participant tokens
- FFmpeg (required for media processing in the transcription demos; not necessary otherwise)
- Audio input/output devices (for speech-to-speech functionality)
Your AWS credentials need the following permissions:
For IVS Real-Time Stages:
- ivs:CreateParticipantToken
- bedrock:InvokeModel (for video frame analysis with Claude)
- bedrock:InvokeModelWithBidirectionalStream (for Nova Sonic)
- Access to Amazon IVS Real-Time Stages
For IVS Channels:
- ivs:PutMetadata (for publishing timed metadata)
- bedrock:InvokeModel (for Claude frame analysis and TwelveLabs Pegasus video analysis)
- Access to Amazon IVS Channels
- Clone and navigate to the project directory:
  cd /amazon-ivs-aiortc-demos
- Create and activate a virtual environment:
  python3 -m venv .venv
  source .venv/bin/activate   # On macOS/Linux
  # or
  .venv\Scripts\activate      # On Windows
- Install dependencies:
  pip install -r requirements.txt
- Install system dependencies:
  macOS:
  brew install ffmpeg portaudio
  Ubuntu/Debian:
  sudo apt-get update
  sudo apt-get install ffmpeg portaudio19-dev
  Windows:
  # Install FFmpeg
  # Download from https://ffmpeg.org/download.html and add to PATH
  # Or use chocolatey: choco install ffmpeg
  # PortAudio is typically installed automatically with pyaudio
  # If you encounter issues, you may need to install Microsoft Visual C++ Build Tools
Set the following environment variables or ensure AWS CLI is configured:
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
# Optional: For weather functionality in Nova speech-to-speech
export WEATHER_API_KEY=your_weather_api_key
# Optional: For web search functionality in Nova speech-to-speech
export BRAVE_API_KEY=your_brave_api_key
The Nova speech-to-speech script supports weather queries through WeatherAPI.com:
- Sign up at WeatherAPI.com for a free account
- Get your API key from the dashboard
- Set the WEATHER_API_KEY environment variable
- The AI assistant will then be able to answer weather-related questions
The Nova speech-to-speech script supports web search capabilities through Brave Search API:
- Sign up at Brave Search API for a free account
- Get your API key from the dashboard
- Set the BRAVE_API_KEY environment variable or use the --brave-api-key command line argument
- The AI assistant will then be able to search the web for current information, news, and facts
The channels-subscribe/ directory contains scripts for subscribing to and analyzing Amazon IVS Channels (low-latency HLS streams).
- Frame Analysis: Analyze individual video frames using Amazon Bedrock Claude models
- Video Analysis: Process video segments using TwelveLabs Pegasus for comprehensive content analysis
- Audio/Video Analysis: Combined audio and video processing with proper synchronization using PyAV
- Real-Time Transcription: Live speech-to-text using OpenAI Whisper with multi-language support
- Timed Metadata Publishing: Publish analysis results back to IVS channels as timed metadata
- Rendition Selection: Automatic or manual selection of stream quality
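The frame-based channel scripts rely on OpenCV's FFmpeg backend to read HLS playlists directly. A minimal sketch of that pattern (the playlist URL is a placeholder and the one-frame-per-second sampling is an illustrative assumption, not the scripts' exact logic):
import cv2

cap = cv2.VideoCapture("https://example.com/playlist.m3u8")  # OpenCV's FFmpeg backend handles HLS
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 if the stream does not report FPS
frame_index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break  # stream ended or became unavailable
    if frame_index % int(fps) == 0:  # sample roughly one frame per second
        print(f"Sampled frame {frame_index}: shape {frame.shape}")
    frame_index += 1
cap.release()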
ivs-channel-subscribe-analyze-frames.py
- Analyzes individual video frames at configurable intervals using Amazon Bedrock Claude
- Supports multiple Claude models (Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku)
- Configurable analysis intervals for cost control
- Optional video display and rendition quality selection
ivs-channel-subscribe-analyze-video.py
- Records and analyzes video segments using TwelveLabs Pegasus
- Encodes video chunks to MP4 for comprehensive analysis
- OpenCV-based video capture with configurable recording duration
ivs-channel-subscribe-analyze-audio-video.py
- Advanced script using PyAV for proper audio/video stream handling
- Native audio capture and encoding with H.264 video and AAC audio
- Complete media analysis with TwelveLabs Pegasus
ivs-channel-subscribe-transcribe.py
- Real-time audio transcription using OpenAI Whisper
- Support for 99+ languages with auto-detection
- Multiple Whisper models from tiny to large-v3
- Optional publishing of transcripts as IVS timed metadata
ivs_metadata_publisher.py
- Reusable module for publishing timed metadata to IVS channels
- Automatic channel ARN extraction from M3U8 playlist URLs
- Rate limiting compliance and automatic payload splitting
- Support for transcripts, events, and custom metadata
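Under the hood, publishing timed metadata comes down to a single IVS API call; the ivs_metadata_publisher module adds ARN extraction, rate limiting, and payload splitting on top. A minimal boto3 sketch (the channel ARN and payload are placeholders, and the 1 KB size guard is an illustrative check, not the module's exact implementation):
import json
import boto3

ivs = boto3.client("ivs", region_name="us-east-1")

payload = json.dumps({"type": "transcript", "text": "Hello from the stream"})
if len(payload.encode("utf-8")) > 1024:
    raise ValueError("IVS timed metadata payloads are limited to 1 KB; split larger payloads")

# PutMetadata inserts the payload into the channel's stream as timed metadata
ivs.put_metadata(
    channelArn="arn:aws:ivs:us-east-1:123456789012:channel/abcdEFGHijkl",
    metadata=payload,
)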
# Frame analysis with Claude Sonnet 4
python channels-subscribe/ivs-channel-subscribe-analyze-frames.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality
# Real-time transcription with metadata publishing
python channels-subscribe/ivs-channel-subscribe-transcribe.py \
--playlist-url "https://example.com/playlist.m3u8" \
--language en \
--whisper-model base \
--publish-transcript-as-timed-metadata
# Video analysis with TwelveLabs Pegasus
python channels-subscribe/ivs-channel-subscribe-analyze-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--analysis-duration 15 \
--show-video
For detailed documentation, see channels-subscribe/README.md.
The stages-publish/ directory contains scripts for publishing media content to IVS Real-Time Stages from MP4 files or live HLS streams.
Basic media publishing script that streams video/audio content to an IVS stage from MP4 files or HLS streams.
Features:
- Publishes video and audio tracks from MP4 files or M3U8 HLS streams to IVS Real-Time Stages
- JWT token validation and capability checking
- WebRTC connection management
- Option to publish video-only streams
- Optional HLS stream health monitoring with automatic exit when stream ends
- Configurable stream check intervals for cost-effective monitoring
Usage:
cd stages-publish
# Publish MP4 file
python ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/video.mp4"
# Publish HLS stream
python ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/stream.m3u8"
# Publish HLS stream with automatic exit when stream ends
python ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/stream.m3u8" \
--stream-check-interval 30
Command-line Arguments:
- --token: JWT participant token with publish capabilities (required)
- --path-to-mp4: Path to MP4 file to publish (mutually exclusive with --m3u8-url)
- --m3u8-url: M3U8 playlist URL for HLS stream to publish (mutually exclusive with --path-to-mp4)
- --video-only: Publish video only, no audio (optional flag)
- --stream-check-interval: Interval in seconds to check HLS stream health - enables automatic exit when stream ends (optional, HLS only)
HLS Stream Monitoring:
When using --stream-check-interval, the script monitors HLS stream health by periodically checking if the M3U8 playlist is still accessible:
- Automatic Exit: Script gracefully exits when the HLS stream stops broadcasting
- Rapid Verification: After a health check failure, the next 2 checks use a 1-second interval for quick verification
- Consecutive Failures: Requires 3 consecutive failures before declaring the stream offline
- Cost Control: Only makes HTTP requests when explicitly enabled with the parameter
- No Interference: Stream monitoring doesn't affect video/audio quality or WebRTC performance
Stream Monitoring Behavior:
Normal check (30s) → ✅ Healthy → Wait 30s
Normal check (30s) → ❌ Failed → Wait 1s (rapid check 1/2)
Rapid check (2s) → ❌ Failed → Wait 1s (rapid check 2/2)
Rapid check (2s) → ❌ Failed → Stream declared offline, exit gracefully
Without --stream-check-interval: Script runs indefinitely until manually stopped (Ctrl+C), regardless of stream status.
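A simplified sketch of the monitoring loop described above (the intervals and failure threshold mirror the description; the exact request logic in ivs-stage-publish.py may differ):
import time
import requests

def monitor_playlist(url: str, interval: int = 30, rapid_interval: int = 1, max_failures: int = 3) -> None:
    failures = 0
    while True:
        try:
            healthy = requests.get(url, timeout=10).status_code == 200
        except requests.RequestException:
            healthy = False
        if healthy:
            failures = 0
            time.sleep(interval)          # normal cadence
        else:
            failures += 1
            if failures >= max_failures:  # 3 consecutive failures -> stream is offline
                print("Stream declared offline, exiting")
                return
            time.sleep(rapid_interval)    # quick re-check after a failure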
Enhanced publishing script with real-time event handling via WebSocket connections.
Features:
- All features of basic publisher
- Real-time stage event monitoring via WebSocket
- Participant join/leave notifications
- Stage state change handling
Usage:
cd stages-publish
python ivs-stage-publish-events.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/video.mp4"Command-line Arguments:
- --token: JWT participant token with publish capabilities (required)
- --path-to-mp4: Path to MP4 file to publish (required)
- --video-only: Publish video only, no audio (optional flag)
Advanced script that demonstrates simultaneous publishing and subscribing capabilities.
Features:
- Publishes audio from MP4 file while subscribing to other participants
- Demonstrates bidirectional communication
- Audio/video track management
- SDP (Session Description Protocol) handling
Usage:
cd stages-publish
python ivs-stage-pub-sub.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/audio.mp4"Command-line Arguments:
- --token: JWT participant token with both publish and subscribe capabilities (required)
- --path-to-mp4: Path to MP4 file to publish audio from (required)
- --video-only: Publish video only, no audio (optional flag)
- --subscribe-to: List of participant IDs to subscribe to (optional)
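At the WebRTC layer, simultaneous publishing and subscribing with aiortc amounts to adding local tracks and handling remote tracks on the same peer connection. A minimal sketch of that pattern (IVS signaling, token handling, and SDP exchange are omitted; the file name is a placeholder):
import asyncio
from aiortc import RTCPeerConnection
from aiortc.contrib.media import MediaPlayer

async def main() -> None:
    pc = RTCPeerConnection()

    # Publish: add the MP4's audio track to the connection
    player = MediaPlayer("audio-file.mp4")
    if player.audio:
        pc.addTrack(player.audio)

    # Subscribe: handle tracks received from other participants
    @pc.on("track")
    def on_track(track):
        print(f"Receiving remote {track.kind} track")

    # ... create an offer, exchange SDP with the IVS stage endpoint, then keep the connection running ...

asyncio.run(main())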
The stages-subscribe/ directory contains scripts for receiving and processing streams from IVS Real-Time Stages.
Subscribes to IVS stage audio streams and provides real-time speech-to-text transcription using OpenAI Whisper with optional VTT file output.
Features:
- Subscribes to audio tracks from specific participants in IVS Real-Time Stages
- Real-time speech transcription using Whisper
- Audio chunk processing and buffering
- Multiple language support
- Audio format conversion and normalization
- Optional VTT (WebVTT) subtitle file output with proper timestamps
- Real-time transcription file writing for live captioning
Usage:
cd stages-subscribe
python ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..."
# With VTT output
python ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--transcription-output-path "output.vtt" \
--transcription-output-format "vtt"Command-line Arguments:
- --participant-id: ID of the participant to subscribe to (required)
- --token: JWT participant token with subscribe capabilities (required)
- --whisper-model: Whisper model size - "tiny", "base", "small", "medium", "large" (default: "tiny")
- --fp16: Enable FP16 precision for faster processing (default: true)
- --language: Language code for transcription (default: "en")
- --chunk-duration: Audio chunk duration in seconds (default: 5)
- --transcription-output-path: Path to save transcription output file (optional)
- --transcription-output-format: Format for transcription output - currently supports "vtt" (optional)
Supported Languages:
- English ("en")
- Spanish ("es")
- French ("fr")
- German ("de")
- Italian ("it")
- Portuguese ("pt")
- And many more supported by Whisper
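A minimal sketch of the Whisper call at the core of the transcription loop (the audio array here is silence just to keep the example self-contained; the script feeds resampled 16 kHz mono audio captured from the stage):
import numpy as np
import whisper

model = whisper.load_model("tiny")  # "tiny" keeps latency low for real-time use

# Whisper expects mono float32 audio sampled at 16 kHz
chunk = np.zeros(16000 * 5, dtype=np.float32)  # 5 seconds of silence as a placeholder

result = model.transcribe(chunk, language="en", fp16=False)
print(result["text"])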
Subscribes to IVS stage video streams and provides AI-powered video frame analysis using Amazon Bedrock Claude models for content discovery, moderation, and accessibility.
Features:
- Subscribes to video tracks from specific participants in IVS Real-Time Stages
- AI-powered video frame analysis using Claude Sonnet 4
- Configurable analysis intervals to control costs
- Support for multiple Claude models (Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku)
- Detailed frame descriptions for content moderation and accessibility
- Background processing to avoid blocking video streams
- Cost-conscious design with smart frame sampling
Usage:
cd stages-subscribe
python ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"Command-line Arguments:
- --token: JWT participant token with subscribe capabilities (required)
- --subscribe-to: Participant ID to subscribe to (required)
- --analysis-interval: Time in seconds between frame analyses (default: 30.0)
- --bedrock-region: AWS region for Bedrock service (default: "us-east-1")
- --bedrock-model-id: Bedrock model ID for analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- --disable-analysis: Disable video frame analysis, just subscribe to video (optional flag)
Supported Models:
- Claude Sonnet 4 (default): us.anthropic.claude-sonnet-4-20250514-v1:0 - Most capable, best for complex analysis
- Claude 3.5 Sonnet: anthropic.claude-3-5-sonnet-20241022-v2:0 - Very capable, good balance of performance and cost
- Claude 3.5 Haiku: anthropic.claude-3-5-haiku-20241022-v1:0 - Fastest and cheapest, good for basic content moderation
Use Cases:
- Content Moderation: Automatically detect inappropriate content in live streams
- Content Discovery: Generate descriptions and tags for video content
- Accessibility: Create detailed descriptions for visually impaired users
- Analytics: Track objects, activities, and engagement in video streams
- Compliance: Monitor streams for regulatory compliance
Cost Control Features:
- Configurable analysis intervals (default 30 seconds to minimize costs)
- Background processing doesn't block video streaming
- Option to disable analysis entirely for testing
- Smart error handling prevents failed analyses from crashing streams
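A minimal sketch of a single frame analysis call using the Bedrock Runtime Converse API (the model ID and prompt mirror the defaults above; the script's actual request construction may differ):
import boto3
import cv2

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def analyze_frame(frame) -> str:
    # Encode the OpenCV BGR frame as JPEG bytes for the image content block
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("Failed to encode frame")
    response = bedrock.converse(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        messages=[{
            "role": "user",
            "content": [
                {"text": "Describe this video frame for content moderation and accessibility."},
                {"image": {"format": "jpeg", "source": {"bytes": jpeg.tobytes()}}},
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]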
Subscribes to IVS stage audio and video streams and provides AI-powered video analysis using Amazon Bedrock TwelveLabs Pegasus for comprehensive video understanding.
Features:
- Subscribes to both audio and video tracks from specific participants
- Records short video clips (configurable duration) for analysis
- Encodes audio and video to MP4 format in memory
- AI-powered video analysis using TwelveLabs Pegasus model
- Detailed video content descriptions including people, objects, activities, and text
- Asynchronous processing to maintain stream performance
- Configurable analysis duration and frequency
Usage:
cd stages-subscribe
python ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"Command-line Arguments:
- --token: JWT participant token with subscribe capabilities (required)
- --subscribe-to: Participant ID to subscribe to (required)
- --analysis-duration: Duration in seconds for video recording before analysis (default: 10.0)
- --bedrock-region: AWS region for Bedrock service (default: "us-west-2")
- --bedrock-model-id: Bedrock model ID for analysis (default: "us.twelvelabs.pegasus-1-2-v1:0")
- --disable-analysis: Disable video analysis, just subscribe to video (optional flag)
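A sketch of the in-memory MP4 encoding step described above, using PyAV to mux captured frames into a buffer that can be sent for analysis (the resolution, frame rate, and frame source are illustrative assumptions):
import io
import av

def encode_clip(frames, width=640, height=480, fps=30) -> bytes:
    buffer = io.BytesIO()
    container = av.open(buffer, mode="w", format="mp4")
    stream = container.add_stream("h264", rate=fps)
    stream.width, stream.height, stream.pix_fmt = width, height, "yuv420p"
    for frame in frames:               # frames are av.VideoFrame objects
        for packet in stream.encode(frame):
            container.mux(packet)
    for packet in stream.encode():     # flush the encoder
        container.mux(packet)
    container.close()
    return buffer.getvalue()           # MP4 bytes ready for analysis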
The stages-nova-s2s/ directory contains the most advanced script integrating Amazon Nova Sonic for AI-powered speech-to-speech functionality.
A comprehensive script that combines IVS Real-Time Stages with Amazon Nova Sonic for conversational AI experiences.
Features:
- Bidirectional audio streaming with IVS participants
- Amazon Nova Sonic integration for AI responses
- Real-time waveform visualization
- Audio resampling and format conversion
- WebRTC track management for both publishing and subscribing
- Dynamic audio visualization with gradient colormaps
- AI-powered video frame analysis using Amazon Bedrock Claude models
- Built-in tools for date/time, weather, and visual analysis
- Configurable frame analysis with multiple Claude model options
Usage:
cd stages-nova-s2s
python ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"Command-line Arguments:
- --token: JWT participant token with both publish and subscribe capabilities (required)
- --subscribe-to: Participant ID to subscribe to (required)
- --nova-model-id: Amazon Nova model identifier (default: "amazon.nova-sonic-v1:0")
- --nova-region: AWS region for Nova service (default: "us-east-1")
- --disable-frame-analysis: Disable video frame analysis (default: enabled)
- --bedrock-model-id: Bedrock model ID for frame analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- --bedrock-region: AWS region for Bedrock service (default: "us-east-1")
- --weather-api-key: Weather API key for weather tool functionality (overrides WEATHER_API_KEY environment variable)
- --brave-api-key: Brave Search API key for web search tool functionality (overrides BRAVE_API_KEY environment variable)
- --ice-timeout: ICE gathering timeout in seconds (default: 1, original: 5) - Lower values speed up connection establishment
Key Components:
- AgentAudioTrack: Custom audio track for streaming Nova responses
- AgentVideoTrack: Dynamic waveform visualization with thinking states
- BedrockStreamManager: Manages bidirectional Nova Sonic streaming
- Audio Processing: Handles resampling between IVS (48kHz) and Nova (16kHz)
- Tool Support: Built-in tools for date/time, weather, and video frame analysis
- Frame Analysis: Non-blocking AI-powered video frame analysis using Claude models
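A minimal sketch of the Audio Processing step above, converting 48 kHz stage audio to the 16 kHz mono PCM that Nova Sonic expects, using PyAV's resampler (the frame source is an assumption; the script's buffering logic is more involved):
import av

# Convert incoming 48 kHz stage audio frames to 16 kHz mono signed 16-bit PCM
resampler = av.AudioResampler(format="s16", layout="mono", rate=16000)

def to_nova_format(frame: av.AudioFrame) -> bytes:
    chunks = []
    for resampled in resampler.resample(frame):  # may yield zero or more frames
        chunks.append(resampled.to_ndarray().tobytes())
    return b"".join(chunks)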
Available Tools:
- Date/Time Tool: Get current date and time information for specific locations with timezone support
- Weather Tool: Get current weather and 5-day forecast (requires WEATHER_API_KEY)
- Web Search Tool: Search the web for current information, news, and facts (requires BRAVE_API_KEY)
- Frame Analysis Tool: Analyze video frames for visual assistance and content description
For automated management of multiple Nova assistant instances via WebSocket integration with IVS Chat, see:
IVS Stage Assistant Manager Documentation
This companion tool allows you to dynamically launch and manage multiple Nova S2S instances based on chat messages, perfect for scaling AI assistants across multiple participants.
For automated management of multiple OpenAI assistant instances via WebSocket integration with IVS Chat, see:
IVS Stage OpenAI Assistant Manager Documentation
This companion tool allows you to dynamically launch and manage multiple OpenAI real-time instances based on chat messages, with full control over voice, VAD settings, and vision capabilities.
The stages-gpt-realtime/ directory contains integration with OpenAI's real-time API for speech-to-speech conversations with IVS Real-Time Stages.
A comprehensive script that integrates OpenAI's gpt-realtime API with IVS Real-Time Stages for conversational AI experiences.
Features:
- Bidirectional audio streaming with IVS participants
- OpenAI real-time API integration for AI responses
- WebSocket-based real-time communication with OpenAI
- Real-time audio visualization
- Audio resampling and format conversion (24kHz for OpenAI)
- WebRTC track management for both publishing and subscribing
- Voice activity detection and interruption handling
- Multiple voice options (alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar)
Usage:
cd stages-gpt-realtime
python ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--openai-key "sk-..."Command-line Arguments:
- --token: JWT participant token with both publish and subscribe capabilities (required)
- --subscribe-to: Participant ID to subscribe to (required)
- --openai-key: OpenAI API key (optional, uses OPENAI_API_KEY environment variable if not provided)
- --model: OpenAI model to use (default: "gpt-realtime")
- --voice: Voice to use for responses - "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse", "marin", "cedar" (default: "cedar")
- --disable-frame-analysis: Disable video frame analysis (default: enabled)
- --bedrock-region: AWS region for Bedrock service (default: "us-east-1")
- --bedrock-model-id: Bedrock model ID for frame analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- --ice-timeout: ICE gathering timeout in seconds (default: 1, original: 5)
Key Components:
- OpenAIAudioTrack: Custom audio track for streaming OpenAI responses
- OpenAIVideoTrack: Dynamic audio visualization with OpenAI branding
- OpenAIRealtimeManager: Manages bidirectional OpenAI real-time API streaming
- Audio Processing: Handles resampling for OpenAI's 24kHz requirement
- WebSocket Management: Handles OpenAI real-time API WebSocket connection
- Vision Capabilities: AI-powered video frame analysis using Amazon Bedrock Claude models
Available Voices:
- cedar (default): Warm and conversational
- alloy: Balanced and natural
- ash: Clear and articulate
- ballad: Smooth and melodic
- coral: Bright and engaging
- echo: Clear and articulate
- sage: Wise and thoughtful
- shimmer: Soft and gentle
- verse: Expressive and dynamic
- marin: Professional and polished
Prerequisites:
- OpenAI API key with real-time API access
- IVS stage token with both publish and subscribe capabilities
- Python 3.8+ with required dependencies
Known Limitations:
- Semantic VAD Transcriptions: When using --vad-mode semantic_vad, user input transcriptions may not be generated. Use --vad-mode server_vad if you need reliable user transcriptions for SEI metadata or logging.
Environment Variables:
export OPENAI_API_KEY="sk-your-openai-api-key-here"Example Usage:
# Basic OpenAI conversation
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Using different voice and model
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--voice "nova" \
--model "gpt-4o-realtime-preview-2024-10-01"
# With explicit API key
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--openai-key "sk-..."The stages_sei/ directory contains a comprehensive SEI (Supplemental Enhancement Information) publishing system for embedding metadata directly into H.264 video streams.
What is SEI?
SEI NAL units are part of the H.264/AVC video compression standard that allow embedding additional metadata within the video stream itself. This metadata travels with the video frames, ensuring perfect synchronization between video content and associated data.
Key Features:
- Perfect Synchronization: Metadata is embedded directly in video frames
- Low Latency: No separate data channels needed
- Standards Compliant: Uses official H.264 specification
- Multi-format Support: Handles Annex B, AVCC, and RTP H.264 formats
- Automatic Integration: Patches aiortc and PyAV encoders automatically
- Reliable Delivery: 3x repetition with client-side deduplication
Components:
- sei_publisher.py: High-level interface for publishing SEI messages
- h264_sei_patch.py: Low-level H.264 encoder patching system
- SEI.md: Comprehensive documentation and usage guide
Usage Example:
from stages_sei import SeiPublisher, patch_h264_encoder, set_global_sei_publisher
# Apply H.264 encoder patch (do this early in your application)
patch_h264_encoder()
# Create and configure SEI publisher
sei_publisher = SeiPublisher()
set_global_sei_publisher(sei_publisher)
# Publish metadata
await sei_publisher.publish_json({
"type": "chat_message",
"user": "alice",
"message": "Hello world!",
"timestamp": time.time()
})
Integration:
The Nova speech-to-speech script (stages-nova-s2s/ivs-stage-nova-s2s.py) demonstrates SEI publishing in action, embedding AI assistant responses directly into the video stream for synchronized delivery.
For detailed documentation, see stages_sei/SEI.md.
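For context on what the patch emits, this is roughly how a user_data_unregistered SEI NAL unit (payload type 5) is laid out per the H.264 spec: a 16-byte UUID followed by the payload, with lengths coded in 0xFF-prefixed bytes. A simplified sketch, with emulation-prevention bytes omitted and the UUID chosen arbitrarily rather than matching what stages_sei actually uses:
import uuid

SEI_UUID = uuid.uuid4()  # stable identifier so clients can recognize these messages

def build_sei_nal(payload: bytes) -> bytes:
    body = SEI_UUID.bytes + payload                             # 16-byte UUID + user data
    size = len(body)
    size_bytes = b"\xff" * (size // 255) + bytes([size % 255])  # payload size, 0xFF-escaped per the spec
    sei = bytes([0x05]) + size_bytes + body                     # payload type 5 = user_data_unregistered
    sei += b"\x80"                                              # rbsp_trailing_bits
    return b"\x00\x00\x00\x01\x06" + sei                        # Annex B start code + NAL header (type 6 = SEI)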
Note: Utility scripts are excluded from this documentation as they are development/testing tools.
# Subscribe to IVS channel and analyze frames with Claude
python channels-subscribe/ivs-channel-subscribe-analyze-frames.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality \
--analysis-interval 30
# Real-time transcription of IVS channel audio
python channels-subscribe/ivs-channel-subscribe-transcribe.py \
--playlist-url "https://example.com/playlist.m3u8" \
--language en \
--whisper-model base \
--publish-transcript-as-timed-metadata
# Comprehensive video analysis with TwelveLabs Pegasus
python channels-subscribe/ivs-channel-subscribe-analyze-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--analysis-duration 10 \
--bedrock-region us-west-2
# Combined audio/video analysis using PyAV
python channels-subscribe/ivs-channel-subscribe-analyze-audio-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality \
--analysis-duration 15
# Publish MP4 file to IVS stage
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "sample-video.mp4"
# Publish HLS stream to IVS stage
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8"
# Publish HLS stream with automatic exit when stream ends (check every 30 seconds)
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8" \
--stream-check-interval 30
# Publish HLS stream with frequent monitoring (check every 10 seconds)
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM8NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8" \
--stream-check-interval 10
# Publish video-only HLS stream
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8" \
--video-only
# Publish with real-time event monitoring
python stages-publish/ivs-stage-publish-events.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "sample-video.mp4"# Basic transcription (console output only)
python stages-subscribe/ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..."
# Subscribe and transcribe audio in Spanish
python stages-subscribe/ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--language "es" \
--whisper-model "medium"
# Save transcription to VTT file for live captioning
python stages-subscribe/ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--transcription-output-path "live_captions.vtt" \
--transcription-output-format "vtt"
# High-quality transcription with VTT output
python stages-subscribe/ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--whisper-model "large-v3" \
--language "en" \
--chunk-duration "10" \
--transcription-output-path "meeting_transcript.vtt" \
--transcription-output-format "vtt"# Basic video frame analysis (every 30 seconds with Claude Sonnet 4)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Frequent analysis for real-time moderation (every 5 seconds)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--analysis-interval 5.0
# Cost-effective analysis using Claude 3.5 Haiku (every 60 seconds)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-haiku-20241022-v1:0" \
--analysis-interval 60.0
# Analysis in different AWS region
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-region "eu-west-1"
# Subscribe to video without analysis (testing connectivity)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-analysis
# Basic video analysis with TwelveLabs Pegasus
python stages-subscribe/ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Shorter video clips for more frequent analysis
python stages-subscribe/ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--analysis-duration 5.0
# Start Nova Sonic conversation with frame analysis
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--nova-model-id "amazon.nova-sonic-v1:0" \
--nova-region "us-east-1"
# Nova conversation without frame analysis
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-frame-analysis
# Nova conversation with custom Bedrock model and region
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-sonnet-20241022-v2:0" \
--bedrock-region "us-west-2"
# Nova conversation with fast connection setup
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--ice-timeout 1
# Basic OpenAI real-time conversation
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Using different voice
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--voice "nova"
# With explicit OpenAI API key
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--openai-key "sk-your-key-here"
# With vision capabilities disabled
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-frame-analysis
# With custom Bedrock model for vision
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-sonnet-20241022-v2:0"
# Fast connection setup
python stages-gpt-realtime/ivs-stage-openai-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--ice-timeout 1
# Simultaneously publish and subscribe
python stages-publish/ivs-stage-pub-sub.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "audio-file.mp4" \
--subscribe-to "participant1" "participant2"Use the AWS CLI to create participant tokens:
# Create a token with publish capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user123" \
--capabilities PUBLISH \
--duration 720
# Create a token with subscribe capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user456" \
--capabilities SUBSCRIBE \
--duration 720
# Create a token with both publish and subscribe capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user789" \
--capabilities PUBLISH SUBSCRIBE \
--duration 720
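The same tokens can also be created programmatically. A minimal boto3 sketch (the stage ARN is a placeholder):
import boto3

ivs_realtime = boto3.client("ivs-realtime", region_name="us-east-1")

# Create a participant token with both publish and subscribe capabilities
response = ivs_realtime.create_participant_token(
    stageArn="arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh",
    userId="user789",
    capabilities=["PUBLISH", "SUBSCRIBE"],
    duration=720,  # minutes
)
print(response["participantToken"]["token"])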
"No audio stream found"
- Check if the M3U8 stream contains audio using
ffprobe - Try different rendition quality options
- Verify stream accessibility with
curl
- Check if the M3U8 stream contains audio using
-
"Unable to open video stream"
- Verify M3U8 URL is accessible
- Check network connectivity and firewall settings
- Try different rendition selections
-
Whisper Model Issues
- Clear Whisper cache:
rm -rf ~/.cache/whisper/ - Use smaller models for memory-constrained environments
- Enable FP16 for faster processing
- Clear Whisper cache:
-
Timed Metadata Publishing Issues
- Verify AWS credentials have
ivs:PutMetadatapermissions - Check rate limiting (5 RPS per channel, 155 RPS per account)
- Ensure channel ARN extraction is working correctly
- Verify AWS credentials have
-
Audio Quality Problems
- Ensure consistent chunk sizes (512 samples recommended)
- Check audio resampling configuration
- Verify WebRTC connection stability
-
WebRTC Connection Failures
- Verify JWT token has correct capabilities
- Check network connectivity and firewall settings
- Ensure SDP munging is applied correctly
-
Nova Sonic Issues
- Verify AWS credentials have Bedrock permissions
- Check model availability in your region
- Ensure proper event sequence (START_SESSION → START_PROMPT → content)
-
Video Frame Analysis Issues
- Verify AWS credentials have
bedrock:InvokeModelpermissions - Check Claude/Pegasus model availability in your region
- Monitor analysis costs with appropriate intervals
- Ensure video track is receiving frames before analysis begins
- Verify AWS credentials have
-
Transcription Accuracy
- Use appropriate Whisper model size for your use case
- Ensure clean audio input
- Consider language-specific models
Enable debug logging for detailed troubleshooting by configuring Python logging before the script runs:
export PYTHONPATH=$PYTHONPATH:.
# Add near the top of the script you run: import logging; logging.basicConfig(level=logging.DEBUG)
python your-script.py --your-args
For Channel Transcription:
- Use --whisper-model tiny or --whisper-model base for real-time processing
- Enable FP16: --fp16 true
- Use shorter chunks: --chunk-duration 3
- Specify language: --language en (faster than auto-detect)
For Channel Video Analysis:
- Use --lowest-quality for faster processing
- Adjust --analysis-duration based on content complexity
- Run without --show-video for headless operation
For Channel Frame Analysis:
- Increase --analysis-interval for less frequent analysis (cost control)
- Use --lowest-quality for faster frame processing
- Choose appropriate Claude model for your use case
Connection Speed:
- Use --ice-timeout 1 for faster WebRTC connection establishment (default)
- Original WebRTC ICE timeout is 5 seconds, optimized to 1 second for better user experience
- Increase timeout if experiencing connection issues in poor network conditions
- This optimization reduces startup time from ~11 seconds to ~3 seconds
For Nova Sonic:
- Use consistent 1ms delays between audio chunks
- Implement proper buffering strategies
- Monitor memory usage during long sessions
For Stage Transcription:
- Choose appropriate chunk duration (5-10 seconds)
- Use smaller Whisper models for real-time processing
- Consider GPU acceleration for large models
- Use VTT output for live captioning applications
- Specify language explicitly for better accuracy and performance
For Video Frame Analysis:
- Use longer analysis intervals (30+ seconds) to control costs
- Choose appropriate Claude model for your use case:
  - Claude 3.5 Haiku for basic content moderation
  - Claude 3.5 Sonnet for balanced performance
  - Claude Sonnet 4 for complex analysis requiring highest accuracy
- Monitor Bedrock usage and costs in AWS console
- Consider regional model availability and latency
- aiortc>=1.12.0 - WebRTC implementation
- av>=10.0.0 - Media processing
- requests>=2.28.0 - HTTP client
- websockets>=11.0.0 - WebSocket client
- numpy>=1.21.0 - Numerical computing
- whisper (from GitHub) - Speech recognition
- boto3>=1.34.0 - AWS SDK for Bedrock and IVS
- aws-sdk-bedrock-runtime - Amazon Bedrock client
- smithy-aws-core>=0.0.1 - AWS SDK core
- pyaudio>=0.2.13 - Audio I/O
- rx>=3.2.0 - Reactive extensions
- Pillow>=10.0.0 - Image processing for video frame analysis
- opencv-python>=4.8.0 - Computer vision for video processing
- pytz - Timezone handling
- tzlocal - Local timezone detection
- Python 3.8+
- FFmpeg
- PortAudio (for audio I/O)
- Sufficient bandwidth for WebRTC streams
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
For issues related to:
- Amazon IVS Real-Time Stages: Check the IVS Real-Time Streaming documentation
- Amazon IVS Channels: Check the IVS Low-Latency Streaming documentation
- Amazon Nova: Check the Bedrock documentation
- Amazon Bedrock: Check the Bedrock User Guide
- aiortc: Check the aiortc documentation
- OpenAI Whisper: Check the Whisper repository
This project demonstrates advanced integration patterns between Amazon IVS services and AI capabilities. From real-time conversational AI with Nova Sonic to comprehensive video analysis with Claude and TwelveLabs Pegasus, these demos showcase the power of combining live video streaming with cutting-edge AI services.