Event-based video intelligence with 98% cost reduction
Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, WebRTC, and more coming soon.
Note: `pip install vlm-sdk` installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.
- 🎯 Event-based processing: Only analyze frames with motion/activity (98% cost reduction vs frame-by-frame)
- 📹 Multi-source connectors: RTSP, ONVIF, UDP, WebRTC, File
- 🤖 RT-DETR + ByteTrack: Real-time object detection and motion tracking
- 🧠 Provider-agnostic VLM: Gemini, Qwen, ObserveeVLM (small VLM coming soon; selected via env config)
- 🎨 Advanced analysis: Timestamps, object detection, bounding boxes, range queries
- ⚡ FastAPI REST API: Industry-standard multi-stream video intelligence
- 📡 Server-Sent Events (SSE): Real-time event streaming
- 🔐 Authentication: API key-based auth with rate limiting
- 📊 Monitoring: Health checks, metrics, stream management
- 🔧 Configurable: Environment-based provider selection
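The first bullet is the core trick: frames only reach the (expensive) VLM when they belong to a motion/activity event. A toy sketch of that gating, independent of the SDK (all names here are illustrative, not SDK API):

```python
def filter_event_frames(frames, is_interesting):
    """Yield only frames that fall inside a motion/activity event.

    frames: iterable of (timestamp, detections) pairs.
    is_interesting: stand-in for the RT-DETR + ByteTrack stage.
    """
    for timestamp, detections in frames:
        if is_interesting(detections):
            yield timestamp, detections

# Toy run: 6 frames, only 2 contain a tracked object,
# so only 2 would ever reach the VLM instead of all 6.
frames = [
    (0.0, []), (0.5, []), (1.0, ["person"]),
    (1.5, ["person"]), (2.0, []), (2.5, []),
]
kept = list(filter_event_frames(frames, lambda d: "person" in d))
print(len(kept))  # 2
```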
```bash
# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .
```

```python
import asyncio

from vlm.connectors import RTSPConnector
from vlm.preprocessors import DetectorPreprocessor
from vlm.providers.gemini import GeminiVideoService

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor({
    "confidence_threshold": 0.6,
    "interesting_objects": ["person", "car"],
    "min_event_duration": 2.0,  # Only keep events longer than 2 seconds
})
gemini = GeminiVideoService(api_key="your-gemini-key")

# Process the stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)
        if result["status"] == "completed":
            # Event detected! Analyze the clip with the VLM
            upload = await gemini.upload_file(result["clip_path"])
            analysis = await gemini.query_video_with_file(
                upload["name"],
                "Describe the activity in this video",
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())
```
`DetectorPreprocessor` accepts a configuration dictionary (matching the keys documented in `vlm/preprocessors/detector/core.py`). Use `interesting_objects` to control tracked classes and `min_event_duration` for event-length thresholds. Configuration keys such as `confidence_threshold`, `interesting_objects`, and `min_event_duration` must be provided via the config dict, not as individual keyword arguments.
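One convenient way to work with that config dict is to merge user overrides over a set of defaults before constructing the preprocessor. A minimal sketch; the default values below are illustrative only, not the SDK's actual defaults:

```python
# Illustrative defaults -- NOT the SDK's real default values.
DETECTOR_DEFAULTS = {
    "confidence_threshold": 0.5,
    "interesting_objects": ["person"],
    "min_event_duration": 1.0,
}

def build_detector_config(overrides=None):
    """Merge user overrides over the defaults, overrides winning."""
    return {**DETECTOR_DEFAULTS, **(overrides or {})}

config = build_detector_config({
    "confidence_threshold": 0.6,
    "min_event_duration": 2.0,
})
print(config["confidence_threshold"])  # 0.6
print(config["interesting_objects"])   # ['person']
```

The merged dict can then be passed straight to `DetectorPreprocessor(config)` as shown in the quickstart above.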
```bash
# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install the SDK (from a repo checkout)
pip install -e .

# Install API dependencies (required to run api.main)
pip install fastapi "uvicorn[standard]" pydantic python-dotenv
# ...or install everything we ship in Docker
pip install -r requirements.txt

# Run the server
python -m api.main
# Server starts at http://localhost:8000
```

Note: To accept WebRTC publishers, run MediaMTX alongside the API using the provided `mediamtx.yml` (see docs/apiguide.md for commands).
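The `VLM_PROVIDER` / `*_API_KEY` pairing set in the environment above can be resolved in a few lines. This sketch mirrors the documented variables, but the function and mapping names are illustrative, not the SDK's actual selection logic:

```python
import os

# Map each provider name to the env var holding its key,
# mirroring the environment variables documented above.
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_provider(env=os.environ):
    """Return (provider_name, api_key) from environment-style config."""
    provider = env.get("VLM_PROVIDER", "gemini")
    key_var = PROVIDER_KEYS.get(provider)
    if key_var is None:
        raise ValueError(f"Unknown VLM_PROVIDER: {provider!r}")
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} must be set for provider {provider!r}")
    return provider, api_key

provider, key = resolve_provider({"VLM_PROVIDER": "gemini",
                                  "GEMINI_API_KEY": "demo"})
print(provider)  # gemini
```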
```bash
# Pull the public image (linux/amd64)
docker pull observee/vlm-sdk:latest

# Run the API (set your API keys as needed)
docker run --rm -p 8000:8000 \
  -e ADMIN_API_KEY=your-secret-key \
  -e GEMINI_API_KEY=your-gemini-key \
  observee/vlm-sdk:latest
```

Create a stream:
```bash
curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'
```

Listen to events (SSE):

```bash
curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"
```

```bash
# Required
ADMIN_API_KEY=your-admin-key           # API authentication

# VLM provider (choose one)
VLM_PROVIDER=gemini                    # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key         # If using Gemini
OPENAI_API_KEY=your-openai-key         # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key   # If using Claude

# Optional: rate limiting
RATE_LIMIT_REQUESTS=100                # Requests per window
RATE_LIMIT_WINDOW=60                   # Time window (seconds)
```

Basic - Simple video description:
```json
{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}
```

Timestamps - Find specific moments:

```json
{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
```

| Connector | Description | Config |
|---|---|---|
| RTSP | IP camera streams | username, password, transport (tcp/udp) |
| ONVIF | Auto-discovery + PTZ | username, password, profile_index |
| UDP | UDP video receiver | host, port, buffer_size |
| WebRTC | Browser streams | signaling_url, ice_servers |
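All of the connectors in the table share the same basic shape: construct with source-specific config, then iterate frames. A minimal sketch of that interface (assumed from the quickstart, not the SDK's literal definitions):

```python
from dataclasses import dataclass
from typing import Iterator, Protocol

@dataclass
class Frame:
    data: bytes       # encoded frame payload
    timestamp: float  # seconds since stream start

class Connector(Protocol):
    """Shape shared by RTSP/ONVIF/UDP/WebRTC/File connectors (assumed)."""
    def stream_frames(self) -> Iterator[Frame]: ...

class ListConnector:
    """Toy connector yielding pre-loaded frames -- handy for offline tests."""
    def __init__(self, frames):
        self._frames = frames

    def stream_frames(self):
        yield from self._frames

frames = [Frame(b"...", 0.0), Frame(b"...", 0.5)]
print(len(list(ListConnector(frames).stream_frames())))  # 2
```

Because each connector exposes the same iteration API, the downstream detector/VLM code does not need to know which source type it is reading from.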
```
POST    /v1/streams/create            Create stream
GET     /v1/streams/{id}/events       SSE event stream
GET     /v1/streams/{id}              Get status
DELETE  /v1/streams/{id}              Stop stream
GET     /v1/streams                   List all streams
GET     /v1/streams/discover/onvif    Discover cameras
GET     /v1/streams/health            Health check
```
```
┌─────────────┐
│  Connector  │  (RTSP/ONVIF/UDP/WebRTC)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │  (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌──────────────┐
│ Event Buffer │  (Collects frames during events)
└──────┬───────┘
       │ Complete events
   ┌───┴──────────┐
   │              │
   ▼              ▼
┌─────────┐   ┌─────────┐
│ Storage │   │   VLM   │  (Gemini/Qwen/ObserveeVLM)
└─────────┘   └────┬────┘
                   │
                   ▼
           ┌────────────────┐
           │ SSE / Webhooks │
           └────────────────┘
```
Key Innovation: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.
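The pipeline above reads naturally as a chain of generators: each stage consumes the previous one's output. The stub stages below are illustrative stand-ins for the real components (RT-DETR and the event buffer), not SDK code:

```python
def detect(frames):
    """Stand-in for RT-DETR: pass through timestamps of frames with activity."""
    for ts, has_motion in frames:
        if has_motion:
            yield ts

def buffer_events(timestamps, gap=1.0):
    """Stand-in for the event buffer: group nearby timestamps into events.

    A new event starts whenever the gap to the previous active frame
    exceeds `gap` seconds.
    """
    event = []
    for ts in timestamps:
        if event and ts - event[-1] > gap:
            yield event
            event = []
        event.append(ts)
    if event:
        yield event

# Two bursts of motion separated by quiet time -> two complete events,
# and only these events would be handed to Storage / the VLM.
frames = [(0.0, False), (1.0, True), (1.5, True), (5.0, True), (6.0, False)]
events = list(buffer_events(detect(frames)))
print(events)  # [[1.0, 1.5], [5.0]]
```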
```
vlm-sdk/
├── vlm/               # Core SDK components
├── api/               # FastAPI service (routers, services, models)
├── examples/          # Sample scripts for RTSP/UDP/WebRTC usage
├── docs/              # Additional documentation
├── mediamtx/          # MediaMTX config for WebRTC/RTSP bridging
├── output/            # Example generated clips (safe to remove)
├── pyproject.toml     # SDK packaging metadata
├── requirements.txt   # Full dependency list for API/Docker
├── Dockerfile         # Reference container for the API
└── README.md
```
```bash
# Clone the repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"
# Include the API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format and lint code
black vlm/ api/
ruff check vlm/ api/

# Run the API server (development)
uvicorn api.main:app --reload
```

- 🏢 Security & Surveillance: 24/7 perimeter monitoring with motion alerts
- 🏪 Retail Analytics: Customer counting, queue analysis, behavior tracking
- 🚦 Traffic Monitoring: Vehicle counting, flow analysis, incident detection
- 🏠 Smart Home: Activity monitoring, intrusion detection
- 🏭 Industrial: Safety compliance, equipment monitoring
| Approach | Frames/Hour | VLM API Calls | Cost Reduction |
|---|---|---|---|
| Frame-by-frame | 54,000 (15 FPS) | 54,000 | Baseline |
| Event-based (vlm-sdk) | 54,000 | ~1,000 | 98% ✅ |
Example: 1-hour 15 FPS stream with 5-10 motion events
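The table's numbers follow directly: at 15 FPS, an hour is 15 × 3600 = 54,000 frames, and if only ~1,000 event frames reach the VLM the saving is about 98%. As a quick check:

```python
fps, seconds = 15, 3600
frames_per_hour = fps * seconds   # 54,000 frames in one hour
event_frames = 1_000              # approx. frames inside motion events
reduction = 1 - event_frames / frames_per_hour
print(frames_per_hour, f"{reduction:.1%}")  # 54000 98.1%
```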
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Apache-2.0 β Permissive license suitable for commercial and open-source use.
See LICENSE for the complete text. Commercial support is available on request.
- Ultralytics RT-DETR: Object detection and tracking
- FastAPI: Modern Python web framework
- Google Gemini: Video understanding API
- Qwen API: Alternative video understanding API
- ByteTrack: Multi-object tracking algorithm
Built with ❤️ for efficient video intelligence in SF