VLMS - Video Intelligence SDK

Event-based video intelligence with 98% cost reduction

Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, WebRTC, and more coming soon.

Note: pip install vlm-sdk installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.

Python 3.10+ | License: Apache-2.0


🌟 Features

Core SDK (vlm)

  • 🎯 Event-based processing: Only analyze frames with motion/activity (98% cost reduction vs frame-by-frame)
  • 📹 Multi-source connectors: RTSP, ONVIF, UDP, WebRTC, File
  • 🤖 RT-DETR + ByteTrack: Real-time object detection and motion tracking
  • 🧠 Provider-agnostic VLM: Gemini, Qwen, and ObserveeVLM (small VLM coming soon), selected via environment config
  • 🎨 Advanced analysis: Timestamps, object detection, bounding boxes, range queries

Production API (api)

  • ⚡ FastAPI REST API: Industry-standard multi-stream video intelligence
  • 📡 Server-Sent Events (SSE): Real-time event streaming
  • 🔐 Authentication: API key-based auth with rate limiting
  • 📊 Monitoring: Health checks, metrics, stream management
  • 🔧 Configurable: Environment-based provider selection

🚀 Quick Start

Installation

# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .

SDK Usage

from vlm.preprocessors import DetectorPreprocessor
from vlm.connectors import RTSPConnector
from vlm.providers.gemini import GeminiVideoService
import asyncio

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor({
    "confidence_threshold": 0.6,
    "interesting_objects": ["person", "car"],
    "min_event_duration": 2.0,  # Only events longer than 2 seconds
})

gemini = GeminiVideoService(api_key="your-gemini-key")

# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)

        if result['status'] == 'completed':
            # Event detected! Analyze with VLM
            upload = await gemini.upload_file(result['clip_path'])
            analysis = await gemini.query_video_with_file(
                upload['name'],
                "Describe the activity in this video"
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())

DetectorPreprocessor takes a single configuration dictionary whose keys are documented in vlm/preprocessors/detector/core.py. Use interesting_objects to control which classes are tracked and min_event_duration to set the event-length threshold. All keys, including confidence_threshold, must be passed inside the config dict, not as individual keyword arguments (see the sketch below).
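A minimal sketch of that constraint, using only the keys documented above:

from vlm.preprocessors import DetectorPreprocessor

# All tuning lives in a single config dict
# (keys documented in vlm/preprocessors/detector/core.py)
config = {
    "confidence_threshold": 0.6,
    "interesting_objects": ["person", "car"],
    "min_event_duration": 2.0,
}

preprocessor = DetectorPreprocessor(config)         # correct: one dict argument
# DetectorPreprocessor(confidence_threshold=0.6)    # wrong: keys are not kwargs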

API Server

# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install SDK (from repo checkout)
pip install -e .

# Install API dependencies (required for running api.main)
pip install fastapi uvicorn[standard] pydantic python-dotenv
# or install everything we ship in Docker
pip install -r requirements.txt

# Run server
python -m api.main

# Server starts at http://localhost:8000

Note: To accept WebRTC publishers, run MediaMTX alongside the API using the provided mediamtx.yml (see docs/apiguide.md for commands).

Docker Image

# Pull the public image (linux/amd64)
docker pull observee/vlm-sdk:latest

# Run the API (set your API keys as needed)
docker run --rm -p 8000:8000 \
  -e ADMIN_API_KEY=your-secret-key \
  -e GEMINI_API_KEY=your-gemini-key \
  observee/vlm-sdk:latest

Create a stream:

curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'

Listen to events (SSE):

curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"

📖 Documentation

Environment Variables

# Required
ADMIN_API_KEY=your-admin-key              # API authentication

# VLM Provider (choose one)
VLM_PROVIDER=gemini                        # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key            # If using Gemini
OPENAI_API_KEY=your-openai-key            # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key      # If using Claude

# Optional: Rate Limiting
RATE_LIMIT_REQUESTS=100                    # Requests per window
RATE_LIMIT_WINDOW=60                       # Time window (seconds)
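As a rough illustration of what environment-based provider selection amounts to (the actual wiring lives in the api/ package; this sketch only mirrors the variables above):

import os

# Pick the provider named in VLM_PROVIDER and look up its matching key
provider = os.environ.get("VLM_PROVIDER", "gemini")
api_key = {
    "gemini": os.environ.get("GEMINI_API_KEY"),
    "openai": os.environ.get("OPENAI_API_KEY"),
    "anthropic": os.environ.get("ANTHROPIC_API_KEY"),
}[provider]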

Analysis Modes

Basic - Simple video description

{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}

Timestamps - Find specific moments

{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
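These analysis blocks drop into the create request shown earlier. For example, creating a stream in timestamps mode from Python (all values are placeholders):

import requests

payload = {
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "analysis": {
        "enabled": True,
        "mode": "timestamps",
        "find_timestamps": {
            "query": "when does someone wave",
            "find_all": True,
            "confidence_threshold": 0.7,
        },
    },
}

resp = requests.post(
    "http://localhost:8000/v1/streams/create",
    headers={
        "X-Admin-API-Key": "your-secret-key",
        "X-VLM-API-Key": "your-gemini-key",
    },
    json=payload,  # requests sets Content-Type: application/json
)
print(resp.json())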

Supported Connectors

Connector   Description            Config
RTSP        IP camera streams      username, password, transport (tcp/udp)
ONVIF       Auto-discovery + PTZ   username, password, profile_index
UDP         UDP video receiver     host, port, buffer_size
WebRTC      Browser streams        signaling_url, ice_servers

API Endpoints

POST   /v1/streams/create              Create stream
GET    /v1/streams/{id}/events         SSE event stream
GET    /v1/streams/{id}                Get status
DELETE /v1/streams/{id}                Stop stream
GET    /v1/streams                     List all streams
GET    /v1/streams/discover/onvif      Discover cameras
GET    /v1/streams/health              Health check
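A few of these endpoints exercised from Python (response shapes are whatever the service returns; this sketch just prints them):

import requests

BASE = "http://localhost:8000"
HEADERS = {"X-Admin-API-Key": "your-secret-key"}
stream_id = "your-stream-id"  # placeholder; returned by /v1/streams/create

# Health check and stream inventory
print(requests.get(f"{BASE}/v1/streams/health", headers=HEADERS).json())
print(requests.get(f"{BASE}/v1/streams", headers=HEADERS).json())

# Status of a single stream, then stop it
print(requests.get(f"{BASE}/v1/streams/{stream_id}", headers=HEADERS).json())
requests.delete(f"{BASE}/v1/streams/{stream_id}", headers=HEADERS)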

πŸ—οΈ Architecture

┌─────────────┐
│  Connector  │ (RTSP/ONVIF/UDP/WebRTC)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │ (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌─────────────┐
│ Event Buffer│ (Collects frames during events)
└──────┬──────┘
       │ Complete Events
       ├────────────────┐
       │                │
       ▼                ▼
┌───────────┐    ┌──────────┐
│  Storage  │    │   VLM    │ (Gemini/Qwen/ObserveeVLM)
└───────────┘    └────┬─────┘
                      │
                      ▼
              ┌───────────────┐
              │ SSE / Webhooks│
              └───────────────┘

Key Innovation: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.


📦 Repository Layout

vlm-sdk/
├── vlm/                        # Core SDK components
├── api/                        # FastAPI service (routers, services, models)
├── examples/                   # Sample scripts for RTSP/UDP/WebRTC usage
├── docs/                       # Additional documentation
├── mediamtx/                   # MediaMTX config for WebRTC/RTSP bridging
├── output/                     # Example generated clips (safe to remove)
├── pyproject.toml              # SDK packaging metadata
├── requirements.txt            # Full dependency list for API/Docker
├── Dockerfile                  # Reference container for the API
└── README.md

🔧 Development

# Clone repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"

# Include API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format code
black vlm/ api/
ruff check vlm/ api/

# Run API server (development)
uvicorn api.main:app --reload

🎯 Use Cases

  • 🏢 Security & Surveillance: 24/7 perimeter monitoring with motion alerts
  • 🏪 Retail Analytics: Customer counting, queue analysis, behavior tracking
  • 🚗 Traffic Monitoring: Vehicle counting, flow analysis, incident detection
  • 🏠 Smart Home: Activity monitoring, intrusion detection
  • 🏭 Industrial: Safety compliance, equipment monitoring

📊 Cost Comparison

Approach             Frames/Hour        VLM API Calls   Cost Reduction
Frame-by-frame       54,000 (15 FPS)    54,000          Baseline
Event-based (VLMS)   54,000             ~1,000          98% ✅

Example: a 1-hour stream at 15 FPS yields 15 × 3,600 = 54,000 frames; with 5-10 motion events, only ~1,000 event frames reach the VLM, roughly a 98% reduction in API calls.


🤝 Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

Apache-2.0 – Permissive license suitable for commercial and open-source use.

See LICENSE for the complete text. Commercial support is available on request.


πŸ™ Acknowledgments

  • Ultralytics RT-DETR: Object detection and tracking
  • FastAPI: Modern Python web framework
  • Google Gemini: Video understanding API
  • Qwen API: Alternative Video Understanding API
  • ByteTrack: Multi-object tracking algorithm

Built with ❤️ for efficient video intelligence in SF
