Extract the most engaging clips from any video — zero API keys, runs 100% locally.
pip install viralclipOptional extras:
pip install "viralclip[captions]" # burn-in subtitles via local Whisper
pip install "viralclip[smartcrop]" # face-detect crop via opencv
pip install "viralclip[all]" # everythingSystem dependency: ffmpeg must be on your PATH.
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu/Debian# Most engaging 60s from a local file
viralclip clip video.mp4
# Download from YouTube and clip
viralclip clip https://youtube.com/watch?v=xxxxx
# Burn in captions (local Whisper, no API)
viralclip clip video.mp4 --captions
# Export 3 non-overlapping clips
viralclip clip video.mp4 --count 3
# Custom duration
viralclip clip video.mp4 --duration 45
# Output format
viralclip clip video.mp4 --format horizontal # 16:9 YouTube
viralclip clip video.mp4 --format square # 1:1 Instagram feed
viralclip clip video.mp4 --format portrait # 4:5 Instagram portrait
viralclip clip video.mp4 --format vertical # 9:16 TikTok/Reels (default)
# Smart crop — detect face, crop around subject
viralclip clip video.mp4 --smart-crop
# Preview each clip in system player after export
viralclip clip video.mp4 --preview
# See timestamps without exporting anything
viralclip clip video.mp4 --dry-run
# Custom output location and filename
viralclip clip video.mp4 --output-dir ./clips --output-name my-clip
# Suppress all output (CI/scripting)
viralclip clip video.mp4 --quiet
# Nudge window ±N seconds after dry-run preview
viralclip clip video.mp4 --dry-run # see timestamps
viralclip clip video.mp4 --offset 8 # shift forward 8s
viralclip clip video.mp4 --offset -5 # shift back 5sPersist defaults in ~/.config/viralclip/config.toml so you don't repeat flags every run:
[defaults]
duration = 45
format = "vertical"
output_dir = "~/clips"
smart_crop = true
count = 3| Flag | Ratio | Resolution | Platform |
|---|---|---|---|
vertical (default) |
9:16 | 1080×1920 | TikTok, Reels, Shorts |
horizontal |
16:9 | 1920×1080 | YouTube, Twitter/X |
square |
1:1 | 1080×1080 | Instagram feed |
portrait |
4:5 | 1080×1350 | Instagram portrait |
For YouTube URLs, yt-dlp fetches the Most Replayed heatmap — real viewer replay density from millions of views. Each timestamp gets a score 0–1 representing how often that moment was rewatched. When present this carries 65% of the final score.
Five features computed per second via librosa. Weights adapt automatically based on content type:
| Feature | Speech weight | Music weight | What it captures |
|---|---|---|---|
| RMS energy | 35% | 20% | Loudness / presence |
| Spectral flux | 15% | 35% | Rate of change — beat drops, cuts |
| Onset strength | 20% | 30% | Word density, musical onsets |
| Zero Crossing Rate | 30% | 15% | Consonants, high-freq activity |
| Crowd reactions | +15% bonus | +15% bonus | Applause / laughter bursts |
Content type is detected via spectral flatness (low = tonal/music, high = noisy/speech). Weights interpolate smoothly between speech and music profiles. Silent seconds (< 5% peak RMS) are penalised 10×.
Before window selection, two passes over the video:
- Scene cut detection — ffmpeg finds hard cuts (scene score > 0.3); window starts snap to the nearest cut within ±3s so clips don't open mid-cut
- Black frame detection — windows with > 10% black frames are skipped entirely
Scores are Gaussian-smoothed (σ=2.5s). Then:
- Find local score peaks (
scipy.signal.find_peaks, min distance = window/2) - Place window 40% before / 60% after the peak — buildup + payoff
- Score each window as
0.7 × mean + 0.3 × peak - Prefer window starts that follow a natural dip (breath before the moment)
- For
--count N, greedily pick non-overlapping peaks
Center-crop to target ratio by default. With --smart-crop, OpenCV Haar cascade samples 3 frames (at 25%, 50%, 75% through the clip), detects faces in each, and uses the median face center — more robust than single-frame sampling.
| Feature | ViralClip | OpusClip | ViralCutter |
|---|---|---|---|
| No API key needed | ✅ | ❌ | ❌ |
| Works fully offline | ✅ | ❌ | ❌ |
| YouTube heatmap signal | ✅ | ✅ (cloud) | ❌ |
| Local captions (Whisper) | ✅ | Cloud | Cloud |
| Face-aware crop | ✅ | ✅ | ❌ |
| Multiple output formats | ✅ | ✅ | ❌ |
| YouTube download | ✅ | ✅ | ✅ |
| Multi-clip export | ✅ | ✅ | ✅ |
| Dry run / preview | ✅ | ❌ | ❌ |
| Free | ✅ | Freemium | Freemium |
MIT