phlx0/viralclip

ViralClip

Extract the most engaging clips from any video — zero API keys, runs 100% locally.


Install

pip install viralclip

Optional extras:

pip install "viralclip[captions]"   # burn-in subtitles via local Whisper
pip install "viralclip[smartcrop]"  # face-detect crop via opencv
pip install "viralclip[all]"        # everything

System dependency: ffmpeg must be on your PATH.

brew install ffmpeg        # macOS
sudo apt install ffmpeg    # Ubuntu/Debian

Usage

# Most engaging 60s from a local file
viralclip clip video.mp4

# Download from YouTube and clip
viralclip clip https://youtube.com/watch?v=xxxxx

# Burn in captions (local Whisper, no API)
viralclip clip video.mp4 --captions

# Export 3 non-overlapping clips
viralclip clip video.mp4 --count 3

# Custom duration
viralclip clip video.mp4 --duration 45

# Output format
viralclip clip video.mp4 --format horizontal   # 16:9 YouTube
viralclip clip video.mp4 --format square        # 1:1 Instagram feed
viralclip clip video.mp4 --format portrait      # 4:5 Instagram portrait
viralclip clip video.mp4 --format vertical      # 9:16 TikTok/Reels (default)

# Smart crop — detect face, crop around subject
viralclip clip video.mp4 --smart-crop

# Preview each clip in system player after export
viralclip clip video.mp4 --preview

# See timestamps without exporting anything
viralclip clip video.mp4 --dry-run

# Custom output location and filename
viralclip clip video.mp4 --output-dir ./clips --output-name my-clip

# Suppress all output (CI/scripting)
viralclip clip video.mp4 --quiet

# Nudge window ±N seconds after dry-run preview
viralclip clip video.mp4 --dry-run          # see timestamps
viralclip clip video.mp4 --offset 8         # shift forward 8s
viralclip clip video.mp4 --offset -5        # shift back 5s

Config file

Persist defaults in ~/.config/viralclip/config.toml so you don't repeat flags every run:

[defaults]
duration = 45
format = "vertical"
output_dir = "~/clips"
smart_crop = true
count = 3

Output formats

| Flag | Ratio | Resolution | Platform |
|---|---|---|---|
| vertical (default) | 9:16 | 1080×1920 | TikTok, Reels, Shorts |
| horizontal | 16:9 | 1920×1080 | YouTube, Twitter/X |
| square | 1:1 | 1080×1080 | Instagram feed |
| portrait | 4:5 | 1080×1350 | Instagram portrait |

How the algorithm works

1. YouTube heatmap (when available)

For YouTube URLs, yt-dlp fetches the Most Replayed heatmap, which records how often viewers replayed each part of the video. Each timestamp gets a score from 0 to 1 representing how often that moment was rewatched. When the heatmap is present, it carries 65% of the final score.
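
A minimal sketch of the 65/35 blend, assuming the heatmap arrives in the shape yt-dlp exposes in its info dict (a list of segments with `start_time`, `end_time`, and `value`). The helper names `heatmap_to_seconds` and `blend` are hypothetical, and expanding segments to whole seconds is one possible choice, not necessarily what ViralClip does.

```python
import math


def heatmap_to_seconds(heatmap, duration):
    """Expand heatmap segments into one 0-1 score per whole second."""
    scores = [0.0] * duration
    for seg in heatmap:
        first = max(0, int(seg["start_time"]))
        last = min(duration, math.ceil(seg["end_time"]))
        for t in range(first, last):
            # If two segments touch the same second, keep the higher value.
            scores[t] = max(scores[t], seg["value"])
    return scores


def blend(heatmap_scores, audio_scores, heatmap_weight=0.65):
    """Weighted sum of both signals; audio alone when no heatmap exists."""
    if heatmap_scores is None:
        return list(audio_scores)
    audio_weight = 1.0 - heatmap_weight
    return [
        heatmap_weight * h + audio_weight * a
        for h, a in zip(heatmap_scores, audio_scores)
    ]


heatmap = [
    {"start_time": 0.0, "end_time": 2.0, "value": 0.2},
    {"start_time": 2.0, "end_time": 4.0, "value": 1.0},
]
per_second = heatmap_to_seconds(heatmap, duration=4)
final = blend(per_second, audio_scores=[0.5, 0.5, 0.5, 0.0])
print(final)
```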

2. Audio analysis (always runs — 35% with heatmap, 100% without)

Five features are computed per second with librosa. Their weights adapt automatically to the content type:

| Feature | Speech weight | Music weight | What it captures |
|---|---|---|---|
| RMS energy | 35% | 20% | Loudness / presence |
| Spectral flux | 15% | 35% | Rate of change: beat drops, cuts |
| Onset strength | 20% | 30% | Word density, musical onsets |
| Zero Crossing Rate | 30% | 15% | Consonants, high-freq activity |
| Crowd reactions | +15% bonus | +15% bonus | Applause / laughter bursts |

Content type is detected via spectral flatness (low = tonal/music, high = noisy/speech). Weights interpolate smoothly between speech and music profiles. Silent seconds (< 5% peak RMS) are penalised 10×.
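
The weighting can be sketched in plain Python using the percentages from the table. Several details here are assumptions: the interpolation is taken to be linear, the crowd bonus is applied as a multiplier of up to 15%, "penalised 10×" is read as dividing the score by 10, and the mapping from spectral flatness to `speech_ratio` is left out entirely. All function and constant names are hypothetical.

```python
# Weight profiles taken from the table above (each sums to 1.0).
SPEECH = {"rms": 0.35, "flux": 0.15, "onset": 0.20, "zcr": 0.30}
MUSIC = {"rms": 0.20, "flux": 0.35, "onset": 0.30, "zcr": 0.15}
CROWD_BONUS = 0.15        # assumed to be a multiplicative bonus
SILENCE_THRESHOLD = 0.05  # fraction of peak RMS
SILENCE_PENALTY = 10.0    # assumed to mean "divide the score by 10"


def blended_weights(speech_ratio):
    """Linear interpolation: 1.0 gives the speech profile, 0.0 the music one."""
    return {
        name: speech_ratio * SPEECH[name] + (1.0 - speech_ratio) * MUSIC[name]
        for name in SPEECH
    }


def score_second(features, speech_ratio, crowd=0.0, rms_vs_peak=1.0):
    """Score one second from normalised (0-1) feature values."""
    weights = blended_weights(speech_ratio)
    score = sum(weights[name] * features[name] for name in weights)
    score *= 1.0 + CROWD_BONUS * crowd  # crowd is a 0-1 reaction strength
    if rms_vs_peak < SILENCE_THRESHOLD:
        score /= SILENCE_PENALTY
    return score


features = {"rms": 0.8, "flux": 0.2, "onset": 0.5, "zcr": 0.4}
as_speech = score_second(features, speech_ratio=1.0)
as_music = score_second(features, speech_ratio=0.0)
print(as_speech, as_music)
```

The same second scores higher under the speech profile here because its loudness and zero-crossing values are strong while its spectral flux is low.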

3. Quality filtering

Before window selection, two passes run over the video:

  • Scene cut detection — ffmpeg finds hard cuts (scene score > 0.3); window starts snap to the nearest cut within ±3s so clips don't open mid-cut
  • Black frame detection — windows with > 10% black frames are skipped entirely
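
Both filters reduce to small interval checks once ffmpeg has reported cut times and black intervals. The sketch below assumes those are already available as plain Python lists, measures black content by time rather than by frame count, and assumes black intervals do not overlap each other. The function names are hypothetical.

```python
def snap_to_cut(start, cuts, tolerance=3.0):
    """Move a window start to the nearest scene cut within +/- tolerance seconds."""
    nearby = [c for c in cuts if abs(c - start) <= tolerance]
    if not nearby:
        return start  # no cut close enough, keep the original start
    return min(nearby, key=lambda c: abs(c - start))


def black_fraction(start, duration, black_intervals):
    """Fraction of the window [start, start + duration) that is black."""
    end = start + duration
    covered = sum(
        max(0.0, min(end, b_end) - max(start, b_start))
        for b_start, b_end in black_intervals
    )
    return covered / duration


def keep_window(start, duration, black_intervals, max_black=0.10):
    """Skip windows whose black share exceeds the 10% limit."""
    return black_fraction(start, duration, black_intervals) <= max_black


cuts = [4.0, 10.5, 20.0]
print(snap_to_cut(12.0, cuts))                       # nearest cut within 3 s
print(keep_window(0.0, 60.0, [(10.0, 13.0), (58.0, 70.0)]))
```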

4. Smoothing + peak selection

Scores are Gaussian-smoothed (σ=2.5s). Then:

  1. Find local score peaks (scipy.signal.find_peaks, min distance = window/2)
  2. Place window 40% before / 60% after the peak — buildup + payoff
  3. Score each window as 0.7 × mean + 0.3 × peak
  4. Prefer window starts that follow a natural dip (breath before the moment)
  5. For --count N, greedily pick non-overlapping peaks
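
The steps above can be sketched in dependency-free Python. This is an approximation under stated assumptions: a simple local-maximum scan stands in for `scipy.signal.find_peaks` (without its minimum-distance constraint), step 4 (preferring starts after a dip) is omitted, and scores are assumed to be one value per second. All function names are hypothetical.

```python
import math


def gaussian_smooth(scores, sigma=2.5):
    """Gaussian-smooth a per-second series, renormalising at the edges."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(scores)):
        total = weight = 0.0
        for k, g in zip(offsets, kernel):
            if 0 <= i + k < len(scores):
                total += g * scores[i + k]
                weight += g
        smoothed.append(total / weight)
    return smoothed


def local_peaks(scores):
    """Indices that rise from the left and do not rise to the right."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i - 1] < scores[i] >= scores[i + 1]]


def window_start(peak, window, n):
    """Place 40% of the window before the peak, clamped to the video."""
    return max(0, min(peak - round(0.4 * window), n - window))


def window_score(scores, start, window):
    chunk = scores[start:start + window]
    return 0.7 * sum(chunk) / len(chunk) + 0.3 * max(chunk)


def pick_windows(scores, window, count=1):
    """Greedily pick the best-scoring, non-overlapping windows."""
    smooth = gaussian_smooth(scores)
    starts = {window_start(p, window, len(smooth)) for p in local_peaks(smooth)}
    ranked = sorted(starts, key=lambda s: window_score(smooth, s, window), reverse=True)
    chosen = []
    for start in ranked:
        if all(start + window <= c or c + window <= start for c in chosen):
            chosen.append(start)
        if len(chosen) == count:
            break
    return sorted(chosen)


# Two isolated spikes in an otherwise flat score series.
scores = [0.1] * 120
scores[30], scores[90] = 1.0, 0.8
picked = pick_windows(scores, window=20, count=2)
print(picked)
```

With a 20-second window, each chosen start lands 8 seconds before its spike, leaving 12 seconds after it.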

5. Crop

Center-crop to target ratio by default. With --smart-crop, OpenCV Haar cascade samples 3 frames (at 25%, 50%, 75% through the clip), detects faces in each, and uses the median face center — more robust than single-frame sampling.
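
The crop geometry can be illustrated without OpenCV. The sketch below takes face centers as already-detected x coordinates and only shifts the crop horizontally, which is an assumption; the resolutions come from the output formats table, and `crop_box` and `median_face_x` are hypothetical names.

```python
from statistics import median

# Target resolutions from the output formats table.
RESOLUTIONS = {
    "vertical": (1080, 1920),
    "horizontal": (1920, 1080),
    "square": (1080, 1080),
    "portrait": (1080, 1350),
}


def crop_box(src_w, src_h, fmt, center_x=None):
    """Return (x, y, w, h) of the largest box with the target aspect ratio."""
    out_w, out_h = RESOLUTIONS[fmt]
    ratio = out_w / out_h
    if src_w / src_h > ratio:
        # Source is wider than the target: keep full height, trim the sides.
        w, h = round(src_h * ratio), src_h
    else:
        # Source is taller or equal: keep full width, trim top and bottom.
        w, h = src_w, round(src_w / ratio)
    cx = src_w / 2 if center_x is None else center_x
    x = min(max(round(cx - w / 2), 0), src_w - w)  # clamp inside the frame
    y = (src_h - h) // 2
    return x, y, w, h


def median_face_x(face_centers):
    """Median of sampled face centers, or None to fall back to center-crop."""
    return median(face_centers) if face_centers else None


centered = crop_box(1920, 1080, "vertical")
on_face = crop_box(1920, 1080, "vertical", median_face_x([1500, 1700, 1600]))
print(centered, on_face)
```

Using the median means one bad detection among the three sampled frames does not drag the crop away from the subject.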


Feature comparison

Feature ViralClip OpusClip ViralCutter
No API key needed
Works fully offline
YouTube heatmap signal ✅ (cloud)
Local captions (Whisper) Cloud Cloud
Face-aware crop
Multiple output formats
YouTube download
Multi-clip export
Dry run / preview
Free Freemium Freemium

License

MIT
