ViralClip

Extract the most engaging clips from any video — zero API keys, runs 100% locally.

Install

pip install viralclip

Optional extras:

pip install "viralclip[captions]"   # burn-in subtitles via local Whisper
pip install "viralclip[smartcrop]"  # face-detect crop via opencv
pip install "viralclip[all]"        # everything

System dependency: ffmpeg must be on your PATH.

brew install ffmpeg        # macOS
sudo apt install ffmpeg    # Ubuntu/Debian

Usage

# Most engaging 60s from a local file
viralclip clip video.mp4

# Download from YouTube and clip
viralclip clip https://youtube.com/watch?v=xxxxx

# Burn in captions (local Whisper, no API)
viralclip clip video.mp4 --captions

# Export 3 non-overlapping clips
viralclip clip video.mp4 --count 3

# Custom duration
viralclip clip video.mp4 --duration 45

# Output format
viralclip clip video.mp4 --format horizontal   # 16:9 YouTube
viralclip clip video.mp4 --format square        # 1:1 Instagram feed
viralclip clip video.mp4 --format portrait      # 4:5 Instagram portrait
viralclip clip video.mp4 --format vertical      # 9:16 TikTok/Reels (default)

# Smart crop — detect face, crop around subject
viralclip clip video.mp4 --smart-crop

# Preview each clip in system player after export
viralclip clip video.mp4 --preview

# See timestamps without exporting anything
viralclip clip video.mp4 --dry-run

# Custom output location and filename
viralclip clip video.mp4 --output-dir ./clips --output-name my-clip

# Suppress all output (CI/scripting)
viralclip clip video.mp4 --quiet

# Nudge window ±N seconds after dry-run preview
viralclip clip video.mp4 --dry-run          # see timestamps
viralclip clip video.mp4 --offset 8         # shift forward 8s
viralclip clip video.mp4 --offset -5        # shift back 5s

Config file

Persist defaults in ~/.config/viralclip/config.toml so you don't repeat flags every run:

[defaults]
duration = 45
format = "vertical"
output_dir = "~/clips"
smart_crop = true
count = 3

Output formats

Flag	Ratio	Resolution	Platform
`vertical` (default)	9:16	1080×1920	TikTok, Reels, Shorts
`horizontal`	16:9	1920×1080	YouTube, Twitter/X
`square`	1:1	1080×1080	Instagram feed
`portrait`	4:5	1080×1350	Instagram portrait

How the algorithm works

1. YouTube heatmap (when available)

For YouTube URLs, yt-dlp fetches the Most Replayed heatmap — real viewer replay density from millions of views. Each timestamp gets a score 0–1 representing how often that moment was rewatched. When present this carries 65% of the final score.

2. Audio analysis (always runs — 35% with heatmap, 100% without)

Five features computed per second via librosa. Weights adapt automatically based on content type:

Feature	Speech weight	Music weight	What it captures
RMS energy	35%	20%	Loudness / presence
Spectral flux	15%	35%	Rate of change — beat drops, cuts
Onset strength	20%	30%	Word density, musical onsets
Zero Crossing Rate	30%	15%	Consonants, high-freq activity
Crowd reactions	+15% bonus	+15% bonus	Applause / laughter bursts

Content type is detected via spectral flatness (low = tonal/music, high = noisy/speech). Weights interpolate smoothly between speech and music profiles. Silent seconds (< 5% peak RMS) are penalised 10×.

3. Quality filtering

Before window selection, two passes over the video:

Scene cut detection — ffmpeg finds hard cuts (scene score > 0.3); window starts snap to the nearest cut within ±3s so clips don't open mid-cut
Black frame detection — windows with > 10% black frames are skipped entirely

4. Smoothing + peak selection

Scores are Gaussian-smoothed (σ=2.5s). Then:

Find local score peaks (scipy.signal.find_peaks, min distance = window/2)
Place window 40% before / 60% after the peak — buildup + payoff
Score each window as 0.7 × mean + 0.3 × peak
Prefer window starts that follow a natural dip (breath before the moment)
For --count N, greedily pick non-overlapping peaks

5. Crop

Center-crop to target ratio by default. With --smart-crop, OpenCV Haar cascade samples 3 frames (at 25%, 50%, 75% through the clip), detects faces in each, and uses the median face center — more robust than single-frame sampling.

Feature comparison

Feature	ViralClip	OpusClip	ViralCutter
No API key needed	✅	❌	❌
Works fully offline	✅	❌	❌
YouTube heatmap signal	✅	✅ (cloud)	❌
Local captions (Whisper)	✅	Cloud	Cloud
Face-aware crop	✅	✅	❌
Multiple output formats	✅	✅	❌
YouTube download	✅	✅	✅
Multi-clip export	✅	✅	✅
Dry run / preview	✅	❌	❌
Free	✅	Freemium	Freemium

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github/workflows		.github/workflows
src/viralclip		src/viralclip
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ViralClip

Install

Usage

Config file

Output formats

How the algorithm works

1. YouTube heatmap (when available)

2. Audio analysis (always runs — 35% with heatmap, 100% without)

3. Quality filtering

4. Smoothing + peak selection

5. Crop

Feature comparison

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ViralClip

Install

Usage

Config file

Output formats

How the algorithm works

1. YouTube heatmap (when available)

2. Audio analysis (always runs — 35% with heatmap, 100% without)

3. Quality filtering

4. Smoothing + peak selection

5. Crop

Feature comparison

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages