Open-source AI video clipper for finding high-retention hooks in long videos.
This project is aimed at Jamaican drama, lifestyle, and patois-heavy content where transcription-first clipping tools often miss the real moment. It uses Gemini CLI with your signed-in Google account to rank hooks, then cuts, reframes, captions, and optionally exports thumbnails.
- Analyze local MP4 files with Gemini CLI.
- Score hooks across `hook`, `flow`, `value`, and `trend`, plus hot-hook metrics like `conflict`, `surprise`, `reaction`, and `payoff`.
- Cut top clips with FFmpeg.
- Reframe to vertical 9:16.
- Burn in ASS captions.
- Use a two-pass flow: pick candidates first, then transcribe only the winning clip(s) for subtitles.
- Use Gemini word timings or an external source-video SRT.
- Export JPG thumbnails from the best frame per hook.
- Save full metadata to `hooks.json`.
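The two-pass flow amounts to this: rank every candidate once, keep the best non-overlapping windows, and only then spend transcription on the winners. The sketch below is a minimal illustration under stated assumptions and is not the code in `clipper.py`. The `Hook` type, the greedy overlap rule, and the `transcribe` callback are hypothetical. `selection_score` is borrowed from the `hooks.json` fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Hook:
    start: float  # seconds into the source video
    end: float
    selection_score: int
    transcript: list[tuple[str, float]] = field(default_factory=list)


def overlaps(a: Hook, b: Hook) -> bool:
    """True if the two time windows share any span."""
    return a.start < b.end and b.start < a.end


def pick_winners(candidates: list[Hook], n: int) -> list[Hook]:
    """Pass 1: greedily keep the highest-scoring hooks that do not overlap."""
    winners: list[Hook] = []
    for hook in sorted(candidates, key=lambda h: h.selection_score, reverse=True):
        if len(winners) == n:
            break
        if not any(overlaps(hook, w) for w in winners):
            winners.append(hook)
    return winners


def two_pass(
    candidates: list[Hook],
    n: int,
    transcribe: Callable[[float, float], list[tuple[str, float]]],
) -> list[Hook]:
    """Pass 2: transcribe only the winning windows (hypothetical callback)."""
    winners = pick_winners(candidates, n)
    for hook in winners:
        hook.transcript = transcribe(hook.start, hook.end)
    return winners


if __name__ == "__main__":
    calls = []

    def fake_transcribe(start, end):
        calls.append((start, end))
        return [("wah", start)]

    pool = [Hook(10, 40, 70), Hook(30, 60, 90), Hook(100, 130, 80)]
    result = two_pass(pool, n=2, transcribe=fake_transcribe)
    print([(h.start, h.end) for h in result])  # [(30, 60), (100, 130)]
    print(len(calls))  # 2
```

The saving comes from the second pass. In the demo, three candidates are scored but only two windows are transcribed. The lower-scoring `(10, 40)` window is skipped because it overlaps the winner at `(30, 60)`.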
Opus Clip is strong on automated clipping, but the weak point for this use case is transcription quality on Jamaican speech. If the transcript is wrong, the hook ranking and captions drift. This repo keeps the ranking stage video-first and adds an external-SRT path so captions can come from a better transcript when needed.
- Python 3.10+
- FFmpeg on PATH
- Gemini CLI installed and signed in with Google
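A small preflight script can confirm these requirements before a first run. This is a hypothetical helper and is not part of the repo. It only checks the Python version and whether `ffmpeg` and `gemini` resolve on PATH. It does not verify that Gemini CLI is actually signed in.

```python
from __future__ import annotations

import shutil
import sys
from typing import Callable, Optional


def missing_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
    version: tuple[int, int] = sys.version_info[:2],
) -> list[str]:
    """Return a readable list of unmet requirements (empty means all found)."""
    problems: list[str] = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    for tool in ("ffmpeg", "gemini"):  # both must resolve on PATH
        if which(tool) is None:
            problems.append(f"{tool!r} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    if issues:
        print("\n".join(f"- {issue}" for issue in issues))
    else:
        print("All prerequisites found.")
```

On Windows, `shutil.which` honours `PATHEXT`, so an npm-installed `gemini.cmd` is still found.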
```powershell
cd C:\Users\User\code\viral-hook-extractor
pip install -r requirements.txt
gemini
```

When Gemini CLI opens, choose Sign in with Google. If you have Google AI Pro or Google AI Ultra, sign in with that same Google account.
If you want stronger local subtitles and diarization, install the optional ASR stack in a repo-local venv:
```powershell
cd C:\Users\User\code\viral-hook-extractor
python -m venv .venv
.venv\Scripts\python -m pip install --upgrade pip setuptools wheel
.venv\Scripts\python -m pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 torchvision==0.23.0+cu128 --index-url https://download.pytorch.org/whl/cu128
.venv\Scripts\python -m pip install -r requirements-asr.txt
```

Run the clipper from the command line:

```powershell
python clipper.py my_video.mp4
python clipper.py my_video.mp4 --clips 8 --length-preset medium --face-track
python clipper.py my_video.mp4 --captions-srt full_video.srt --save-thumbnails
python clipper.py my_video.mp4 --focus-prompt "prioritize arguments, reveals, and funny reactions"
python clipper.py my_video.mp4 --hook-profile hot
```

To use the web UI instead:

```powershell
pip install -r requirements.txt
python webapp.py
```

Then open http://127.0.0.1:5000 in your browser.
The web UI lets you:
- Upload a video.
- Upload an optional full-video SRT.
- Pick clip count and duration range.
- Add a focus prompt, similar to Opus Clip's ClipAnything.
- Turn face tracking and thumbnail export on or off.
- Download clips, thumbnails, and `hooks.json` after the run finishes.
| Flag | Default | Description |
|---|---|---|
| `--clips N` | `5` | Number of clips to export |
| `--length-preset` | `short` | `short` = 30-60 s, `medium` = 1-3 min, `long` = 3-5 min, `custom` = use `--min`/`--max` |
| `--min SECS` | `20` | Minimum clip length in seconds |
| `--max SECS` | `60` | Maximum clip length in seconds |
| `--face-track` | off | Use face detection for crop positioning |
| `--output DIR` | `output` | Output folder |
| `--captions-srt PATH` | none | Use a source-video SRT for captions |
| `--focus-prompt TEXT` | none | Extra instructions for hook selection |
| `--hook-profile` | `hot` | `hot` = maximum drama/reaction, `balanced` = general viral, `story` = cleaner arc |
| `--save-thumbnails` | off | Export a JPG thumbnail per selected clip |
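Read literally, the table maps each `--length-preset` to a duration window in seconds, and `custom` falls back to `--min`/`--max`. The sketch below shows that mapping. It assumes that named presets override `--min`/`--max`, because the table does not say how the two interact. `resolve_length` and `fits` are hypothetical names.

```python
from __future__ import annotations

# Duration windows in seconds, read off the --length-preset row of the table.
PRESETS: dict[str, tuple[float, float]] = {
    "short": (30.0, 60.0),
    "medium": (60.0, 180.0),
    "long": (180.0, 300.0),
}


def resolve_length(preset: str, min_secs: float = 20.0,
                   max_secs: float = 60.0) -> tuple[float, float]:
    """Return the (min, max) clip length in seconds for a preset.

    Assumption: named presets ignore --min/--max; only "custom" uses them.
    """
    if preset == "custom":
        if min_secs <= 0 or max_secs < min_secs:
            raise ValueError(f"invalid custom range: {min_secs}-{max_secs}s")
        return (min_secs, max_secs)
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected short, medium, long, or custom")
    return PRESETS[preset]


def fits(duration: float, preset: str, **kwargs: float) -> bool:
    """True if a candidate of this duration falls inside the preset window."""
    lo, hi = resolve_length(preset, **kwargs)
    return lo <= duration <= hi
```

For example, the sample hook in `hooks.json` runs from 142.3 to 178.1, about 35.8 s. `fits(35.8, "short")` is therefore true, and `fits(35.8, "medium")` is false.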
```
output/
  clip_01_virality84_dramatic.mp4
  clip_02_virality78_funny.mp4
  hooks.json
  thumbnails/
    clip_01_thumb.jpg
    clip_02_thumb.jpg
```
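The example names appear to combine the clip's rank, its `virality_score`, and its hook `type` from `hooks.json`. The hypothetical helper below reproduces that pattern, which can help match files back to their records. The pattern is inferred from the two example names only. The slug rule for `type` is also an assumption.

```python
from __future__ import annotations

import re


def clip_filename(rank: int, hook: dict) -> str:
    """clip_<rank>_virality<score>_<type>.mp4, rank zero-padded to two digits."""
    # Lowercase the type and collapse anything non-alphanumeric to underscores.
    kind = re.sub(r"[^a-z0-9]+", "_", str(hook.get("type", "clip")).lower()).strip("_")
    score = round(float(hook["virality_score"]))
    return f"clip_{rank:02d}_virality{score}_{kind or 'clip'}.mp4"


def thumbnail_filename(rank: int) -> str:
    """Matching JPG name inside the thumbnails/ folder."""
    return f"clip_{rank:02d}_thumb.jpg"
```

If `hooks.json` holds the hook records as a list in rank order (an assumption), `enumerate(records, start=1)` supplies the `rank` argument.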
Each hook in `hooks.json` includes:

```json
{
  "start": 142.3,
  "end": 178.1,
  "type": "dramatic",
  "hook_score": 22,
  "flow_score": 21,
  "value_score": 20,
  "trend_score": 19,
  "conflict_score": 24,
  "surprise_score": 22,
  "reaction_score": 23,
  "payoff_score": 21,
  "context_penalty": 4,
  "virality_score": 82,
  "selection_score": 88,
  "reason": "The confrontation escalates immediately and resolves cleanly.",
  "hook_line": "HIM NEVER EXPECT THIS",
  "thumbnail_time": 155.4,
  "transcript": [["wah", 142.3], ["yuh", 142.6]]
}
```

Default mode uses Gemini CLI in a second pass on the selected clip to build ASS karaoke captions.
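The sketch below shows how the `transcript` word timings could become one ASS karaoke event. It makes three assumptions. Times are in source-video seconds, as the example's first word matching `start` suggests. Each word is held until the next word begins. The last word is held until a caller-supplied `line_end`. The function names are hypothetical. A complete `.ass` file also needs `[Script Info]` and `[V4+ Styles]` sections, and this sketch only produces the `Dialogue:` line.

```python
from __future__ import annotations


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    cs = max(0, round(seconds * 100))
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def karaoke_dialogue(words: list[list], clip_start: float, line_end: float,
                     style: str = "Default") -> str:
    """Build one ASS Dialogue line with a karaoke {\\k} tag per word.

    words: [[text, start_seconds], ...] like the hooks.json "transcript"
    field, assumed to be in source-video time.
    """
    if not words:
        return ""
    parts = []
    for i, (text, t) in enumerate(words):
        nxt = words[i + 1][1] if i + 1 < len(words) else line_end
        dur_cs = max(1, round((nxt - t) * 100))  # \k durations are centiseconds
        parts.append(f"{{\\k{dur_cs}}}{text}")
    start = ass_time(words[0][1] - clip_start)  # shift to clip-relative time
    end = ass_time(line_end - clip_start)
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,,{' '.join(parts)}"
```

With the example transcript, `clip_start=142.3`, and an assumed `line_end=143.1`, this yields `Dialogue: 0,0:00:00.00,0:00:00.80,Default,,0,0,0,,{\k30}wah {\k50}yuh`.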
If you already have a better transcript, pass `--captions-srt full_video.srt`. The tool slices that full-video subtitle file to match each exported clip. This is the better path when Jamaican speech recognition quality matters more than convenience.
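Slicing a full-video SRT to one clip can be done in three steps. Keep every cue that overlaps the clip window, clamp it to the clip edges, and shift it so the clip starts at zero. The sketch below follows those steps and is not necessarily how `clipper.py` does it. The overlap and clamping rules are assumptions, and all names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# "00:02:20,000 --> 00:02:23,000" (a period is tolerated in place of the comma)
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float
    text: str


def _secs(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def _stamp(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                cues.append(Cue(_secs(*g[:4]), _secs(*g[4:]), "\n".join(lines[i + 1:])))
                break
    return cues


def slice_cues(cues: list[Cue], clip_start: float, clip_end: float) -> list[Cue]:
    """Keep cues overlapping the clip, clamp them to it, and shift to start at 0."""
    out = []
    for c in cues:
        if c.end > clip_start and c.start < clip_end:
            out.append(Cue(max(c.start, clip_start) - clip_start,
                           min(c.end, clip_end) - clip_start, c.text))
    return out


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues back to SRT, renumbering from 1."""
    return "".join(
        f"{i}\n{_stamp(c.start)} --> {_stamp(c.end)}\n{c.text}\n\n"
        for i, c in enumerate(cues, start=1)
    )
```

Clamping keeps a line that started just before the hook, so the clip does not open mid-sentence without a caption. Read the source file with `encoding="utf-8-sig"` if it may carry a byte-order mark.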
- Hook scoring is prompt-based, not learned from your own performance history.
- Thumbnail selection is a still-frame export, not full thumbnail design.
- Face tracking is simple clip-level positioning, not continuous smart reframing.
- Gemini word timings are approximate, so external SRT is still the safest caption path.
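Clip-level positioning works like this: sample a few frames, reduce them to one face center for the whole clip, and place a single fixed 9:16 crop window around it. The sketch below illustrates that idea and is not the repo's implementation. Face detection is left out, and the per-frame centers are assumed to come from whatever detector you use. The FFmpeg `crop=w:h:x:y` and `scale` filters are standard. The `libx264`/`aac` encoders and the 1080x1920 output size are assumptions.

```python
from __future__ import annotations

import statistics


def clip_face_center(centers: list[float | None]) -> float | None:
    """Collapse per-frame face x-centers into one value for the whole clip.

    Frames where the (hypothetical) detector found no face are passed as None.
    """
    found = [c for c in centers if c is not None]
    return statistics.median(found) if found else None


def crop_for_face(src_w: int, src_h: int, face_cx: float | None) -> tuple[int, int, int]:
    """Return (crop_w, crop_h, x) for a single full-height 9:16 window."""
    crop_h = src_h
    crop_w = int(src_h * 9 / 16) // 2 * 2  # round down to an even width
    if crop_w > src_w:
        raise ValueError("source is narrower than 9:16; this sketch assumes landscape input")
    center = src_w / 2 if face_cx is None else face_cx  # no face: center crop
    x = round(center - crop_w / 2)
    x = max(0, min(x, src_w - crop_w))  # keep the window inside the frame
    return crop_w, crop_h, x


def cut_command(src: str, out: str, start: float, end: float,
                src_w: int, src_h: int, face_cx: float | None) -> list[str]:
    """Build an FFmpeg argv that cuts [start, end] and reframes to 1080x1920."""
    w, h, x = crop_for_face(src_w, src_h, face_cx)
    vf = f"crop={w}:{h}:{x}:0,scale=1080:1920"
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}", "-i", src,
        "-t", f"{end - start:.3f}",
        "-vf", vf,
        "-c:v", "libx264", "-c:a", "aac",
        out,
    ]
```

Pass the list to `subprocess.run(..., check=True)`. Because the window is fixed for the whole clip, a speaker who walks across the frame can drift out of it. That is the gap between clip-level positioning and continuous reframing.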
- Add local ASR fallback with `faster-whisper` or `WhisperX`.
- Add thumbnail ranking that checks faces, motion, and sharpness.
- Add an optional transcript correction dictionary for recurring patois names and slang.
- Add a scoring feedback loop from your actual post performance.
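The planned correction dictionary could be as simple as a whole-word, case-insensitive find-and-replace applied to caption text before it is burned in. The sketch below is a hypothetical version of that roadmap item. The entries in `EXAMPLE_CORRECTIONS` are placeholders, not a vetted patois list. A single combined regex is used so that one replacement cannot be rewritten again by another entry.

```python
from __future__ import annotations

import re
from typing import Callable

# Placeholder entries: misheard form -> preferred spelling (all hypothetical).
EXAMPLE_CORRECTIONS = {
    "what a gwan": "wah gwaan",  # misheard phrase
    "yu": "yuh",                 # single-word spelling normalization
    "shelly ann": "Shelly-Ann",  # recurring name (hypothetical cast member)
}


def build_corrector(corrections: dict[str, str]) -> Callable[[str], str]:
    """Return a function that applies all corrections in one pass."""
    lookup = {k.lower(): v for k, v in corrections.items()}
    if not lookup:
        return lambda text: text
    # Longest keys first so multi-word phrases win over words inside them.
    alternation = "|".join(re.escape(k) for k in sorted(lookup, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return lambda text: pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

For example, `build_corrector(EXAMPLE_CORRECTIONS)("What a gwan, yu see Shelly Ann?")` returns `"wah gwaan, yuh see Shelly-Ann?"`. Already-correct text such as `"yuh"` is left unchanged because the word-boundary anchors stop `yu` from matching inside it.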