One‑line pitch: Drop a meeting video in your terminal and get back a clean, speaker‑labelled transcript plus Markdown notes — all with free, open‑source models you can run locally in under an hour.
| Feature | Status | Notes |
|---|---|---|
| High‑quality multilingual transcription (OpenAI Whisper) | ✅ | Runs on CPU or GPU |
| Automatic speaker diarization (pyannote.audio) | ✅ | Distinguishes who spoke |
| Markdown export with timestamps | ✅ | Saves to results/transcript.md |
| One‑command CLI (python main.py <video>) | ✅ | Creates a results/ folder |
| Key‑frame extraction for slide changes (LMSKE) | ⏳ | Planned v0.2 |
| Topic summarisation & action‑items (LLM) | ⏳ | Planned v0.3 |
Prereqs: Python 3.10+ and ffmpeg (≥ 4.2).
# 1. Clone & enter
git clone https://github.com/JuanLara18/Meeting-Scribe.git
cd meetingscribe
# 2. Create env & install deps
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r config/requirements.txt
# 3. Run on a sample video
python main.py path/to/meeting.mp4

Output:
results/
├── transcript.md # speaker‑segmented Markdown
└── transcript.json # raw structured data
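Each line of transcript.md is just a timestamped, speaker‑labelled rendering of one merged segment. A minimal sketch of that formatting (the helper names and the exact line layout are illustrative assumptions, not the project's actual code):

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS for readable timestamps."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def segment_to_md(segment: dict) -> str:
    """Render one merged segment as a Markdown transcript line."""
    return f"**{segment['speaker']}** [{fmt_ts(segment['start'])}]: {segment['text']}"

# segment_to_md({"speaker": "SPEAKER_00", "start": 75.3, "text": "Let's begin."})
# → "**SPEAKER_00** [00:01:15]: Let's begin."
```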
usage: python main.py [-h] [--lang en] [--model base] VIDEO_PATH

- VIDEO_PATH – any local .mp4 / .mkv / .mov file.
- --lang – ISO‑639‑1 code to force a language (default: auto‑detect).
- --model – Whisper size (tiny, base, small, medium, large-v3).
Example with Spanish audio, medium model:
python main.py reunión.mp4 --lang es --model medium

meetingscribe
├── main.py # orchestrates the end‑to‑end pipeline
├── processing/
│ ├── audio.py # audio extraction (ffmpeg)
│ ├── transcribe.py # whisper wrapper
│ ├── diarize.py # pyannote wrapper
│ └── merge.py # align speakers + text
├── utils/
│ └── markdown.py # export helpers
├── config/
│   └── requirements.txt
├── README.md
└── results/ # auto‑created
Minimal today — but every component lives in its own module so we can swap models or add GPU acceleration without touching main.py.
| Layer | Tool | Why |
|---|---|---|
| ASR | OpenAI Whisper | State‑of‑the‑art, MIT license, offline |
| Diarization | pyannote.audio | SOTA pretrained pipelines |
| Media | ffmpeg | Battle‑tested extraction |
| Future vision | LMSKE (key‑frames), Llama‑3 local LLMs | keep everything free & private |
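The ffmpeg step only needs to pull a mono 16 kHz WAV out of the video, the format both Whisper and pyannote expect. A hedged sketch of how processing/audio.py might build that call (the function names here are illustrative, not the project's actual API):

```python
import subprocess

def extract_audio_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build the ffmpeg command: drop video, downmix to mono 16 kHz WAV."""
    return [
        "ffmpeg", "-y",      # overwrite output without asking
        "-i", video_path,    # input video
        "-vn",               # discard the video stream
        "-ac", "1",          # mono
        "-ar", "16000",      # 16 kHz sample rate
        wav_path,
    ]

def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg, raising if extraction fails."""
    subprocess.run(extract_audio_cmd(video_path, wav_path), check=True)
```

Keeping the command builder separate from the subprocess call makes the flags easy to unit‑test without ffmpeg installed.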
video.mp4
│
┌─────────▼─────────┐
│ 1. ffmpeg extract │──► audio.wav
└─────────┬─────────┘
│
┌─────────────▼─────────────┐
│ 2. Whisper ASR → segments │
└─────────────┬─────────────┘
│
┌─────────────▼─────────────┐
│ 3. pyannote diarize audio │
└─────────────┬─────────────┘
│
┌─────────────▼─────────────┐
│ 4. Merge text+speakers │
└─────────────┬─────────────┘
│
┌─────────────▼─────────────┐
│ 5. Export Markdown & JSON │
└───────────────────────────┘
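Step 4 is the interesting one: Whisper's timestamped text segments and pyannote's speaker turns come from independent models, so they must be aligned. A common approach, sketched below under the assumption of simple dict inputs (not necessarily how merge.py does it), assigns each text segment the speaker whose turn overlaps it the most:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def merge(segments, turns):
    """Attach the best-overlapping speaker to each ASR segment.

    segments: [{"start", "end", "text"}]    — Whisper output
    turns:    [{"start", "end", "speaker"}] — pyannote output
    """
    merged = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg["start"], seg["end"], t["start"], t["end"]),
            default=None,
        )
        merged.append({**seg, "speaker": best["speaker"] if best else "UNKNOWN"})
    return merged
```

Maximum‑overlap assignment handles segments that straddle a speaker change gracefully: the segment goes to whoever held the floor longest during it.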
- v0.2 – Key‑frame extraction ➜ embed screenshots in Markdown.
- v0.3 – Local LLM summariser ➜ bullet goals, action items.
- v1.0 – Real‑time streaming mode & simple web UI (FastAPI + React).
MIT — free for personal & commercial projects. Attribution welcome but not required.