TranscrIA

Self-hosted meeting transcription portal — turn long meeting recordings into usable deliverables on your own GPUs: corrected, speaker-attributed transcripts (SRT), structured summaries, quality reports and meeting-type-aware Word minutes. No cloud, no per-minute API bill, full data sovereignty.

🇫🇷 The interface and documentation are currently in French (see the French README). The product is French-first today; UI strings and LLM prompts are centralized so localization is a planned evolution, not a rewrite.

Why TranscrIA

Plenty of scripts wrap Whisper. TranscrIA is built as a service for teams that process real meetings, week after week:

A real audio module, not an ffmpeg wrapper. Acoustic preflight (SNR, clipping, bandwidth, risk flags), speech/music/noise scene analysis, a per-window difficulty timeline shown to the user before transcription, optional Demucs source separation, loudness normalization, Silero VAD — all coordinated with GPU/VRAM management.
Human-in-the-loop where it matters. Detected speakers come with playable audio excerpts, talk time and an acoustic gender hint; users validate names, participants and a domain lexicon before the final pass. Known-voice matching is consent-based (signed form, hashed proof, source audio deleted by default).
LLM arbitration with guardrails. A local OpenAI-compatible LLM (e.g. llama.cpp) produces the structured summary and corrects the SRT using the validated lexicon and context; a final review pass then harmonizes the deliverables. Guardrails include anti-hallucination cleanup, retry-then-fail-loud semantics, and prompts editable in the admin UI.
Production-grade orchestration. Persistent GPU job queue (priorities, anti-starvation aging, pause/resume, scheduled starts), VRAM-aware admission per remaining pipeline phase, calendar-based GPU scheduling, a resumable pipeline (checkpoint/resume — a re-queued job never redoes finished work), and "waiting for VRAM" as a first-class, admin-alerted state instead of a silent failure.
Three deployment topologies. All-in-one box; CPU-only web frontend + GPU worker (shared PostgreSQL, job files replicated through the database — no NFS to operate, sha256-verified integrity); and a remote inference node serving STT/diarization/voice-embedding over HTTP with VRAM autonomy (reuse → launch on demand → explicit 503).
Compliance by design. Multi-user RBAC (roles, groups), full GDPR audit trail (actor, IP, timestamp, filterable, exportable), consent-gated voice profiles, secrets kept out of the versioned config.
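
To make the "priorities, anti-starvation aging" idea from the orchestration bullet concrete, here is a minimal sketch of one common way to implement it. It is not TranscrIA's actual scheduler: the `QueuedJob` fields, the "higher number runs first" convention and the `aging_per_minute` rate are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    job_id: str
    priority: int       # assumed convention: higher runs first
    enqueued_at: float  # seconds on a monotonic clock


def effective_priority(job: QueuedJob, now: float, aging_per_minute: float = 1.0) -> float:
    """Base priority plus a bonus that grows with the time spent waiting."""
    waited_minutes = max(0.0, now - job.enqueued_at) / 60.0
    return job.priority + aging_per_minute * waited_minutes


def pick_next(queue: list[QueuedJob], now: float, aging_per_minute: float = 1.0):
    """Return the job with the highest effective priority (oldest wins ties)."""
    if not queue:
        return None
    return max(
        queue,
        key=lambda j: (effective_priority(j, now, aging_per_minute), -j.enqueued_at),
    )
```

With a high enough aging rate, a low-priority job that has waited long enough eventually overtakes a fresh high-priority one, which is the property that prevents starvation.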

Screenshots

Home — jobs at a glance, one-click SRT / ZIP downloads

Processing profiles — pick your deliverable on a single slider right after upload; the portal pre-selects the most complete profile your hardware can run and hides the steps it doesn't need

Speaker validation — listen to excerpts, name speakers, acoustic gender hints

Configuration — detected hardware, friendly forms, LLM prompts editable in-app, full YAML for experts

GPU scheduling & queue — calendar windows (block night starts, cap concurrency), persistent queue with priorities

How it works

upload ─► audio diagnosis ─► quick summary (STT + LLM) ─► context, participants,
   lexicon (human validation) ─► final pipeline:
   preprocess → transcription → diarization → LLM correction → final review
   → quality scoring → exports (SRT, segments, quality report, DOCX minutes, ZIP)

STT backends (interchangeable): Cohere transcribe (default), Whisper large-v3 / faster-whisper, IBM Granite Speech, NVIDIA Parakeet TDT (experimental) — served locally or by a remote OpenAI-compatible server (vLLM, SGLang…).
Diarization backends: pyannote.audio (default) or NVIDIA Sortformer via NeMo.
Word minutes adapted to 18 meeting types (works council, executive committee, project review, crisis…): LLM-extracted decisions/actions/votes, type-specific fields and visual themes, graceful degradation if extraction fails.
Processing profiles (after upload): pick a deliverable on a single slider — from a quick SRT express to a full dossier qualité — instead of an opaque fast/quality switch. The portal greys out profiles your hardware can't run, pre-selects the most complete one that fits, and then only executes the pipeline phases (and only reserves the GPU/LLM) that the chosen profile actually needs.
Every phase is checkpointed: a re-dispatched job resumes at the first incomplete phase, even on a different worker.
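
The checkpoint/resume behaviour, combined with "only execute the phases the profile needs", can be sketched as follows. This is an illustration under assumptions, not the project's code: the phase identifiers are derived from the pipeline diagram above, and the JSON checkpoint file and the `run_phase` callback are hypothetical stand-ins for whatever persistence and executors TranscrIA really uses.

```python
import json
from pathlib import Path

# Phase order taken from the pipeline diagram; the identifiers are assumptions.
PHASES = [
    "preprocess", "transcription", "diarization", "llm_correction",
    "final_review", "quality_scoring", "exports",
]


def load_done(checkpoint: Path) -> set[str]:
    """Read the set of completed phases (empty if no checkpoint exists yet)."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def run_job(checkpoint: Path, wanted: list[str], run_phase) -> list[str]:
    """Run the wanted phases in pipeline order, skipping completed ones.

    The checkpoint is rewritten after every successful phase, so a crash
    loses at most the phase that was in progress. Returns the phases
    executed during this call.
    """
    done = load_done(checkpoint)
    executed = []
    for phase in PHASES:
        if phase not in wanted or phase in done:
            continue
        run_phase(phase)  # may raise; the checkpoint then stays unchanged
        done.add(phase)
        checkpoint.write_text(json.dumps(sorted(done)))
        executed.append(phase)
    return executed
```

Because the state lives outside the worker process, a second call to `run_job` with the same checkpoint (from another worker, if the store is shared) starts at the first incomplete phase.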

Quickstart

git clone https://github.com/Martossien/transcria.git
cd transcria
./install.sh            # venv, dependencies, CUDA-matched PyTorch, config.yaml, optional systemd unit

Arbitration LLM, auto-selected by VRAM. During install, TranscrIA detects your GPUs and recommends the largest tier that actually fits (12 / 16 / 24 / 32 / 48 / 64 GB), judged by real per-card placement (mono or split) rather than by total VRAM. It then offers to download the right GGUF (with your HF token) and activate it: one prompt, no manual model-picking. Below 12 GB it falls back to raw transcription (no correction/summary LLM). The per-tier models are benchmarked in docs/BENCH_LLM_PALIERS.md; switch anytime with scripts/switch_arbitrage_llm.sh <tier>.
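
The difference between "per-card placement" and "total VRAM" is easiest to see in code. The sketch below is an assumption-laden illustration, not the installer's real logic: it assumes a tier fits "mono" when one card holds it entirely, and fits "split" when it can be divided evenly across the k largest cards. The actual placement rules (overheads, uneven splits) are not described here and may differ.

```python
TIERS_GB = (12, 16, 24, 32, 48, 64)  # tier sizes listed in the README


def placement(tier_gb: float, gpus_gb: list[float]):
    """Return "mono", "split xK", or None if the tier cannot be placed.

    Assumed rule: an even split across the K largest cards, each of which
    must hold tier_gb / K.
    """
    cards = sorted(gpus_gb, reverse=True)
    for k in range(1, len(cards) + 1):
        share = tier_gb / k
        if all(card >= share for card in cards[:k]):
            return "mono" if k == 1 else f"split x{k}"
    return None


def recommend_tier(gpus_gb: list[float]):
    """Largest tier that can be placed, or None (raw transcription only)."""
    for tier in sorted(TIERS_GB, reverse=True):
        how = placement(tier, gpus_gb)
        if how is not None:
            return tier, how
    return None
```

Under these assumptions a 24 GB + 8 GB machine gets the 24 GB tier, even though its total VRAM is 32 GB, because the 8 GB card cannot hold half of a 32 GB model.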

You still need to bring your own STT weights and pyannote cache (see docs/INSTALL.md) and fill in config.yaml. Then validate the install with the built-in preflight, which needs no GPU and has no side effects:

venv/bin/python scripts/doctor.py            # config, DB schema, LLM server, opencode, nodes, storage
venv/bin/python scripts/doctor.py --strict   # warnings become failures (for deployment gates)
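
For readers wiring the preflight into a deployment gate, the strict-mode semantics described in the comment above ("warnings become failures") can be sketched like this. The finding levels and the 0/1 exit codes are assumptions for illustration; check scripts/doctor.py for the real behaviour.

```python
def exit_code(findings: list[tuple[str, str]], strict: bool = False) -> int:
    """Map preflight findings to a process exit code.

    findings: (level, message) pairs, with level in {"ok", "warning", "error"}
    (hypothetical levels). Errors always fail; warnings fail only in strict mode.
    """
    levels = {level for level, _ in findings}
    if "error" in levels:
        return 1
    if strict and "warning" in levels:
        return 1
    return 0
```

A CI job would then run the strict variant and abort the rollout on any non-zero exit code.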

Start the service (./start.sh or systemd) and open the web UI. For distributed setups (web frontend + GPU worker, remote inference node), see docs/INSTALL.md §11–13 and docs/STOCKAGE_PARTAGE_JOBS.md.

…or run it with Docker (one command)

Prefer containers? A turnkey script takes you from clone to a running stack — host GPU setup, secret/config generation, image build, docker compose up, health check — with no manual steps:

scripts/docker_quickstart.sh --bundled        # EASIEST: try it — models baked in, no token, no download
scripts/docker_quickstart.sh                  # all-in-one GPU → http://localhost:7870
HF_TOKEN=hf_xxx scripts/docker_quickstart.sh  # reference quality (gated Cohere STT + pyannote); omit for the no-token path
scripts/docker_quickstart.sh --cpu            # no GPU (web + scheduler)
scripts/docker_quickstart.sh --down           # stop

Default login: open http://localhost:7870 and sign in with admin / CHANGE-ME (the initial credentials in the generated config.yaml, key auth.first_admin_password). Change the password before any real use — it's a placeholder, and a warning is logged while it stays at its default.

Just want to try it? One command, no token (`--bundled`)

scripts/docker_quickstart.sh --bundled is the friendliest way to evaluate the project: it pulls (or builds) the :bundled image with the default models already baked in — so there's no Hugging Face token, no download, and it even works offline. You only need an NVIDIA GPU (compute capability ≥ 7.5 — RTX 20xx or newer — and ≥ 12 GB VRAM) with Docker GPU access; the script checks this up front and stops with a clear message if your card is too small.
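
The up-front hardware check can be pictured as below. This is a hedged sketch, not the quickstart script itself (which is a shell script): the function names and messages are hypothetical, and only the two thresholds (compute capability 7.5, 12 GB VRAM) come from the text above. Note that the capability is compared as a (major, minor) pair rather than as a string.

```python
MIN_COMPUTE_CAP = (7, 5)  # Turing, per the requirement stated above
MIN_VRAM_GB = 12


def parse_compute_cap(text: str) -> tuple[int, int]:
    """Turn "8.6" into (8, 6) so versions compare numerically."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def gpu_ok(compute_cap: str, vram_gb: float) -> tuple[bool, str]:
    """Return (eligible, reason) for one GPU."""
    cap = parse_compute_cap(compute_cap)
    if cap < MIN_COMPUTE_CAP:
        return False, f"compute capability {compute_cap} is below 7.5"
    if vram_gb < MIN_VRAM_GB:
        return False, f"{vram_gb} GB VRAM is below {MIN_VRAM_GB} GB"
    return True, "ok"
```

Comparing tuples avoids the classic pitfall of string comparison, where "12.0" would sort before "7.5".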

⚠️ This is a quick-test image, not the complete project. To keep it token-free it uses the entry-level engines: transcription via Whisper, diarization via NVIDIA Sortformer (≤ 4 speakers, experimental), and the smallest 9B arbitration LLM. It exercises the full 6-profile workflow (summary / correction / review all run), but not the reference quality. For that — Cohere STT + pyannote (unlimited speakers) and larger LLM tiers — provide a free HF_TOKEN (after accepting both models' conditions) or set TRANSCRIA_LLM_TIER. Nothing to reconfigure: the same command picks them up.

The all-in-one GPU image (CUDA 12.6) bundles the whole pipeline: STT, diarization and the arbitration LLM (a compiled llama-server serving a small non-gated GGUF). The default :latest (slim) image downloads those models at first run; :bundled bakes them in (zero download, no host-cache pitfalls; see the slim-vs-bundled table in the docs).

With no token at all, the full 6-profile workflow runs (speaker labels via NVIDIA Sortformer, ≤ 4 speakers); a free HF token (plus accepting both model conditions) switches to reference quality (Cohere + pyannote, unlimited speakers).

The slim image bakes no weights and is publishable: point the quickstart at a published image (TRANSCRIA_ALLINONE_IMAGE=ghcr.io/<owner>/transcria-allinone:vX) to pull instead of build. The quickstart is idempotent (it never overwrites an existing config.yaml/.env) and validated end-to-end on GPU.

GPU requirement: NVIDIA compute capability ≥ 7.5 (Turing or newer: RTX 20xx→50xx, A/L/H-series; Blackwell via PTX JIT) and ≥ 12 GB VRAM (the default 9B LLM peaks at ~10.6 GB; phases are sequenced, not additive). The full reference (image, compose, GPU/VRAM compatibility table, variables, publishing, rollback) is in docs/DOCKER.md.
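
"Sequenced, not additive" means the VRAM a job needs is the largest single-phase peak among the phases still to run, not their sum. The sketch below illustrates that arithmetic. Only the ~10.6 GB LLM peak comes from the text above; the other per-phase figures and the phase names are placeholders invented for the example, not measurements.

```python
# Peak VRAM per phase in GB. Only the LLM figure is from the README;
# the transcription and diarization values are illustrative placeholders.
PHASE_PEAK_GB = {
    "transcription": 6.0,    # placeholder
    "diarization": 3.0,      # placeholder
    "llm_correction": 10.6,  # default 9B LLM peak quoted above
}


def required_vram_gb(remaining: list[str], peaks: dict[str, float] = PHASE_PEAK_GB) -> float:
    """Phases run one after another, so the requirement is the max, not the sum."""
    return max((peaks[p] for p in remaining), default=0.0)


def can_admit(free_gb: float, remaining: list[str]) -> bool:
    """Admit a job only if the largest remaining phase fits in free VRAM."""
    return free_gb >= required_vram_gb(remaining)
```

With these placeholder numbers the sum of peaks is 19.6 GB, yet a 12 GB card is sufficient because no two phases hold the GPU at once. The same function shows why a resumed job whose LLM phase is already done needs far less VRAM to be admitted.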

Tech stack

Layer	Technology
Backend	Python 3.11+, Flask 3, SQLAlchemy + Alembic (PostgreSQL in production, SQLite for dev)
STT serving	vLLM / SGLang / any OpenAI-compatible server; local engines
Diarization & voice	pyannote.audio, NVIDIA NeMo (Sortformer), local voice embeddings
LLM phases	opencode driving a local OpenAI-compatible LLM — selectable backend: Ollama / llama.cpp / vLLM, chosen per hardware from a data-driven profile catalog (docs/LLM_BACKENDS.md)
Audio	ffmpeg/ffprobe, Demucs, Silero VAD, SQUIM / DNSMOS quality metrics
Frontend	Server-rendered Jinja2 + Bootstrap 5, vanilla JS
Exports	python-docx (themed minutes), SRT, JSON, ZIP package

Project status

⚠️ Beta. Latest release: v0.1.0-beta.7.

New in beta.7: the arbitration LLM is now multi-backend (Ollama / llama.cpp / vLLM), chosen automatically from the physical hardware via a single data-driven profile catalog (transcria/data/llm_profiles.yaml; no model size is hardcoded, and the VRAM footprint is derived from the real model and re-measured on first load). Ollama becomes the "easy" backend (curl | sh, no compilation, no nvcc, no HF token); llama.cpp gains a prebuilt CUDA binary path (ai-dock, sha256-verified; installs on a blank distro without compiling). See docs/LLM_BACKENDS.md.

The product is functional and covered by 2,983 tests (green CI: ruff, mypy, full pytest on PostgreSQL, ~80 % coverage). The installer is validated end-to-end on Ubuntu 22.04/24.04, Debian 12, Fedora 41 and Rocky 9 × Python 3.11–3.13 (apt + dnf), with the full pipeline (STT + diarization + LLM), across mono- and multi-GPU setups (Ollama 12B/35B, llama.cpp 35B-A3B, vLLM 27B-FP8); see docs/LLM_PROFILS_VALIDATION.md. The distributed topology (CPU frontend + GPU resource node) is validated end-to-end on real audio with a vLLM arbitration LLM (tensor-parallel) and automatic VRAM placement across 8 GPUs; see docs/DOCKER.md and docs/PLAN_TEST_SPLIT_VLLM.md. Concurrency has been hardened under load (robust up to 8 concurrent jobs; see docs/PLAN_TEST_CHARGE.md). A containerized deployment (Dockerfile, compose, GPU support, turnkey quickstart) is available; see docs/DOCKER.md.

Following SemVer, the 0.x series is a stabilization phase: the API, the configuration schema and the data model may still change without backward-compatibility guarantees until 1.0.0. Evaluate it, pilot it, but don't bet production on it without your own validation.

Language: the UI and the LLM prompts are French-first (the pipeline is tuned for French meetings). Both are centralized/editable, so adding languages is a planned evolution, not a rewrite.

Documentation

Full documentation lives in docs/ (currently in French):

Document	Content
docs/INSTALL.md	Installation, models, systemd, troubleshooting, distributed deployment
docs/DOCKER.md	Containerized deployment — turnkey quickstart, image, compose, GPU (CDI), variables, rollback
docs/TECHNICAL.md	Architecture, pipeline, API, GPU orchestration
docs/CONFIG_REFERENCE.md	Complete `config.yaml` reference
docs/DATA_MODEL.md	DB schema, job states, files per job
docs/SERVICE_RESSOURCES_GPU.md	Remote inference, VRAM autonomy, degraded modes
docs/STOCKAGE_PARTAGE_JOBS.md	PostgreSQL-backed job file store for split deployments
CONTRIBUTING.md · SECURITY.md · CHANGELOG.md	Contributing, security policy, changelog

License

TranscrIA is released under the Apache License 2.0. Third-party components (bundled libraries and binaries, and runtime-downloaded models) and their licenses / attributions are listed in THIRD_PARTY_NOTICES.md — including the CC-BY-4.0 attribution for the DNSMOS/SQUIM quality models, and the licenses of components shipped in the Docker images (opencode — MIT, ffmpeg — GPL/LGPL via Debian, etc.). No GPL/AGPL (strong copyleft) dependency is present at runtime.

