Index of speech technology repositories, tools, and resources — covering the full pipeline from speech capture through transcription, cleanup, and text transformation.
Last updated: 2026-03-25
Resources, scripts, and models for fine-tuning automatic speech recognition systems.
Validated Whisper fine-tuning script on Modal for FUTO
Validated script for fine-tuning Whisper on a Modal GPU with a preformatted audio dataset
Collection of fine-tuned Whisper models specifically for FUTO Keyboard on mobile. Fine-tuned on ~1 hour of personal voice samples.
Base-sized Whisper fine-tune
Small-sized Whisper fine-tune
Tiny-sized Whisper fine-tune
Collection of general Whisper fine-tuned models for desktop use, available in GGML and CTranslate2 formats. Fine-tuned on ~1 hour of personal voice samples.
Large V3 Turbo-sized Whisper fine-tune
Medium-sized Whisper fine-tune
Tiny-sized Whisper fine-tune
Base-sized Whisper fine-tune
Planning doc for STT fine-tuning and eval project
Whisper ACFT fine-tuning
Some resources for those looking to fine-tune Whisper ASR
Fine-tuned Whisper model for Hebrew/English mixed speech
GUI applications for creating and collecting training data for ASR fine-tuning.
Breaks up texts by approximate reading duration
GUI to facilitate gathering training data for ASR/STT apps in organised datasets with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.
GUI to facilitate capturing voice data for TTS / voice clone training with LLM synthetic text generation and saving logic (Ubuntu Linux)
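As a rough illustration of the "breaks up texts by approximate reading duration" idea listed above, here is a minimal sketch. It is not taken from the listed repository: the function name, the 150 words-per-minute default, and the sentence-splitting rule are all assumptions chosen for the example.

```python
import re


def split_by_reading_duration(text, seconds_per_chunk=30.0, words_per_minute=150.0):
    """Group sentences into chunks that take roughly `seconds_per_chunk` to read aloud.

    Assumption: reading time is estimated purely from word count at a fixed
    speaking rate; real speakers will vary.
    """
    words_per_chunk = seconds_per_chunk * words_per_minute / 60.0
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > words_per_chunk:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in split_by_reading_duration(sample, seconds_per_chunk=2, words_per_minute=150):
        print(chunk)
```

Sentences are kept whole, so a single sentence longer than the budget becomes its own oversized chunk rather than being cut mid-sentence.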
Curated datasets for training and evaluating ASR/STT models.
Collection of public audio datasets for speech recognition training and evaluation
Dataset of mixed English/Hebrew sentences for multilingual ASR training
Technical audio samples for STT evaluation
Dataset for testing words-per-minute recognition accuracy
Desktop applications and utilities for speech-to-text input.
Streamlit app for capturing and editing prompts and system prompts
Demo UI which parses and then runs audio prompts
Notepad for Linux that uses OpenAI Whisper (API) and reformats dictated text
A Linux desktop utility for converting speech to text using the OpenAI Whisper API
Transcription notepad with cloud speech to text (STT) for Linux
A fork of Deepgram's Linux starter. CLI -> GUI + hotkey support, API key editing, cost tracking. WIP
WIP attempt at building a good STT utility on top of cloud STT APIs
Open Source AI Dictation App - Type 3x faster, no keyboard needed
A free, open source, and extensible speech-to-text application that works completely offline
Voice-powered typing for Wayland/Hyprland desktops
On-device voice typing for Linux using Parakeet and NeMo ASR models via sherpa-onnx
Speech Note Linux app. Note taking, reading and translating with offline STT, TTS and Machine translation
Linux desktop application for creating notes from dictated speech
GUI for recording voice notes
Simple GUI around whisper.cpp for voice-to-text on Linux
Voice note taking utility with cloud audio multimodal models for transcription and text cleanup
WIP MCP for using various cloud ASR models for speech to text / transcription
Workflow workspace for importing recordings from a DVR and using AI for transcription
File upload based multimodal transcription tool using Gemini
MCP for Gemini multimodal audio transcription with built in post-processing
Local transcription app with audio multimodal design
System prompts and tools for cleaning, transforming, and enhancing STT output.
Clean up raw speech-to-text transcripts
System prompt for generating diarised transcripts (STT plus stylistic guidance)
An updated skeleton library of system prompts for using LLMs to refine STT output
Basic foundational system prompt for cleaning up AI voice transcripts
WIP/Idea - Select text and fix typos with local AI
An abbreviated collection of STT transformation prompts
Basic prompt concatenation utility that combines text transformation system prompts for converting transcribed text
Updated repo of text transformation prompts (raw STT transcripts -> *). New repo for capturing via automations.
LLM text reformatting and rewriting toolbox comprising many system prompts
Text transformation prompts library for Audiopen.ai
System prompts for rewriting text in Shakespearean English
Fine-tuning dataset and plans for text cleanup with audio multimodal models
Documentation/notes for a "prompt stack" for audio multimodal text processing
Evaluating various cloud audio understanding models on transcription and cleanup
Testing various permutations in system prompting for raw audio transcript cleanup
Config for a text redaction agent for voicenote -> * workflows
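The prompt concatenation idea mentioned in this section can be sketched in a few lines. This is a hypothetical illustration only, not the listed utility's code: the function name, the fragment names, and the fragment texts are invented for the example.

```python
def build_system_prompt(fragments, selected, separator="\n\n"):
    """Join the selected prompt fragments, in order, into one system prompt.

    `fragments` maps a fragment name to its text; `selected` is the ordered
    list of names to include. Both are hypothetical structures.
    """
    missing = [name for name in selected if name not in fragments]
    if missing:
        raise KeyError(f"Unknown prompt fragments: {missing}")
    return separator.join(fragments[name].strip() for name in selected)


if __name__ == "__main__":
    # Hypothetical fragments for cleaning up a raw STT transcript.
    fragments = {
        "base": "You clean up raw speech-to-text transcripts.",
        "punctuation": "Add missing punctuation and paragraph breaks.",
        "fillers": "Remove filler words such as 'um' and 'uh'.",
    }
    print(build_system_prompt(fragments, ["base", "fillers"]))
```

Keeping each transformation in its own fragment lets a workflow stack only the instructions it needs on top of a shared base prompt.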
Workflows and agents for voice-to-action automation.
Planning repo for personalised AI context pipeline with revised tooling
Gemini app which captures user speech, condenses (LLM), and then synthesises
Configuration for an intermediate agent in voice automation workflows that bridges voice input to other actions
Voice-to-prompt pipeline for processing spoken instructions
Demonstration of a voice-to-text, spec-driven development workflow
A conceptual voice to prompt pipeline that attempts to separate instructions from provided context for better results
Workflow for extracting context data from voice notes to Pinecone
Test pipeline: voice context data to Ragie
Tools for testing and comparing STT performance.
Single-sample evaluation for local STT models
Single-shot STT benchmark for long-form audio
Compare different speech-to-text models and services
Quick evaluation to find the best STT model in Speech Note (Ubuntu) for specific hardware
Basic audio pipeline for preparing long audio content for ASR transcription
Test samples for various microphones with an STT accuracy eval
Index repository for speech recognition and ASR evaluations
Comparing Whisper fine-tunes versus stock Whisper on local inference
Evaluation interface for fine-tuned Whisper models
Quick eval: how much does speaking pace affect WER/accuracy in ASR?
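Several of the evaluations above report WER (word error rate). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the lowercase-and-split normalisation are assumptions for the example; the listed repositories may normalise text differently or rely on an existing library.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.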
Microphone setup, EQ, and audio chain tools for optimal STT input.
Audio cleaning tool for removing baby/background noise from recordings
Generate EQ templates for audio processing
Boot script to ensure that Easy Effects manages the input sound source on boot (Ubuntu)
Attempt to set up a good autostart audio processing chain for STT
Analyses voice data
Text-to-speech and SSML generation tools.
Generates SSML from text by inference
Reference of Hebrew text-to-speech providers and services
Documentation, research, curated lists, and miscellaneous speech tech resources.
Prompts and outputs (and some notes) on STT + ASR + fine-tuning. LLM: Claude
Useful speech to text tools that use Whisper under the hood (API/local)
Analysis of Deepgram text input
Planning notes for a tool I've been working on for a while!
Some timestamped API pricepoints for speech to text providers
A few notes describing the kind of voice app for large language models I would love to have!
Plan/key allocation for a macropad optimised for heavy daily dictation workflows
List of resources for voice technology with support for Linux
Notes on STT processing chain (for future voice projects)
Utility for switching microphone sources
Claude-enhanced research for voice control platforms with Linux support
Planning notes for a macropad for STT users