Speech Tech Index

Index of speech technology repositories, tools, and resources — covering the full pipeline from speech capture through transcription, cleanup, and text transformation.

Last updated: 2026-03-25

ASR Fine-Tuning

Resources, scripts, and models for fine-tuning automatic speech recognition systems.

Modal ACFT Finetune Script

Validated Whisper fine-tuning script on Modal for FUTO

Modal Whisper Finetune Script

Validated script for fine-tuning Whisper on a Modal GPU with a preformatted audio dataset

My Whisper ACFT Fine-Tunes (Collection)

Collection of fine-tuned Whisper models specifically for FUTO Keyboard on mobile. Fine-tuned on ~1 hour of personal voice samples.

Whisper ACFT - Base

Base-sized Whisper fine-tune

Whisper ACFT - Small

Small-sized Whisper fine-tune

Whisper ACFT - Tiny

Tiny-sized Whisper fine-tune

My Whisper Fine-Tunes V2 (Collection)

Collection of general Whisper fine-tuned models for desktop use, available in GGML and CTranslate2 formats. Fine-tuned on ~1 hour of personal voice samples.

Whisper Fine-Tune - Large V3 Turbo

Large V3 Turbo-sized Whisper fine-tune

Whisper Fine-Tune - Medium

Medium-sized Whisper fine-tune

Whisper Fine-Tune - Tiny

Tiny-sized Whisper fine-tune

Whisper Fine-Tune - Base

Base-sized Whisper fine-tune

STT Fine Tune Project Outline

Planning doc for STT fine-tuning and eval project

whisper-acft

Whisper ACFT fine-tuning

Whisper Fine Tuning Resources

Some resources for those looking to fine-tune Whisper ASR

Whisper-Hebrish

Fine-tuned Whisper model for Hebrew/English mixed speech

ASR Training Data GUIs

GUI applications for creating and collecting training data for ASR fine-tuning.

ASR Training Data Chunker

Breaks up texts by approximate reading duration

ASR Training Data Collector

GUI to facilitate gathering training data for ASR/STT apps in organised datasets with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.

Voice Training Data Creator

GUI to facilitate capturing voice data for TTS / voice clone training with LLM synthetic text generation and saving logic (Ubuntu Linux)
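
As an illustration of the idea behind ASR Training Data Chunker (breaking up texts by approximate reading duration), here is a minimal sketch. It is not the repository's implementation: the function name, the 150 words-per-minute default, and the sentence-splitting rule are all assumptions made for the example.

```python
import re


def chunk_by_reading_time(text, target_seconds=20.0, words_per_minute=150.0):
    """Group sentences into chunks of roughly target_seconds of read-aloud time.

    Hypothetical helper: the pace and target duration are illustrative defaults.
    """
    # Convert the time budget into a word budget at the assumed reading pace.
    words_per_chunk = target_seconds * words_per_minute / 60.0
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if this sentence would exceed the budget.
        if current and count + n_words > words_per_chunk:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Sentences are never split, so a single long sentence can exceed the target duration; that trade-off keeps each recording prompt readable as a unit.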

ASR Datasets

Curated datasets for training and evaluating ASR/STT models.

My Public Audio Datasets (Collection)

Collection of public audio datasets for speech recognition training and evaluation

English-Hebrew Mixed Sentences

Dataset of mixed English/Hebrew sentences for multilingual ASR training

Tech Audio Samples

Technical audio samples for STT evaluation

Whisper WPM Test

Dataset for testing words-per-minute recognition accuracy

STT Applications

Desktop applications and utilities for speech-to-text input.

Whisper-Based Linux Prototypes

Voice Prompt Editor

Streamlit app for capturing and editing prompts and system prompts

Voice Prompt Runner

Demo UI which parses and then runs audio prompts

Whisper Notepad For Linux

Notepad for Linux that uses OpenAI Whisper (API) and reformats dictated text

Whisper Notepad Simple

A Linux desktop utility for converting speech to text using the OpenAI Whisper API

Whisper Transcription Notepad Linux

Transcription notepad with cloud speech to text (STT) for Linux

Deepgram-Based Linux Prototypes

Deepgram Voice Keyboard

A fork of Deepgram's Linux starter. CLI -> GUI + hotkey support, API key editing, cost tracking. WIP

Deepgram Voice Keyboard Ubuntu

WIP to try to create a good STT utility with cloud STT APIs

Other STT & Dictation Apps

amical

Open Source AI Dictation App - Type 3x faster, no keyboard needed

Handy

A free, open source, and extensible speech-to-text application that works completely offline

hyprvoice

Voice-powered typing for Wayland/Hyprland desktops

parakeet-dictation

On-device voice typing for Linux using Parakeet and NeMo ASR models via sherpa-onnx

speech-notes-with-text-fixes

Speech Note Linux app. Note taking, reading and translating with offline STT, TTS and Machine translation

Thought-Pad

Linux desktop application for creating notes from dictated speech

Voice-Note-Recorder-Ubuntu

GUI for recording voice notes

Wayland-Voice-Typer

Simple GUI around whisper.cpp for voice-to-text on Linux

Multimodal Audio Transcription

AI-Transcription-Notepad

Voice note taking utility with cloud audio multimodal models for transcription and text cleanup

Cloud-ASR-MCP

WIP MCP for using various cloud ASR models for speech to text / transcription

DVR-Transcriber

Workflow workspace for importing recordings from a DVR and using AI for transcription

Gemini-Audio-Transcriber

File upload based multimodal transcription tool using Gemini

Gemini-Transcription-MCP

MCP for Gemini multimodal audio transcription with built in post-processing

Local-Multimodal-Transcriber

Local transcription app with audio multimodal design

Transcript Processing

System prompts and tools for cleaning, transforming, and enhancing STT output.

Basic STT Transcript Cleanup

Clean up raw speech-to-text transcripts

Diarised Transcript Assistant

System prompt for generating diarised transcripts (STT plus stylistic guidance)

Speech To Text System Prompt Library

An updated skeleton library of system prompts for using LLMs to refine STT output

STT Basic Cleanup System Prompt

Basic foundational system prompt for cleaning up AI voice transcripts

Text Magic Fix Linux

WIP/Idea - Select text and fix typos with local AI

Text Transformation Prompt Collection 2

An abbreviated collection of STT transformation prompts

Text Transformation Prompt Combiner

Basic implementation of a prompt concatenation utility for text transformation system prompts for converting transcribed text

Text Transformation Prompt Library

Updated repo of text transformation prompts (raw STT transcripts -> *). New repo for capturing via automations.

AI Text Rewriting Toolbox

LLM text reformatting and rewriting toolbox comprising many system prompts

Audiopenai Edit Prompts

Text transformation prompts library for Audiopen.ai

Shakespearean Text Generators

System prompts for rewriting text in Shakespearean English

Text Cleanup Fine Tuning Set

Fine-tuning dataset and plans for text cleanup with audio multimodal models

Text Transformation Prompt Stack

Documentation/notes for a "prompt stack" for audio multimodal text processing

Transcription Cleanup Eval 1225

Evaluating various cloud audio understanding models on transcription and cleanup

Voice Cleanup Prompt Experiment

Testing various permutations in system prompting for raw audio transcript cleanup

Voice Note Redaction Agent

Config for a text redaction agent for voicenote -> * workflows

Voice Automation & Pipelines

Workflows and agents for voice-to-action automation.

Audio Context Pipeline Model 0425

Planning repo for personalised AI context pipeline with revised tooling

STT To TTS

Gemini app which captures user speech, condenses (LLM), and then synthesises

Voice Prompt Enhancement Node

Configuration for an intermediate agent in voice automation workflows that bridges voice input to other actions

Voice Prompt Pipeline

Voice-to-prompt pipeline for processing spoken instructions

Voice Spec Driven Development Demo

Demonstrating a voice to text spec driven development workflow

Voice To Prompt Pipeline

A conceptual voice to prompt pipeline that attempts to separate instructions from provided context for better results

N8N Voice Note Context Pipeline Workflow

Workflow for extracting context data from voice notes to Pinecone

Voice Note Ragie Pipeline

Test pipeline: voice context data to Ragie

Voicenotes Prompt To Email Workflow N8N

Evaluation & Benchmarking

Tools for testing and comparing STT performance.

Local STT Eval One Sample

Single-sample evaluation for local STT models

Long Form Audio Eval

Single shot STT benchmark for long form audio

STT Comparison

Compare different speech-to-text models and services

STT Voice Note Evaluation

Local ASR STT Benchmark

Quick evaluation to find the best STT model in Speech Note (Ubuntu) for specific hardware

Long Form Audio Pipeline

Basic audio pipeline for preparing long audio content for ASR transcription

One Shot Transcription Microphone Eval

Test samples for various microphones with an STT accuracy eval

Speech And ASR Evaluations

Index repository for speech recognition and ASR evaluations

Whisper Fine-Tune Accuracy Eval

Comparing Whisper fine-tunes versus stock Whisper on local inference

Whisper Fine-Tune Eval

Evaluation interface for fine-tuned Whisper models

Whisper WPM Background Noise Eval

Quick eval: how much does speaking pace affect WER/accuracy in ASR?
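
Several of the evaluations above compare transcripts by word error rate (WER) and relate it to speaking pace. The sketch below shows one common way to compute both, assuming simple lowercase whitespace tokenisation. The function names are hypothetical, and none of the listed repositories is confirmed to use this exact code.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def words_per_minute(transcript, duration_seconds):
    """Speaking pace of a sample, given its reference text and audio length."""
    return len(transcript.split()) / duration_seconds * 60.0
```

Pairing `words_per_minute` of each reference sample with its `word_error_rate` gives the data points needed to see whether faster speech degrades accuracy. Punctuation is not stripped here, so real evaluations usually normalise text first.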

Audio Processing

Microphone setup, EQ, and audio chain tools for optimal STT input.

Deepnet Baby Noise Scrub

Audio cleaning tool for removing baby/background noise from recordings

EQ Template Generator

Generate EQ templates for audio processing

Mic Input Boot FX Script Ubuntu

Boot script to ensure that Easy Effects manages the input sound source on boot (Ubuntu)

Speech Recognition Audio Chain

Attempt to set up a good autostart audio processing chain for STT

Voice Analyzer

Analyses voice data

TTS & Speech Synthesis

Text-to-speech and SSML generation tools.

Text To SSML Generator

Generates SSML from text by inference

Hebrew TTS Providers

Reference of Hebrew text-to-speech providers and services

General

Documentation, research, curated lists, and miscellaneous speech tech resources.

ASR And STT AI Notebook

Prompts and outputs (and some notes) on STT + ASR + fine-tuning. LLM: Claude

Awesome Whisper Apps

Useful speech to text tools that use Whisper under the hood (API/local)

Deepgram Text Input

Analysis of Deepgram text input

Linux Voice Typing App Notes

Planning notes for a tool I've been working on for a while!

STT Price Points 260225

Some timestamped API price points for speech-to-text providers

Voice LLM App Notes

A few notes describing the kind of voice app for large language models I would love to have!

Dictation Macropad

Plan/key allocation for a macropad optimised for heavy daily dictation workflows

Linux Friendly Voice Tech

List of resources for voice technology with support for Linux

Speech To Text Chain Notes

Notes on STT processing chain (for future voice projects)

Ubuntu Mic Selector

Utility for switching microphone sources

Voice Control Linux

Claude-enhanced research for voice control platforms with Linux support

Voicepad

Planning notes for a macropad for STT users
