danielrosehill/Speech-Tech-Index

Speech Tech Index

Part of the Daniel Rosehill Index Collection

Index of speech technology repositories, tools, and resources — covering the full pipeline from speech capture through transcription, cleanup, and text transformation.

Last updated: 2026-03-25


ASR Fine-Tuning

Resources, scripts, and models for fine-tuning automatic speech recognition systems.

Modal ACFT Finetune Script

Validated Whisper fine-tuning script on Modal for FUTO

GitHub

Modal Whisper Finetune Script

Validated script for fine-tuning Whisper on a Modal GPU with a preformatted audio dataset

GitHub


My Whisper ACFT Fine-Tunes (Collection)

Collection of fine-tuned Whisper models specifically for FUTO Keyboard on mobile. Fine-tuned on ~1 hour of personal voice samples.

Hugging Face

Whisper ACFT - Base

Base-sized Whisper fine-tune

Hugging Face

Whisper ACFT - Small

Small-sized Whisper fine-tune

Hugging Face

Whisper ACFT - Tiny

Tiny-sized Whisper fine-tune

Hugging Face


My Whisper Fine-Tunes V2 (Collection)

Collection of general Whisper fine-tuned models for desktop use, available in GGML and CTranslate2 formats. Fine-tuned on ~1 hour of personal voice samples.

Hugging Face

Whisper Fine-Tune - Large V3 Turbo

Large V3 Turbo-sized Whisper fine-tune

Hugging Face

Whisper Fine-Tune - Medium

Medium-sized Whisper fine-tune

Hugging Face

Whisper Fine-Tune - Tiny

Tiny-sized Whisper fine-tune

Hugging Face

Whisper Fine-Tune - Base

Base-sized Whisper fine-tune

Hugging Face


STT Fine Tune Project Outline

Planning doc for STT fine-tuning and eval project

GitHub

whisper-acft

Whisper ACFT fine-tuning

GitHub

Whisper Fine Tuning Resources

Some resources for those looking to fine-tune Whisper ASR

GitHub

Whisper-Hebrish

Fine-tuned Whisper model for Hebrew/English mixed speech

Hugging Face


ASR Training Data GUIs

GUI applications for creating and collecting training data for ASR fine-tuning.

ASR Training Data Chunker

Breaks up texts by approximate reading duration

GitHub
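
To illustrate the idea of chunking by reading duration, here is a minimal sketch. It is not the repository's implementation: the function name `chunk_by_reading_time`, the 150 words-per-minute default, and the sentence-splitting rule are all assumptions made for this example. It groups whole sentences until the estimated speaking time would exceed the target.

```python
import re


def chunk_by_reading_time(text: str, target_seconds: float = 20.0,
                          wpm: float = 150.0) -> list[str]:
    """Group whole sentences into chunks of roughly target_seconds of speech.

    Hypothetical helper: wpm (words per minute) is an assumed reading pace,
    not a value taken from the indexed repository.
    """
    # Convert the time budget into a word budget at the assumed pace.
    words_per_chunk = max(1, round(target_seconds * wpm / 60.0))
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would overflow the budget.
        if current and count + n > words_per_chunk:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A sentence longer than the word budget is kept whole as its own chunk rather than split mid-sentence, which suits read-aloud recording prompts.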

ASR Training Data Collector

GUI to facilitate gathering training data for ASR/STT apps in organised datasets with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.

GitHub

Voice Training Data Creator

GUI to facilitate capturing voice data for TTS / voice clone training with LLM synthetic text generation and saving logic (Ubuntu Linux)

GitHub


ASR Datasets

Curated datasets for training and evaluating ASR/STT models.

My Public Audio Datasets (Collection)

Collection of public audio datasets for speech recognition training and evaluation

Hugging Face

English-Hebrew Mixed Sentences

Dataset of mixed English/Hebrew sentences for multilingual ASR training

Hugging Face

Tech Audio Samples

Technical audio samples for STT evaluation

Hugging Face

Whisper WPM Test

Dataset for testing words-per-minute recognition accuracy

Hugging Face


STT Applications

Desktop applications and utilities for speech-to-text input.

Whisper-Based Linux Prototypes

Voice Prompt Editor

Streamlit app for capturing and editing prompts and system prompts

GitHub

Voice Prompt Runner

Demo UI which parses and then runs audio prompts

GitHub

Whisper Notepad For Linux

Notepad for Linux that uses OpenAI Whisper (API) and reformats dictated text

GitHub

Whisper Notepad Simple

A Linux desktop utility for converting speech to text using the OpenAI Whisper API

GitHub

Whisper Transcription Notepad Linux

Transcription notepad with cloud speech to text (STT) for Linux

GitHub

Deepgram-Based Linux Prototypes

Deepgram Voice Keyboard

A fork of Deepgram's Linux starter that turns the CLI into a GUI and adds hotkey support, API key editing, and cost tracking. Work in progress.

GitHub

Deepgram Voice Keyboard Ubuntu

Work in progress: an attempt to build a good STT utility on top of cloud STT APIs

GitHub

Other STT & Dictation Apps

amical

Open Source AI Dictation App - Type 3x faster, no keyboard needed

GitHub

Handy

A free, open source, and extensible speech-to-text application that works completely offline

GitHub

hyprvoice

Voice-powered typing for Wayland/Hyprland desktops

GitHub

parakeet-dictation

On-device voice typing for Linux using Parakeet and NeMo ASR models via sherpa-onnx

GitHub

speech-notes-with-text-fixes

Speech Note Linux app. Note taking, reading, and translating with offline STT, TTS, and machine translation

GitHub

Thought-Pad

Linux desktop application for creating notes from dictated speech

GitHub

Voice-Note-Recorder-Ubuntu

GUI for recording voice notes

GitHub

Wayland-Voice-Typer

Simple GUI around whisper.cpp for voice-to-text on Linux

GitHub

Multimodal Audio Transcription

AI-Transcription-Notepad

Voice note taking utility with cloud audio multimodal models for transcription and text cleanup

GitHub

Cloud-ASR-MCP

WIP MCP for using various cloud ASR models for speech to text / transcription

GitHub

DVR-Transcriber

Workflow workspace for importing recordings from a DVR and using AI for transcription

GitHub

Gemini-Audio-Transcriber

File-upload-based multimodal transcription tool using Gemini

GitHub

Gemini-Transcription-MCP

MCP for Gemini multimodal audio transcription with built-in post-processing

GitHub

Local-Multimodal-Transcriber

Local transcription app with audio multimodal design

GitHub


Transcript Processing

System prompts and tools for cleaning, transforming, and enhancing STT output.

Basic STT Transcript Cleanup

Clean up raw speech-to-text transcripts

Hugging Face

Diarised Transcript Assistant

System prompt for generating diarised transcripts (STT plus stylistic guidance)

GitHub

Speech To Text System Prompt Library

An updated skeleton library of system prompts for using LLMs to refine STT output

GitHub

STT Basic Cleanup System Prompt

Basic foundational system prompt for cleaning up AI voice transcripts

GitHub

Text Magic Fix Linux

WIP/Idea - Select text and fix typos with local AI

GitHub

Text Transformation Prompt Collection 2

An abbreviated collection of STT transformation prompts

GitHub

Text Transformation Prompt Combiner

Basic implementation of a prompt concatenation utility for text transformation system prompts for converting transcribed text

GitHub
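
As a rough illustration of what prompt concatenation for text transformation can look like, the sketch below joins several system-prompt fragments into one prompt. The function name `combine_prompts`, the separator, and the de-duplication behaviour are assumptions for this example, not details taken from the repository.

```python
def combine_prompts(fragments: list[str], separator: str = "\n\n") -> str:
    """Concatenate prompt fragments in order, skipping blanks and exact repeats.

    Hypothetical helper for illustration only.
    """
    seen: set[str] = set()
    parts: list[str] = []
    for fragment in fragments:
        cleaned = fragment.strip()
        # Ignore empty fragments and fragments already included.
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        parts.append(cleaned)
    return separator.join(parts)


# Example: stack a cleanup instruction with a formatting instruction.
system_prompt = combine_prompts([
    "Fix punctuation and remove filler words.",
    "Format the result as an email.",
])
```

Keeping each transformation as a separate fragment means a base cleanup prompt can be reused while the format-specific fragment is swapped per task.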

Text Transformation Prompt Library

Updated repo of text transformation prompts (raw STT transcripts -> *). New repo for capturing via automations.

GitHub

AI Text Rewriting Toolbox

LLM text reformatting and rewriting toolbox comprising many system prompts

GitHub

Audiopenai Edit Prompts

Text transformation prompts library for Audiopen.ai

GitHub

Shakespearean Text Generators

System prompts for rewriting text in Shakespearean English

GitHub

Text Cleanup Fine Tuning Set

Fine-tuning dataset and plans for text cleanup with audio multimodal models

GitHub

Text Transformation Prompt Stack

Documentation/notes for a "prompt stack" for audio multimodal text processing

GitHub

Transcription Cleanup Eval 1225

Evaluating various cloud audio understanding models on transcription and cleanup

GitHub

Voice Cleanup Prompt Experiment

Testing various permutations in system prompting for raw audio transcript cleanup

GitHub

Voice Note Redaction Agent

Config for a text redaction agent for voicenote -> * workflows

GitHub


Voice Automation & Pipelines

Workflows and agents for voice-to-action automation.

Audio Context Pipeline Model 0425

Planning repo for personalised AI context pipeline with revised tooling

GitHub

STT To TTS

Gemini app that captures user speech, condenses it with an LLM, and then synthesises the result as speech

GitHub

Voice Prompt Enhancement Node

Configuration for an intermediate agent in voice automation workflows that bridges voice input to other actions

GitHub

Voice Prompt Pipeline

Voice-to-prompt pipeline for processing spoken instructions

Hugging Face

Voice Spec Driven Development Demo

Demonstrating a voice-to-text, spec-driven development workflow

GitHub

Voice To Prompt Pipeline

A conceptual voice to prompt pipeline that attempts to separate instructions from provided context for better results

GitHub

N8N Voice Note Context Pipeline Workflow

Workflow for extracting context data from voice notes to Pinecone

GitHub

Voice Note Ragie Pipeline

Test pipeline: voice context data to Ragie

GitHub

Voicenotes Prompt To Email Workflow N8N

GitHub


Evaluation & Benchmarking

Tools for testing and comparing STT performance.

Local STT Eval One Sample

Single-sample evaluation for local STT models

Hugging Face

Long Form Audio Eval

Single-shot STT benchmark for long-form audio

GitHub

STT Comparison

Compare different speech-to-text models and services

Hugging Face

STT Voice Note Evaluation

GitHub

Local ASR STT Benchmark

Quick evaluation to find the best STT model in Speech Note (Ubuntu) for specific hardware

GitHub

Long Form Audio Pipeline

Basic audio pipeline for preparing long audio content for ASR transcription

GitHub

One Shot Transcription Microphone Eval

Test samples for various microphones with an STT accuracy eval

GitHub

Speech And ASR Evaluations

Index repository for speech recognition and ASR evaluations

GitHub

Whisper Fine-Tune Accuracy Eval

Comparing Whisper fine-tunes versus stock Whisper on local inference

GitHub

Whisper Fine-Tune Eval

Evaluation interface for fine-tuned Whisper models

Hugging Face

Whisper WPM Background Noise Eval

Quick eval: how much does speaking pace affect WER/accuracy in ASR?

GitHub
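
For readers unfamiliar with the WER (word error rate) metric referenced in these evaluations, the sketch below computes it as word-level edit distance divided by reference length. This is the standard definition, written from scratch for illustration; the function name `word_error_rate` and the lowercase-and-split normalisation are assumptions, and the indexed repositories may use a library or different normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    # prev[j] is the distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the same reference text read at different paces with this function gives one WER per recording, which can then be compared against words per minute.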


Audio Processing

Microphone setup, EQ, and audio chain tools for optimal STT input.

Deepnet Baby Noise Scrub

Audio cleaning tool for removing baby/background noise from recordings

Hugging Face

EQ Template Generator

Generate EQ templates for audio processing

Hugging Face

Mic Input Boot FX Script Ubuntu

Boot script to ensure that Easy Effects manages the input sound source on boot (Ubuntu)

GitHub

Speech Recognition Audio Chain

Attempt to set up a good autostart audio processing chain for STT

GitHub

Voice Analyzer

Analyses voice data

GitHub


TTS & Speech Synthesis

Text-to-speech and SSML generation tools.

Text To SSML Generator

Generates SSML from text by inference

GitHub

Hebrew TTS Providers

Reference of Hebrew text-to-speech providers and services

GitHub


General

Documentation, research, curated lists, and miscellaneous speech tech resources.

ASR And STT AI Notebook

Prompts and outputs (and some notes) on STT + ASR + fine-tuning. LLM: Claude

GitHub

Awesome Whisper Apps

Useful speech-to-text tools that use Whisper under the hood (API/local)

GitHub

Deepgram Text Input

Analysis of Deepgram text input

GitHub

Linux Voice Typing App Notes

Planning notes for a tool I've been working on for a while!

GitHub

STT Price Points 260225

Some timestamped API pricepoints for speech to text providers

GitHub

Voice LLM App Notes

A few notes describing the kind of voice app for large language models I would love to have!

GitHub

Dictation Macropad

Plan/key allocation for a macropad optimised for heavy daily dictation workflows

GitHub

Linux Friendly Voice Tech

List of resources for voice technology with support for Linux

GitHub

Speech To Text Chain Notes

Notes on STT processing chain (for future voice projects)

GitHub

Ubuntu Mic Selector

Utility for switching microphone sources

GitHub

Voice Control Linux

Claude-enhanced research for voice control platforms with Linux support

GitHub

Voicepad

Planning notes for a macropad for STT users

GitHub

About

Index of my repos related to STT, ASR
