yo is:
- 50% Nix: compile-time grammar compiler
- 50% Rust: run-time deterministic interpreter with some fuzziness on top
It takes declarative sentence templates with optional parameters and entity lists, expands them into all possible variants, and generates optimized regular expressions from them.
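To make the expansion step concrete, here is a minimal Rust sketch. It is not yo's actual grammar compiler: the `[word]` syntax for an optional token, the `{name}` syntax for an entity list, and the `expand` function are all assumptions made up for illustration, and the regex generation step is left out.

```rust
use std::collections::HashMap;

/// Expand one template into every concrete sentence it can produce.
/// Hypothetical syntax (an assumption, not yo's real format):
/// `[word]` is an optional token, `{name}` is replaced by each value
/// of the entity list called `name`.
fn expand(template: &str, entities: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    // Start with a single empty variant and extend it token by token.
    let mut variants: Vec<String> = vec![String::new()];
    for token in template.split_whitespace() {
        // Each token contributes a list of alternatives ("" means "skip it").
        let choices: Vec<String> = if token.starts_with('[') && token.ends_with(']') {
            vec![String::new(), token[1..token.len() - 1].to_string()]
        } else if token.starts_with('{') && token.ends_with('}') {
            let name = &token[1..token.len() - 1];
            // An unknown entity list yields no alternatives, so no variants survive.
            entities
                .get(name)
                .map(|values| values.iter().map(|v| v.to_string()).collect())
                .unwrap_or_default()
        } else {
            vec![token.to_string()]
        };
        // Cartesian product of the variants so far with the new alternatives.
        let mut next = Vec::with_capacity(variants.len() * choices.len());
        for prefix in &variants {
            for choice in &choices {
                let joined = match (prefix.is_empty(), choice.is_empty()) {
                    (_, true) => prefix.clone(),
                    (true, false) => choice.clone(),
                    (false, false) => format!("{} {}", prefix, choice),
                };
                next.push(joined);
            }
        }
        variants = next;
    }
    variants
}
```

With a `room` entity list of `kitchen` and `bedroom`, the template `turn on [the] {room} lights` expands to four sentences, from "turn on kitchen lights" to "turn on the bedroom lights".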
At runtime it takes input, runs it through exact and fuzzy matching against the pre-compiled patterns, extracts any parameters, and executes the corresponding script with those arguments, effectively translating plain-language commands into system shell actions.
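The "exact first, fuzzy second" idea can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: it uses plain Levenshtein distance as a stand-in for yo's fuzzy logic (which is described in the docs/), and `levenshtein`, `best_match` and `min_similarity` are hypothetical names, not part of yo.

```rust
/// Levenshtein edit distance between two strings, computed over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let insertion = curr[j] + 1;
            let deletion = prev[j + 1] + 1;
            curr.push(substitution.min(insertion).min(deletion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Return the best matching pattern: an exact (case-insensitive) match wins,
/// otherwise the closest pattern with similarity >= `min_similarity` (0.0 to 1.0).
fn best_match<'a>(input: &str, patterns: &[&'a str], min_similarity: f64) -> Option<&'a str> {
    let input = input.trim().to_lowercase();
    // Exact pass first: cheap and unambiguous.
    if let Some(hit) = patterns.iter().find(|p| p.to_lowercase() == input) {
        return Some(*hit);
    }
    // Fuzzy pass: score every pattern and keep the highest similarity.
    let mut best: Option<(&'a str, f64)> = None;
    for &pattern in patterns {
        let candidate = pattern.to_lowercase();
        let longest = input.chars().count().max(candidate.chars().count()).max(1);
        let similarity = 1.0 - levenshtein(&input, &candidate) as f64 / longest as f64;
        if similarity >= min_similarity && best.map_or(true, |(_, s)| similarity > s) {
            best = Some((pattern, similarity));
        }
    }
    best.map(|(pattern, _)| pattern)
}
```

With a threshold of 0.8, a transcription slip such as "what tme is it" still resolves to the pattern "what time is it", while an unrelated sentence returns `None` instead of triggering a script.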
yo supports usage from:
- NixOS module
- Full Rust version (scripts in TOML) for non-Nix users
- Client support for any Linux/ESP32 device that has I2S configured
yo is a full-stack voice assistant that's:
- Very Fast - Pre-compiled indexing, smart priority ordering & Rust's high performance make it super fast.
- Lightweight - Very few dependencies.
- Simple - Everything is neatly packaged and runs on one port.
- Safe - Rule based, user defines the rules.
- Offline - No internet required after setup.
- Easy to deploy - Using the NixOS module or containerized clients via Docker.
- Plug & Play - Using the examples/scripts.
yo is NOT:
- ❌ An LLM with shell access!
❄️ Using flakes (recommended)
Use yo as a voice assistant in 4 steps:
inputs = {
  nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  yo.url = "github:quackhack-mcblindy/yo";
};
imports = [ yo.nixosModules.yo ];
services.yo-rs = {
  # Handles Wake-word detection/Transcription/Shell execution/Text-to-speech generation
  server = {
    enable = true;
    shellTranslate = true;
    demo = true; # imports /examples/*.nix
    # Optional settings:
    # host = "0.0.0.0:12345";
    # wakeWordPath = "/path/to/custom/model.onnx";
    # threshold = 0.8;
    # awakeSound = "/path/to/custom/awake.wav";
    # doneSound = "/path/to/custom/done.wav";
    # failSound = "/path/to/custom/fail.wav";
    # whisperModelPath = "/path/to/custom/ggml-model.bin";
    # textToSpeechModelPath = "/path/to/custom/tts/model.onnx";
    # language = "sv";
    # beamSize = 5;
    # temperature = 0.2;
    # threads = 4;
    # debug = true;
    # logFile = "/path/to/custom/log/path/yo-rs-server.log";
    # You can use Home Assistant's intent handler instead:
    # execCommand = ''
    #   curl -X POST "http://HOME_ASSISTANT_IP:8123/api/conversation/process" \
    #     -H "Authorization: Bearer YOUR_LONG_LIVED_ACCESS_TOKEN" \
    #     -H "Content-Type: application/json" \
    #     -d "{\"text\":\"$1\",\"language\":\"sv\"}"
    # '';
  };
  # Microphone client (streams audio - RMS based VAD)
  client = {
    enable = true; # starts the microphone client
    # Optional settings:
    # uri = "192.168.1.111:12345";
    # awakeSound = "/path/to/custom/awake.wav";
    # doneSound = "/path/to/custom/done.wav";
    # failSound = "/path/to/custom/fail.wav";
    # awakeCmd = "notify-send 'Wake word detected'";
    # doneCmd = "mpg123 /path/to/success.mp3";
    # silenceThreshold = 0.03;
    # silenceTimeout = 1.5;
    # debug = true;
    # logFile = "/path/to/custom/log/path/yo-rs-client.log";
  };
};
$ sudo nixos-rebuild switch --flake /path/to/flake ...
Done!
Now you can speak your wake word (default: "yo bitch") and ask what time it is.
Or, if you prefer the CLI:
❄️ DOTFILES on main [$!+]
✦ 07:17:33 ❯ yo do "what time is it"
┌─(yo-time)
│🦆 qwack!? what time is it
└─🦆 says ⮞ no parameters yo
└─⏰ do took 183.835µs
07:17
Approx: ~0.184 ms
But if you don't like Rust, or have a basic setup, you can use Bash instead by setting:
yo.legacy = true;
This relies only on pkgs.jq and pkgs.coreutils.
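The client's silenceThreshold and silenceTimeout options hint at how RMS-based voice activity detection can decide that you have stopped talking. Below is a rough Rust sketch of that idea, not yo's actual client code: the `rms` helper, the `SilenceDetector` type, and the mapping of its fields to those two options are assumptions for illustration, and samples are assumed to be normalised to -1.0..=1.0.

```rust
/// Root mean square of a frame of samples normalised to -1.0..=1.0.
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_of_squares: f32 = frame.iter().map(|s| s * s).sum();
    (sum_of_squares / frame.len() as f32).sqrt()
}

/// Tracks trailing silence and reports when an utterance has ended.
/// Hypothetical type: field names only mirror the config options above.
struct SilenceDetector {
    threshold: f32,    // frames with RMS below this count as silence
    timeout_secs: f32, // how much continuous silence ends the utterance
    silent_secs: f32,  // silence accumulated since the last loud frame
}

impl SilenceDetector {
    fn new(threshold: f32, timeout_secs: f32) -> Self {
        Self { threshold, timeout_secs, silent_secs: 0.0 }
    }

    /// Feed one frame; returns true once silence has lasted `timeout_secs`.
    fn push(&mut self, frame: &[f32], sample_rate: u32) -> bool {
        if rms(frame) < self.threshold {
            // Quiet frame: add its duration to the running silence.
            self.silent_secs += frame.len() as f32 / sample_rate as f32;
        } else {
            // Loud frame: speech is ongoing, so reset the silence counter.
            self.silent_secs = 0.0;
        }
        self.silent_secs >= self.timeout_secs
    }
}
```

In this sketch a lower threshold makes the detector more sensitive to quiet speech, and a longer timeout tolerates longer pauses before the recording is cut off.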
📦 Building from source
If you're not on a NixOS system, you can compile the grammar using yo-toml (Rust) instead of Nix.
This involves writing your voice sentences and commands in .toml files and running yo-toml --config-dir to generate the required JSON files.
Example
$ git clone git@github.com:QuackHack-McBlindy/yo.git
$ cd yo
$ cargo build --release --manifest-path ./packages/yo-rs/Cargo.toml
# Specify directory containing your toml scripts
$ ./packages/yo-rs/target/release/yo-toml --config-dir ./examples --output $XDG_CACHE_HOME/yo
# Build the wrappers from toml scripts
$ ./packages/yo-rs/target/release/yo-builder ./examples $XDG_CACHE_HOME/yo/bin
Now you should be able to start the server/client.
Server:
$ ./target/release/yo-rs \
--host 0.0.0.0:12345 \
--translate-to-shell \
# Optional:
# --wake-word ./models/wake.onnx \
# --threshold 0.5 \
# --model ./models/ggml-small.bin \
# --beam-size 5 \
# --temperature 0.2 \
# --threads 4 \
# --language en \
# --awake-sound ./sounds/ding.wav \
# --done-sound ./sounds/done.wav \
# --exec-command "echo" \
# --tts-model ./models/en_US-amy-medium.onnx \
# --debug
Client:
$ ./target/release/yo-client \
--uri 127.0.0.1:12345 \
# Optional:
# --room desktop \
# --awake-sound ./sounds/ding.wav \
# --done-sound ./sounds/done.wav \
# --awake-cmd "notify-send Listening" \
# --done-cmd "notify-send Done" \
# --silence-threshold 0.005 \
# --silence-timeout 1.0 \
# --max-duration 5.0 \
# --debug
Done!
If both started without issues, you can now speak your wake word (default: "yo bitch") and ask what time it is, what the weather is like, or whatever.
Or, if you prefer the CLI:
$ yo do "whats the time"
# legacy: (slower - but cooler)
$ yo legacy "is it warm outside"
🐋 Docker (for use outside of the Nix ecosystem)
Use the provided Dockerfile to build a container with the client, the server, or both.
Optional configuration can be made in the docker-compose.yaml file. Then, to build the image, run:
$ docker compose build client # or server
To start client + server run:
$ docker compose --profile all up
To start only a client:
$ docker compose --profile client up
yo uses ONNX Runtime for text-to-speech inference and wake-word detection.
GGML-based bin models from the Whisper family are used for speech-to-text.
Run the following commands to download a tiny GGML model and the en_US "amy" TTS model:
mkdir -p "$HOME/models/stt" && mkdir -p "$HOME/models/tts"
curl -L -o "$HOME/models/stt/ggml-tiny.bin" \
"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin"
curl -L -o "$HOME/models/tts/en_US-amy-medium.onnx" \
"https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx"
curl -L -o "$HOME/models/tts/en_US-amy-medium.onnx.json" \
"https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx.json"
Learn how to write your own voice commands in the examples/
For inspiration, view my /bin, which has voice scripts that range from easy to advanced usage.
Read about the fuzzy matching logic in the docs/
Read more about the feature set in the docs/
🦆🧑🦯 says ⮞ Hi! I'm QuackHack-McBlindy!
Like my work?
Buy me a coffee, or become a sponsor.
Thanks for supporting open source/hungry developers♥️ 🦆!
MIT
Contributions are welcome.
