|
| 1 | +# Vosk-CLI-Dictation |
| 2 | + |
| 3 | +<p align="center"> |
| 4 | + <a href="https://github.com/RonanDavalan/vosk-cli-dictation/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License: MIT"></a> |
| 5 | + <a href="#"><img src="https://img.shields.io/badge/Version-v0.0.4--alpha-yellow.svg" alt="Version: v0.0.4-alpha"></a> |
| 6 | + <a href="#"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"></a> |
| 7 | + <a href="#"><img src="https://img.shields.io/badge/OS-Debian_12-D70A53.svg" alt="Tested on Debian 12"></a> |
| 8 | + <a href="#"><img src="https://img.shields.io/badge/Contributions-Welcome-brightgreen.svg" alt="Contributions Welcome"></a> |
| 9 | +</p> |
| 10 | + |
| 11 | +<p align="center"> |
| 12 | + <a href="https://vosk.davalan.fr/"><strong>➡️ Visit the Official Homepage for a full tour!</strong></a> |
| 13 | +</p> |
| 14 | + |
| 15 | +<p align="center"> |
| 16 | + <img src="assets/images/vosk-cli-dictation_live-demo.gif" alt="Live Demonstration"> |
| 17 | +</p> |
| 18 | + |
| 19 | +A powerful and customizable command-line dictation tool for Linux, powered by the Vosk engine. Turn your speech into text directly in your terminal and have it appear in any application. |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## ✨ Why This Project? |
| 24 | + |
| 25 | +This tool was born from the need for a robust, locally run dictation system on GNU/Linux that gives users full control over their data and the software's behavior. Unlike cloud-based solutions, it operates **100% offline**, so your voice data stays on your machine. |
| 26 | + |
| 27 | +## 🚀 Key Features |
| 28 | + |
| 29 | +* **System-Wide Integration:** Dictate into any active application (terminal, browser, code editor, etc.). |
| 30 | +* **Multi-Language Support:** Supports **English** and **French** through downloadable Vosk models. |
| 31 | +* **Total Control:** Use global hotkeys, voice commands, and manual controls to manage your dictation. |
| 32 | +* **Fully Customizable:** Configure UI colors, hotkeys, voice commands, and recognition aliases in a simple `config.yaml` file. |
| 33 | +* **Full Punctuation:** Dictate a complete range of punctuation with natural voice commands. |
| 34 | +* **Private and Offline:** Runs entirely on your machine. Your voice data never leaves your computer. |
| 35 | + |
| 36 | +> [!WARNING] |
| 37 | +> |
| 38 | +> **Compatibility Note: X11 vs. Wayland** This application uses `xdotool` for keyboard simulation, which is designed for the **X11 display server**. It does **not work natively** on **Wayland** sessions, the default on recent versions of Fedora and Ubuntu. Some compatibility can be achieved through **XWayland**, which provides an X11 compatibility layer on Wayland systems, so basic functionality might work there. For full compatibility, use an X11 session. |
| 39 | + |
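To check which kind of session you are running before launching the tool, the sketch below reads the usual environment variables. It is an illustration only and not part of the project: the function name `detect_session_type` is hypothetical, and it assumes your desktop sets `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, or `DISPLAY`, as most login managers do.

```python
import os


def detect_session_type(env=None):
    """Return 'x11', 'wayland', or 'unknown' from common session variables.

    Hypothetical helper for illustration; it is not part of the project.
    """
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if session in ("x11", "wayland"):
        return session
    # Fall back to the display variables when XDG_SESSION_TYPE is unset
    # or holds another value (for example 'tty').
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"
```

A result of `wayland` suggests that simulated keystrokes may only reach applications running through XWayland.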
| 40 | +## ⚙️ Getting Started |
| 41 | + |
| 42 | +### 1. System Dependencies |
| 43 | + |
| 44 | +This project was developed and tested on **Debian 12 (Bookworm)**. You will need the following packages: |
| 45 | + |
| 46 | +```bash |
| 47 | +sudo apt-get update && sudo apt-get install python3-pip python3-venv portaudio19-dev gettext xdotool pulseaudio-utils |
| 48 | +``` |
| 49 | + |
| 50 | +### 2. Project Setup |
| 51 | + |
| 52 | +Clone the repository and create the Python virtual environment. |
| 53 | + |
| 54 | +```bash |
| 55 | +git clone https://github.com/RonanDavalan/vosk-cli-dictation.git |
| 56 | +cd vosk-cli-dictation |
| 57 | +python3 -m venv venv |
| 58 | +source venv/bin/activate |
| 59 | +``` |
| 60 | + |
| 61 | +### 3. Install Python Dependencies |
| 62 | + |
| 63 | +Install the required Python packages from the `requirements.txt` file. |
| 64 | + |
| 65 | +```bash |
| 66 | +pip install -r requirements.txt |
| 67 | +``` |
| 68 | + |
| 69 | +### 4. Download Language Models |
| 70 | + |
| 71 | +This project requires Vosk language models to function. They are not included in the repository and must be downloaded manually. |
| 72 | + |
| 73 | +1. **Download the models you need:** We recommend starting with the small models for efficiency. |
| 74 | + * **French Model:** [vosk-model-small-fr-0.22](https://alphacephei.com/vosk/models/vosk-model-small-fr-0.22.zip) |
| 75 | + * **English Model:** [vosk-model-small-en-us-0.15](https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip) |
| 76 | + |
| 77 | +2. **Unzip and place them correctly:** After unzipping, the model folders (e.g., `vosk-model-small-fr-0.22`) must be placed directly inside the `vosk-model/` directory of the project. |
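A quick way to confirm the models landed in the right place is to list the folders directly under `vosk-model/`. The sketch below is an illustration, not project code: the helper name `find_models` is hypothetical, and it assumes model folders keep their original `vosk-model-` prefix after unzipping.

```python
from pathlib import Path


def find_models(models_root):
    """Return the sorted names of model folders directly under models_root.

    Hypothetical helper: assumes model folders keep the 'vosk-model-' prefix.
    """
    root = Path(models_root)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name.startswith("vosk-model-")
    )
```

Calling `find_models("vosk-model")` from the project root should list each unzipped model. An empty list usually means the folders are nested one level too deep or are still zipped.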
| 78 | + |
| 79 | +### 5. Launch The Application |
| 80 | + |
| 81 | +With the virtual environment still active, run the main script. |
| 82 | + |
| 83 | +```bash |
| 84 | +# To run with the French model (default in config) |
| 85 | +python3 src/main.py |
| 86 | + |
| 87 | +# To explicitly run with the English model |
| 88 | +python3 src/main.py -l en |
| 89 | +``` |
| 90 | + |
| 91 | +### 💡 Pro Tip: Create a Quick-Launch Command (Optional) |
| 92 | + |
| 93 | +For easier access, you can add a function to your shell's configuration file (e.g., `~/.bashrc` or `~/.zshrc`). |
| 94 | + |
| 95 | +1. Open your configuration file: `nano ~/.bashrc` |
| 96 | +2. Add the following lines at the end. **Remember to replace `/path/to/your/vosk-cli-dictation` with the actual, absolute path to the project directory.** |
| 97 | + |
| 98 | + ```bash |
| 99 | + # Defines a 'vosk' command to launch the dictation script easily. |
| 100 | + vosk() { |
| 101 | + # Check if the project directory exists |
| 102 | + if [ -d "/path/to/your/vosk-cli-dictation" ]; then |
| 103 | + # Run in a subshell so your current directory and shell environment stay unchanged, even if the script fails |
| 104 | + ( cd "/path/to/your/vosk-cli-dictation" && \ |
| 105 | + source venv/bin/activate && \ |
| 106 | + python3 src/main.py "$@" ) |
| 107 | + else |
| 108 | + echo "Error: Project directory not found at /path/to/your/vosk-cli-dictation" >&2; return 1 |
| 109 | + fi |
| 110 | + } |
| 111 | + ``` |
| 112 | + |
| 113 | +3. Apply the changes by running `source ~/.bashrc` or by opening a new terminal. |
| 114 | +4. You can now launch the application from anywhere by simply typing `vosk` or `vosk -l en`. |
| 115 | + |
| 116 | +## 🛠️ Usage & Configuration |
| 117 | + |
| 118 | +### Language Selection |
| 119 | + |
| 120 | +The language model is chosen based on the following priority: |
| 121 | + |
| 122 | +1. **Command-Line Flag (Highest Priority):** Use `-l en` or `-l fr` to force a specific language for the session. |
| 123 | +2. **Default in `config.yaml` (Fallback):** If no flag is used, the `default_model` from `config/config.yaml` is loaded. |
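The priority order above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `src/main.py`: the function name `resolve_language` is hypothetical, the configuration is passed in as an already loaded dictionary, and `default_model` is assumed to hold a language code such as `fr`.

```python
import argparse


def resolve_language(argv, config):
    """Pick the session language: the -l flag wins, else the config default.

    Hypothetical sketch; assumes config['default_model'] is 'en' or 'fr'.
    """
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-l", dest="lang", choices=["en", "fr"], default=None)
    args = parser.parse_args(argv)
    if args.lang is not None:
        return args.lang  # 1. command-line flag (highest priority)
    return config["default_model"]  # 2. fallback from config.yaml
```

For example, `resolve_language(["-l", "en"], {"default_model": "fr"})` selects English, while an empty argument list falls back to the configured French default.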
| 124 | + |
| 125 | +### Manual Commands |
| 126 | + |
| 127 | +While the script is running, you can type these commands in its terminal window and press Enter: |
| 128 | + |
| 129 | +* `/cancel`: Stops the current recording session without outputting text. |
| 130 | +* `/delete-word`: Deletes the last typed word. |
| 131 | +* `/nl`: Inserts a new line (simulates pressing the Enter key). |
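One way to picture how these commands could translate into simulated keystrokes is a small lookup table of `xdotool` invocations. The sketch below is an assumption made for illustration and does not describe the project's actual implementation: the names `XDOTOOL_ACTIONS` and `parse_manual_command` are hypothetical, and the `ctrl+BackSpace` shortcut for deleting a word is a guess that does not behave the same way in every application.

```python
# Hypothetical mapping from manual commands to xdotool argument lists.
XDOTOOL_ACTIONS = {
    "/nl": ["xdotool", "key", "Return"],
    # Assumption: Ctrl+BackSpace deletes the previous word in many apps.
    "/delete-word": ["xdotool", "key", "ctrl+BackSpace"],
}


def parse_manual_command(line):
    """Return (name, argv) for a typed command line.

    argv is the xdotool call to run, or None when no keystroke is needed.
    Raises ValueError for anything that is not a known command.
    """
    name = line.strip()
    if name == "/cancel":
        return name, None  # stops recording; nothing is typed
    if name in XDOTOOL_ACTIONS:
        return name, XDOTOOL_ACTIONS[name]
    raise ValueError(f"unknown command: {name!r}")
```

The returned argument list could then be passed to `subprocess.run` to send the keystroke to the active window.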
| 132 | + |
| 133 | +### Customization |
| 134 | + |
| 135 | +All settings can be modified in the `config/config.yaml` file. This includes hotkeys, UI colors, voice commands, and recognition aliases. |
| 136 | + |
| 137 | +## 🤝 Contributing |
| 138 | + |
| 139 | +This project is open source and contributions are highly encouraged! Please check out our **[Contribution Guide](CONTRIBUTING.md)** and feel free to open an issue on the [issues page](https://github.com/RonanDavalan/vosk-cli-dictation/issues). |
| 140 | + |
| 141 | +## 📄 License |
| 142 | + |
| 143 | +This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details. |