Commit 79ce9d1

feat: Initial release of v0.0.4-alpha
Initial clean release of the Vosk command-line dictation tool. This version is prepared for public release and includes the required directory structure.
0 parents  commit 79ce9d1

29 files changed: 2439 additions, 0 deletions

.gitignore

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
# Compiled files
2+
*.o
3+
*.out
4+
5+
# Directories
6+
node_modules/
7+
dist/
8+
page-html/*
9+
10+
11+
# Python
12+
__pycache__/
13+
*.pyc
14+
15+
vosk-env/
16+
*.pyo
17+
*.pyd
18+
.Python
19+
env/
20+
venv/
21+
22+
# Local configuration files
23+
config.local.json
24+
25+
# Temporary files
26+
*.tmp
27+
*.log
28+
GEMINI.md
29+
# Python virtual environments
30+
vosk-env/
31+
venv/
32+
33+
# Python cache and temporary files
34+
__pycache__/
35+
*.py[co]
36+
*.pyd
37+
38+
# IDE and system files
39+
.idea/
40+
.vscode/
41+
.DS_Store
42+
Thumbs.db
43+
44+
# Ignore ALL contents of the models directory...
45+
vosk-model/*
46+
47+
# ...BUT make an exception to keep the .gitkeep file.
48+
!vosk-model/.gitkeep

CONTRIBUTING.md

Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
# Contribution Guide
2+
3+
A huge thank you for your interest in this project! Every contribution, whether it's an idea, a bug report, or a few lines of code, is valuable and welcome.
4+
5+
## Project Philosophy: A Modern Partnership
6+
7+
This project was born from a unique partnership: it was **architected and driven by a human**, but its execution was greatly accelerated by the **power of several Large Language Models (LLMs)** like Claude, Gemini, and others.
8+
9+
We see this not as mere assistance, but as a modern development model where human vision guides the efficiency of AI. We encourage contributions that share this spirit of innovation, experimentation, and effectiveness.
10+
11+
## How You Can Help
12+
13+
All help is welcome. Here are the best ways to get started:
14+
15+
### 🐛 Found a Bug?
16+
17+
1. **Check First!** Take a look at the [Issues](https://github.com/RonanDavalan/projet_vosk/issues) to make sure the bug hasn't already been reported.
18+
2. **Open a New Issue.** If the bug is new, please report it with as much detail as possible:
19+
* Your Linux distribution and version.
20+
* The exact steps to reproduce the problem.
21+
* The full error message, copy-pasted into the issue.
22+
23+
### ✨ Have an Idea for an Improvement?
24+
25+
All suggestions to improve the tool are welcome!
26+
27+
1. **Open a [new issue](https://github.com/RonanDavalan/projet_vosk/issues) to start the discussion.**
28+
2. Describe your idea and explain how it would benefit users.
29+
30+
### 💻 Want to Contribute Code?
31+
32+
Fantastic! Here's how to make sure your contribution can be integrated smoothly.
33+
34+
**1. Prepare Your Environment**
35+
36+
* **Fork** the repository (create a copy on your own GitHub account).
37+
* **Clone** your fork to your machine (replace `YOUR_USERNAME` with your GitHub username): `git clone https://github.com/YOUR_USERNAME/vosk-cli-dictation.git`
38+
* **Create a new branch** for your changes. This is essential to avoid working directly on `main`.
39+
```bash
40+
git checkout -b feature/your-feature-name
41+
```
42+
43+
**2. Code!**
44+
45+
* Make your changes. Try to keep the code clear, simple, and commented where necessary.
46+
* Be sure to test your changes to verify that they work and don't create new bugs.
47+
48+
**3. Submit Your Contribution**
49+
50+
* **Commit your changes** with a clear message, **in English**. Using English and following a convention is a best practice for maintaining a clean history that is understandable by everyone. We use the "Conventional Commits" standard.
51+
52+
* For a new feature: `git commit -m "feat: add voice command for XYZ"`
53+
* For a bug fix: `git commit -m "fix: resolve issue with microphone detection"`
54+
* For documentation: `git commit -m "docs: update installation guide"`
55+
56+
* **Push your branch** to your fork on GitHub.
57+
* **Open a Pull Request (PR)** from your fork to the main project's `main` branch. In the PR description, explain what you did and why.
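
As a rough illustration of the commit message convention described above, the following Python sketch checks whether a commit subject has the `type: description` shape. It is not part of the project: the helper name `is_conventional` and the list of accepted types are assumptions made for this example, and it covers only a subset of the Conventional Commits specification.

```python
import re

# Hypothetical helper, not part of this repository. It accepts a common
# subset of Conventional Commits types; adjust the list to your needs.
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"  # type
    r"(\([\w.-]+\))?"  # optional scope, e.g. "(audio)"
    r"!?"              # optional breaking-change marker
    r": \S.*$"         # colon, one space, then a non-empty description
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the pattern above."""
    return COMMIT_PATTERN.match(subject) is not None
```

For example, `is_conventional("feat: add voice command for XYZ")` is accepted, while a subject such as `"Fixed stuff"` is rejected.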
58+
59+
## Code Style
60+
61+
The project follows the **PEP 8** style conventions for Python. Clean and readable code is as important as code that works!
62+
63+
---
64+
Thank you again for your time and help. It's thanks to people like you that open-source projects can grow and improve.

CONTRIBUTORS.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# Acknowledgments and Contributors
2+
3+
This project could not have come to life without the hard work of many people and the existence of fantastic open-source tools. I want to express my deep gratitude to everyone who contributed, directly or indirectly, to its creation.
4+
5+
## Core Projects and Libraries
6+
7+
This software relies entirely on the following open-source projects. A big thank you to their developers and communities for providing such powerful and accessible tools.
8+
9+
* **[Vosk-API](https://alphacephei.com/vosk/)**: For providing the robust and high-performance offline speech recognition engine that is the heart of this application.
10+
* **[Python](https://www.python.org/)**: For a versatile programming language and an incredible community.
11+
* **[xdotool](https://github.com/jordansissel/xdotool)**: For the essential tool that allows simulating keyboard input and controlling windows under Linux.
12+
* **[pactl (PulseAudio)](https://www.freedesktop.org/wiki/Software/PulseAudio/)**: For the commands to manage the system's audio sources.
13+
* **[PyYAML](https://pyyaml.org/)**: For handling YAML configuration files.
14+
* **[Pynput](https://pynput.readthedocs.io/en/latest/)**: For managing keyboard and mouse events.
15+
* **[PyAudio](https://pypi.org/project/PyAudio/)**: For audio capture.
16+
17+
## Design and Development Support
18+
19+
I also want to thank the entities and individuals who helped me structure my ideas, solve complex problems, and improve the quality of the project.
20+
21+
* **[Gemini](https://gemini.google.com/) & ChatGPT**: For their assistance as tools for brainstorming, code generation, and debugging. Their ability to provide quick answers and different perspectives has been a valuable accelerator throughout development.
22+
23+
## Code and Documentation Contributors
24+
25+
This section is dedicated to all the people who have contributed directly to the project by submitting relevant pull requests or issues.
26+
27+
*(This section is currently empty, but I hope to see your name here soon!)*
28+
29+
---
30+
*If you wish to contribute, please see our [contribution guide](./CONTRIBUTING.md).*

LICENSE

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2024 Ronan Davalan
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 143 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,143 @@
1+
# Vosk-CLI-Dictation
2+
3+
<p align="center">
4+
<a href="https://github.com/RonanDavalan/vosk-cli-dictation/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License: MIT"></a>
5+
<a href="#"><img src="https://img.shields.io/badge/Version-v0.0.4--alpha-yellow.svg" alt="Version: v0.0.4-alpha"></a>
6+
<a href="#"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"></a>
7+
<a href="#"><img src="https://img.shields.io/badge/OS-Debian_12-D70A53.svg" alt="Tested on Debian 12"></a>
8+
<a href="#"><img src="https://img.shields.io/badge/Contributions-Welcome-brightgreen.svg" alt="Contributions Welcome"></a>
9+
</p>
10+
11+
<p align="center">
12+
<a href="https://vosk.davalan.fr/"><strong>➡️ Visit the Official Homepage for a full tour!</strong></a>
13+
</p>
14+
15+
<p align="center">
16+
<img src="assets/images/vosk-cli-dictation_live-demo.gif" alt="Live Demonstration">
17+
</p>
18+
19+
A powerful and customizable command-line dictation tool for Linux, powered by the Vosk engine. Turn your speech into text directly in your terminal and have it appear in any application.
20+
21+
---
22+
23+
## ✨ Why This Project?
24+
25+
This tool was born from the need for a robust, locally run dictation system on GNU/Linux that gives users full control over their data and the software's behavior. Unlike cloud-based solutions, it operates **100% offline**, so your voice data stays on your machine.
26+
27+
## 🚀 Key Features
28+
29+
* **System-Wide Integration:** Dictate into any active application (terminal, browser, code editor, etc.).
30+
* **Multi-Language Support:** Supports **English** and **French** once the corresponding Vosk models are downloaded.
31+
* **Total Control:** Use global hotkeys, voice commands, and manual controls to manage your dictation.
32+
* **Fully Customizable:** Configure UI colors, hotkeys, voice commands, and recognition aliases in a simple `config.yaml` file.
33+
* **Full Punctuation:** Dictate a complete range of punctuation with natural voice commands.
34+
* **Private and Offline:** Runs entirely on your machine. Your voice data never leaves your computer.
35+
36+
> [!WARNING]
37+
>
38+
> **Compatibility Note: X11 vs. Wayland** This application uses `xdotool` for keyboard simulation, which is designed primarily for the **X11 display server**. On systems that run **Wayland** by default (such as recent versions of Fedora or Ubuntu), it may **not work natively**. Some functionality may still be available through **XWayland**, which provides an X11 compatibility layer on Wayland systems. For full compatibility, use an X11 session.
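
To find out which kind of session you are running, a small check like the one below can help. This is a minimal sketch, not code from this project: the function name `display_session_type` is hypothetical, and it relies on the environment variables `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY`, which most desktop sessions set but which are not guaranteed to be present.

```python
import os


def display_session_type(environ=None) -> str:
    """Best-effort guess of the display server: 'x11', 'wayland', or 'unknown'."""
    env = os.environ if environ is None else environ

    # Most login managers export XDG_SESSION_TYPE for graphical sessions.
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("x11", "wayland"):
        return session

    # Fallbacks: a Wayland compositor usually sets WAYLAND_DISPLAY,
    # while an X server sets DISPLAY.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"
```

Calling `display_session_type()` with no arguments inspects the current environment. A result of `"wayland"` suggests that `xdotool` may not behave as expected.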
39+
40+
## ⚙️ Getting Started
41+
42+
### 1. System Dependencies
43+
44+
This project was developed and tested on **Debian 12 (Bookworm)**. You will need the following packages:
45+
46+
```bash
47+
sudo apt-get update && sudo apt-get install python3-pip python3-venv portaudio19-dev gettext xdotool pulseaudio-utils
48+
```
49+
50+
### 2. Project Setup
51+
52+
Clone the repository and create the Python virtual environment.
53+
54+
```bash
55+
git clone https://github.com/RonanDavalan/vosk-cli-dictation.git
56+
cd vosk-cli-dictation
57+
python3 -m venv venv
58+
source venv/bin/activate
59+
```
60+
61+
### 3. Install Python Dependencies
62+
63+
Install the required Python packages from the `requirements.txt` file.
64+
65+
```bash
66+
pip install -r requirements.txt
67+
```
68+
69+
### 4. Download Language Models
70+
71+
This project requires Vosk language models to function. They are not included in the repository and must be downloaded manually.
72+
73+
1. **Download the models you need:** We recommend starting with the small models for efficiency.
74+
* **French Model:** [vosk-model-small-fr-0.22](https://alphacephei.com/vosk/models/vosk-model-small-fr-0.22.zip)
75+
* **English Model:** [vosk-model-small-en-us-0.15](https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip)
76+
77+
2. **Unzip and place them correctly:** After unzipping, the model folders (e.g., `vosk-model-small-fr-0.22`) must be placed directly inside the `vosk-model/` directory of the project.
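
To confirm that the models ended up in the right place, you could list the folders directly inside `vosk-model/`. The sketch below is only an illustration: `find_model_dirs` is a hypothetical helper, not part of the project, and it assumes that unpacked model folders keep their original `vosk-model-` name prefix.

```python
from pathlib import Path


def find_model_dirs(models_root: Path) -> list[str]:
    """Return the names of model folders placed directly inside models_root."""
    if not models_root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in models_root.iterdir()
        # Assumption: unpacked models keep their "vosk-model-" prefix.
        if entry.is_dir() and entry.name.startswith("vosk-model-")
    )
```

Run from the project root, `find_model_dirs(Path("vosk-model"))` should list each model you unzipped. An empty list usually means the folders are missing or nested one level too deep.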
78+
79+
### 5. Launch The Application
80+
81+
With the virtual environment still active, run the main script.
82+
83+
```bash
84+
# To run with the French model (default in config)
85+
python3 src/main.py
86+
87+
# To explicitly run with the English model
88+
python3 src/main.py -l en
89+
```
90+
91+
### 💡 Pro Tip: Create a Quick-Launch Command (Optional)
92+
93+
For easier access, you can add a function to your shell's configuration file (e.g., `~/.bashrc` or `~/.zshrc`).
94+
95+
1. Open your configuration file: `nano ~/.bashrc`
96+
2. Add the following lines at the end. **Remember to replace `/path/to/your/vosk-cli-dictation` with the actual, absolute path to the project directory.**
97+
98+
```bash
# Defines a 'vosk' command to launch the dictation script easily.
vosk() {
    local project_dir="/path/to/your/vosk-cli-dictation"

    # Check that the project directory exists
    if [ ! -d "$project_dir" ]; then
        echo "Error: Project directory not found at $project_dir" >&2
        return 1
    fi

    # Run in a subshell so that your current directory and shell environment
    # are left untouched, even if the script exits with an error.
    (
        cd "$project_dir" &&
        source venv/bin/activate &&
        python3 src/main.py "$@"
    )
}
```
112+
113+
3. Apply the changes by running `source ~/.bashrc` or by opening a new terminal.
114+
4. You can now launch the application from anywhere by simply typing `vosk` or `vosk -l en`.
115+
116+
## 🛠️ Usage & Configuration
117+
118+
### Language Selection
119+
120+
The language model is chosen based on the following priority:
121+
122+
1. **Command-Line Flag (Highest Priority):** Use `-l en` or `-l fr` to force a specific language for the session.
123+
2. **Default in `config.yaml` (Fallback):** If no flag is used, the `default_model` from `config/config.yaml` is loaded.
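
The priority order above can be sketched in a few lines of Python. This is not the project's actual code: `resolve_language` is a hypothetical function, the long option `--language` is an assumption, and the sketch assumes that `default_model` holds a language code such as `fr` (the real value format in `config/config.yaml` may differ).

```python
import argparse


def resolve_language(argv: list[str], config: dict) -> str:
    """Pick the language: the -l flag wins, otherwise the config default."""
    parser = argparse.ArgumentParser(add_help=False)
    # "-l" comes from the README; "--language" is an assumed long form.
    parser.add_argument("-l", "--language", choices=["en", "fr"], default=None)
    args, _unknown = parser.parse_known_args(argv)

    if args.language is not None:
        return args.language        # 1. command-line flag (highest priority)
    return config["default_model"]  # 2. fallback from the configuration
```

With `config = {"default_model": "fr"}`, calling `resolve_language(["-l", "en"], config)` selects English, and `resolve_language([], config)` falls back to French.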
124+
125+
### Manual Commands
126+
127+
While the script is running, you can type these commands in its terminal window and press Enter:
128+
129+
* `/cancel`: Stops the current recording session without outputting text.
130+
* `/delete-word`: Deletes the last typed word.
131+
* `/nl`: Inserts a new line (simulates pressing the Enter key).
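
One way such typed commands could be mapped to actions is shown below. This is a sketch under stated assumptions, not the project's implementation: `plan_command` and `XDOTOOL_ARGS` are hypothetical names, and the `ctrl+BackSpace` shortcut for deleting a word is an assumption that works in many applications but not all of them. The function only returns the planned action and does not run `xdotool` itself.

```python
# Hypothetical mapping from typed commands to xdotool invocations.
XDOTOOL_ARGS = {
    "/delete-word": ["xdotool", "key", "ctrl+BackSpace"],  # assumed shortcut
    "/nl": ["xdotool", "key", "Return"],                   # Enter key
}


def plan_command(line: str):
    """Translate a typed command into an (action, arguments) pair."""
    command = line.strip()
    if command == "/cancel":
        # Handled inside the program; no keystroke is sent.
        return ("cancel", None)
    if command in XDOTOOL_ARGS:
        return ("xdotool", XDOTOOL_ARGS[command])
    return ("unknown", None)
```

A caller could pass the returned argument list to `subprocess.run` when the action is `"xdotool"`, and ignore or report anything marked `"unknown"`.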
132+
133+
### Customization
134+
135+
All settings can be modified in the `config/config.yaml` file. This includes hotkeys, UI colors, voice commands, and recognition aliases.
136+
137+
## 🤝 Contributing
138+
139+
This project is open source and contributions are highly encouraged! Please check out our **[Contribution Guide](CONTRIBUTING.md)** and feel free to open an issue on the [issues page](https://github.com/RonanDavalan/vosk-cli-dictation/issues).
140+
141+
## 📄 License
142+
143+
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.

babel.cfg

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
# babel.cfg
2+
#
3+
# This configuration file tells Babel which files to scan for translatable strings.
4+
# We are telling it to look for the gettext function call `_()` in all Python
5+
# files located inside the 'src' directory.
6+
7+
[python: src/**.py]
