Commit 218d688

docs(voxtral_realtime): document CUDA Windows workflow
Add CUDA-Windows instructions to the Voxtral Realtime README, including export prerequisites and an example command. Document Windows build steps via CMake workflow presets and add PowerShell run examples with and without the .ptd data file. Note recommended CUDA architectures for int4 kernels, and reformat voxtral_realtime CMake presets without changing behavior.
1 parent 89b938a commit 218d688

2 files changed: +98 -25 lines changed

examples/models/voxtral_realtime/CMakePresets.json

Lines changed: 23 additions & 8 deletions
@@ -14,12 +14,16 @@
1414
{
1515
"name": "voxtral-realtime-cpu",
1616
"displayName": "Voxtral Realtime runner (CPU)",
17-
"inherits": ["voxtral-realtime-base"]
17+
"inherits": [
18+
"voxtral-realtime-base"
19+
]
1820
},
1921
{
2022
"name": "voxtral-realtime-metal",
2123
"displayName": "Voxtral Realtime runner (Metal)",
22-
"inherits": ["voxtral-realtime-base"],
24+
"inherits": [
25+
"voxtral-realtime-base"
26+
],
2327
"cacheVariables": {
2428
"EXECUTORCH_BUILD_METAL": "ON"
2529
},
@@ -32,14 +36,19 @@
3236
{
3337
"name": "voxtral-realtime-cuda",
3438
"displayName": "Voxtral Realtime runner (CUDA)",
35-
"inherits": ["voxtral-realtime-base"],
39+
"inherits": [
40+
"voxtral-realtime-base"
41+
],
3642
"cacheVariables": {
3743
"EXECUTORCH_BUILD_CUDA": "ON"
3844
},
3945
"condition": {
4046
"type": "inList",
4147
"string": "${hostSystemName}",
42-
"list": ["Linux", "Windows"]
48+
"list": [
49+
"Linux",
50+
"Windows"
51+
]
4352
}
4453
}
4554
],
@@ -48,20 +57,26 @@
4857
"name": "voxtral-realtime-cpu",
4958
"displayName": "Build Voxtral Realtime runner (CPU)",
5059
"configurePreset": "voxtral-realtime-cpu",
51-
"targets": ["voxtral_realtime_runner"]
60+
"targets": [
61+
"voxtral_realtime_runner"
62+
]
5263
},
5364
{
5465
"name": "voxtral-realtime-metal",
5566
"displayName": "Build Voxtral Realtime runner (Metal)",
5667
"configurePreset": "voxtral-realtime-metal",
5768
"configuration": "Release",
58-
"targets": ["voxtral_realtime_runner"]
69+
"targets": [
70+
"voxtral_realtime_runner"
71+
]
5972
},
6073
{
6174
"name": "voxtral-realtime-cuda",
6275
"displayName": "Build Voxtral Realtime runner (CUDA)",
6376
"configurePreset": "voxtral-realtime-cuda",
64-
"targets": ["voxtral_realtime_runner"]
77+
"targets": [
78+
"voxtral_realtime_runner"
79+
]
6580
}
6681
],
6782
"workflowPresets": [
@@ -108,4 +123,4 @@
108123
]
109124
}
110125
]
111-
}
126+
}

examples/models/voxtral_realtime/README.md

Lines changed: 75 additions & 17 deletions
@@ -97,6 +97,7 @@ python export_voxtral_rt.py \
9797
| `xnnpack` ||| `4w`, `8w`, `8da4w`, `8da8w` |
9898
| `metal` ||| none (fp32) or `fpa4w` (Metal-specific 4-bit) |
9999
| `cuda` ||| `4w`, `8w` |
100+
| `cuda-windows` ||| `4w`, `8w` |
100101

101102
Metal backend provides Apple GPU acceleration. CUDA backend provides NVIDIA GPU
102103
acceleration via AOTInductor.
@@ -171,6 +172,38 @@ Alternatively, you can build torchao with Metal support while installing ExecuTo
171172
EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh
172173
```
173174

175+
### CUDA-Windows Export
176+
177+
Before running `cuda-windows` export, make sure these requirements are set up:
178+
- `x86_64-w64-mingw32-g++` is installed and on `PATH` (mingw-w64 cross-compiler).
179+
- `WINDOWS_CUDA_HOME` points to the extracted Windows CUDA package directory.
180+
181+
Example setup on Ubuntu (see the [Parakeet README](../../parakeet/README.md) for detailed extraction steps):
182+
183+
```bash
184+
# Ensure the WINDOWS_CUDA_HOME environment variable is set
185+
export WINDOWS_CUDA_HOME=/opt/cuda-windows/extracted/cuda_cudart/cudart
186+
```
187+
188+
Export the model for Windows CUDA (example with int4 quantization):
189+
190+
```bash
191+
python export_voxtral_rt.py \
192+
--model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
193+
--backend cuda-windows \
194+
--dtype bf16 \
195+
--output-dir ./voxtral_rt_exports \
196+
--qlinear-encoder 4w \
197+
--qlinear-encoder-packing-format tile_packed_to_4d \
198+
--qlinear 4w \
199+
--qlinear-packing-format tile_packed_to_4d \
200+
--qembedding 8w
201+
```
202+
203+
This generates:
204+
- `model.pte`
205+
- `aoti_cuda_blob.ptd`
206+
174207
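The two prerequisites listed above can be checked before starting a long export. The following is a minimal Python sketch, not part of `export_voxtral_rt.py`. The function name `check_cuda_windows_prereqs` is hypothetical, and the check is deliberately shallow: it only confirms that the cross-compiler is on `PATH` and that `WINDOWS_CUDA_HOME` points to an existing directory.

```python
import os
import shutil


def check_cuda_windows_prereqs(env=None, which=shutil.which):
    """Return a list of problems; an empty list means both checks passed.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment (hypothetical helper).
    """
    env = os.environ if env is None else env
    problems = []

    # The mingw-w64 cross-compiler must be discoverable on PATH.
    if which("x86_64-w64-mingw32-g++") is None:
        problems.append("x86_64-w64-mingw32-g++ not found on PATH")

    # WINDOWS_CUDA_HOME must be set and point to an existing directory.
    cuda_home = env.get("WINDOWS_CUDA_HOME")
    if not cuda_home:
        problems.append("WINDOWS_CUDA_HOME is not set")
    elif not os.path.isdir(cuda_home):
        problems.append(f"WINDOWS_CUDA_HOME is not a directory: {cuda_home}")

    return problems


if __name__ == "__main__":
    for problem in check_cuda_windows_prereqs():
        print(f"missing prerequisite: {problem}")
```

This sketch does not verify the contents of the extracted CUDA package; it only catches the two most obvious setup mistakes early.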
### Options
175208

176209
| Flag | Default | Description |
@@ -228,6 +261,18 @@ make voxtral_realtime-metal
228261
This builds ExecuTorch with Metal backend support. The runner binary is at
229262
the same path as above. Metal exports can only run on macOS with Apple Silicon.
230263

264+
### CUDA-Windows
265+
266+
On Windows (PowerShell), run the CMake workflow presets directly from the executorch root directory. If you exported the model with 4-bit quantization, you may need to set the CUDA architectures to match your GPU's compute capability (e.g., `80;86;89;90;120`, covering Ampere, Ada Lovelace, Hopper, and Blackwell). The `int4mm` kernels require SM 80 or newer, and building for an architecture list that omits your GPU can cause "invalid device function" errors at runtime.
267+
268+
```powershell
269+
$env:CMAKE_CUDA_ARCHITECTURES="80;86;89;90;120"
270+
cmake --workflow --preset llm-release-cuda
271+
Push-Location examples/models/voxtral_realtime
272+
cmake --workflow --preset voxtral-realtime-cuda
273+
Pop-Location
274+
```
275+
231276
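The SM requirement above can be checked with a little arithmetic: a compute capability of `8.6` corresponds to SM 86, and `12.0` to SM 120. The Python sketch below is illustrative only. The helper names are hypothetical, and it assumes you already have the compute capability as a string (on recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` may report it). The list check is a conservative exact match against the architecture list.

```python
MIN_INT4_SM = 80  # from the note above: int4mm kernels require SM 80 or newer


def sm_from_compute_cap(compute_cap: str) -> int:
    """Convert a compute capability such as "8.6" to an SM number such as 86."""
    major, minor = compute_cap.strip().split(".")
    return int(major) * 10 + int(minor)


def check_arch(compute_cap: str, arch_list: str) -> str:
    """Compare one GPU against a semicolon-separated architecture list."""
    sm = sm_from_compute_cap(compute_cap)
    if sm < MIN_INT4_SM:
        return f"SM {sm} is older than SM {MIN_INT4_SM}; int4mm kernels are not supported"
    # Conservative: require the exact SM number to appear in the list.
    if str(sm) not in arch_list.split(";"):
        return f"SM {sm} is missing from the architecture list; consider adding it"
    return "ok"


if __name__ == "__main__":
    print(check_arch("8.6", "80;86;89;90;120"))
```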
## Run
232277

233278
The runner requires:
@@ -237,35 +282,37 @@ The runner requires:
237282
- A 16kHz mono WAV audio file (or live audio via `--mic`)
238283
- For CUDA: `aoti_cuda_blob.ptd` — delegate data file (pass via `--data_path`)
239284

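The runner expects 16kHz mono WAV input, so it can help to check a file before passing it via `--audio_path`. This is a small sketch using Python's standard `wave` module; the function name `is_16khz_mono` is hypothetical and is not part of the runner. Note that `wave` only reads PCM WAV files and raises `wave.Error` for other encodings.

```python
import wave


def is_16khz_mono(path: str) -> bool:
    """Return True if the WAV file at `path` is 16 kHz with a single channel."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16_000 and wav.getnchannels() == 1


if __name__ == "__main__":
    import sys

    for wav_path in sys.argv[1:]:
        status = "ok" if is_16khz_mono(wav_path) else "needs resampling or downmixing"
        print(f"{wav_path}: {status}")
```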
240-
```bash
241-
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
242-
--model_path voxtral_rt_exports/model.pte \
243-
--tokenizer_path ~/models/Voxtral-Mini-4B-Realtime-2602/tekken.json \
244-
--preprocessor_path voxtral_rt_exports/preprocessor.pte \
285+
### Windows (PowerShell)
286+
287+
```powershell
288+
.\cmake-out\examples\models\voxtral_realtime\Release\voxtral_realtime_runner.exe `
289+
--model_path voxtral_rt_exports\model.pte `
290+
--tokenizer_path C:\path\to\tekken.json `
291+
--preprocessor_path voxtral_rt_exports\preprocessor.pte `
245292
--audio_path input.wav
246293
```
247294

248295
For CUDA, include the `.ptd` data file:
249296

250-
```bash
251-
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
252-
--model_path voxtral_rt_exports/model.pte \
253-
--data_path voxtral_rt_exports/aoti_cuda_blob.ptd \
254-
--tokenizer_path ~/models/Voxtral-Mini-4B-Realtime-2602/tekken.json \
255-
--preprocessor_path voxtral_rt_exports/preprocessor.pte \
297+
```powershell
298+
.\cmake-out\examples\models\voxtral_realtime\Release\voxtral_realtime_runner.exe `
299+
--model_path voxtral_rt_exports\model.pte `
300+
--data_path voxtral_rt_exports\aoti_cuda_blob.ptd `
301+
--tokenizer_path C:\path\to\tekken.json `
302+
--preprocessor_path voxtral_rt_exports\preprocessor.pte `
256303
--audio_path input.wav
257304
```
258305

259306
For streaming, add `--streaming`. This requires a model exported with
260307
`--streaming`. The runner processes audio in 80ms steps, computing mel
261308
and running the encoder+decoder incrementally.
262309

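As a worked example of the step size: at 16kHz, one 80ms step covers 16000 × 0.08 = 1280 samples. The sketch below only illustrates that arithmetic. It assumes a trailing partial step still counts as one step, which may not match how the runner handles the end of the audio.

```python
import math

SAMPLE_RATE_HZ = 16_000  # input sample rate stated in the requirements
STEP_MS = 80             # streaming step size stated above


def samples_per_step(sample_rate_hz: int = SAMPLE_RATE_HZ, step_ms: int = STEP_MS) -> int:
    """Number of audio samples consumed by one streaming step."""
    return sample_rate_hz * step_ms // 1000


def num_steps(num_samples: int) -> int:
    """Steps needed to cover `num_samples`, counting a trailing partial step
    as one full step (an assumption, not confirmed runner behavior)."""
    return math.ceil(num_samples / samples_per_step())


if __name__ == "__main__":
    print(samples_per_step())          # samples in one 80 ms step
    print(num_steps(SAMPLE_RATE_HZ))   # steps for one second of audio
```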
263-
```bash
264-
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
265-
--model_path voxtral_rt_exports/model.pte \
266-
--tokenizer_path ~/models/Voxtral-Mini-4B-Realtime-2602/tekken.json \
267-
--preprocessor_path voxtral_rt_exports/preprocessor.pte \
268-
--audio_path input.wav \
310+
```powershell
311+
.\cmake-out\examples\models\voxtral_realtime\Release\voxtral_realtime_runner.exe `
312+
--model_path voxtral_rt_exports\model.pte `
313+
--tokenizer_path C:\path\to\tekken.json `
314+
--preprocessor_path voxtral_rt_exports\preprocessor.pte `
315+
--audio_path input.wav `
269316
--streaming
270317
```
271318

@@ -285,6 +332,17 @@ ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f f32le -nostats -loglevel error
285332

286333
Ctrl+C stops recording and flushes remaining text.
287334

335+
**Windows (PowerShell):**
336+
337+
```powershell
338+
.\cmake-out\examples\models\voxtral_realtime\Release\voxtral_realtime_runner.exe `
339+
--model_path C:\path\to\voxtral_rt_exports\model.pte `
340+
--data_path C:\path\to\voxtral_rt_exports\aoti_cuda_blob.ptd `
341+
--tokenizer_path C:\path\to\tekken.json `
342+
--preprocessor_path C:\path\to\voxtral_rt_exports\preprocessor.pte `
343+
--audio_path C:\path\to\input.wav
344+
```
345+
288346
**CUDA:** Add `--data_path voxtral_rt_exports/aoti_cuda_blob.ptd` to all
289347
run commands above when using the CUDA backend.
290348

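The run commands above differ only in whether `--data_path` and `--streaming` are present. If you launch the runner from a script, one way to keep the variants consistent is to assemble the argument list in one place. This is a hypothetical Python helper, not part of the repository; it uses only the flags shown in the commands above.

```python
from typing import List, Optional


def build_runner_args(
    runner: str,
    model_path: str,
    tokenizer_path: str,
    preprocessor_path: str,
    audio_path: str,
    data_path: Optional[str] = None,
    streaming: bool = False,
) -> List[str]:
    """Assemble the runner command line as a list suitable for subprocess.run."""
    args = [runner, "--model_path", model_path]
    if data_path is not None:
        # CUDA exports also need the delegate data file.
        args += ["--data_path", data_path]
    args += [
        "--tokenizer_path", tokenizer_path,
        "--preprocessor_path", preprocessor_path,
        "--audio_path", audio_path,
    ]
    if streaming:
        # Only valid for models exported with --streaming.
        args.append("--streaming")
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`, which avoids shell-specific quoting and line-continuation differences between bash and PowerShell.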