5 changes: 5 additions & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -42,6 +42,7 @@ arxiv
ASPP
ASR
asr
ASRModel
ASYM
async
AsyncInferQueue
@@ -158,6 +159,7 @@ coors
coreference
CoSENT
CosyVoice
Conv
cpm
cpp
cpu
@@ -316,9 +318,11 @@ finetuned
finetuning
FireRedTTS
FLAC
flac
FLD
floyd
fn
ForcedAligner
foley
Formatter
formatter
@@ -1162,6 +1166,7 @@ Vladlen
vlm
VLM
VLModel
vLLM
VLMPipeline
VLMs
VL’s
40 changes: 40 additions & 0 deletions notebooks/qwen3-asr/README.md
@@ -0,0 +1,40 @@
# Qwen3-ASR Speech Recognition with OpenVINO™

The Qwen3-ASR family includes Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and ASR for 52 languages and dialects. Both leverage large-scale speech training data and the strong audio understanding capability of their foundation model, Qwen3-Omni. Experiments show that the 1.7B version achieves state-of-the-art performance among open-source ASR models and is competitive with the strongest proprietary commercial APIs. Here are the main features:

* **All-in-one**: Qwen3-ASR-1.7B and Qwen3-ASR-0.6B support language identification and speech recognition for 30 languages and 22 Chinese dialects, as well as English accents from multiple countries and regions.

* **Excellent and Fast**: The Qwen3-ASR family of ASR models maintains high-quality, robust recognition under complex acoustic environments and challenging text patterns. Qwen3-ASR-1.7B achieves strong performance on both open-source and internal benchmarks, while the 0.6B version offers a favorable accuracy-efficiency trade-off, reaching 2000x throughput at a concurrency of 128. Both models unify streaming and offline inference in a single model and support transcribing long audio.

* **Novel and strong forced-alignment solution**: We introduce Qwen3-ForcedAligner-0.6B, which supports timestamp prediction for arbitrary units within up to 5 minutes of speech in 11 languages. Evaluations show its timestamp accuracy surpasses that of E2E-based forced-alignment models.

* **Comprehensive inference toolkit**: In addition to open-sourcing the architectures and weights of the Qwen3-ASR series, we also release a powerful, full-featured inference framework that supports vLLM-based batch inference, asynchronous serving, streaming inference, timestamp prediction, and more.

<p align="center">
<img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/overview.jpg" width="100%"/>
</p>

More details can be found in the original [repository](https://github.com/QwenLM/Qwen3-ASR) and [model card](https://huggingface.co/Qwen/Qwen3-ASR-0.6B).

### Notebook Contents

In this tutorial, we consider how to run and optimize Qwen3-ASR using OpenVINO.

The tutorial consists of the following steps:

- Install prerequisites
- Convert model to OpenVINO intermediate representation (IR) format
- Prepare OpenVINO Inference pipeline
- Run Speech Recognition
- Launch interactive demo
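
The conversion and inference steps above follow the standard OpenVINO workflow: convert a PyTorch (sub)module to IR with `ov.convert_model`, save it, compile it for a target device, and run inference. Below is a minimal, generic sketch of that flow; the placeholder module, input shape, and file name are assumptions for illustration only and do not reproduce the notebook's actual export of the Qwen3-ASR submodels.

```python
# Minimal, generic OpenVINO conversion/inference sketch (not the notebook's exact code).
import openvino as ov
import torch

core = ov.Core()

# Stand-in for one of the model's submodules (e.g. the audio encoder).
placeholder_encoder = torch.nn.Conv1d(128, 256, kernel_size=3, padding=1)
example_input = torch.randn(1, 128, 3000)  # placeholder mel-spectrogram features

# Convert the PyTorch module to OpenVINO IR and save it to disk.
ov_model = ov.convert_model(placeholder_encoder, example_input=example_input)
ov.save_model(ov_model, "audio_encoder.xml")

# Compile the IR for a target device (e.g. "CPU", "GPU") and run inference.
compiled = core.compile_model("audio_encoder.xml", "CPU")
output = compiled(example_input.numpy())[0]
print(output.shape)  # -> (1, 256, 3000)
```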

## Installation Instructions

This is a self-contained example that relies solely on its own code.<br/>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For further details, please refer to [Installation Guide](../../README.md).

⚠️ **EXPERIMENTAL NOTEBOOK**

This notebook demonstrates a model that has not been fully validated with OpenVINO. It may be fully supported and validated in the future.
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/qwen3-asr/README.md" />