107 changes: 107 additions & 0 deletions docs/en/Cookbook/voice_agent.md
@@ -0,0 +1,107 @@
# Voice Dialogue Agent

This project demonstrates how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant system that supports speech input and audio output. It captures voice input through a microphone, transcribes it into text, generates a response using a large language model, and speaks the result aloud.

!!! abstract "In this section, you will learn how to:"

- Use `speech_recognition` to capture and recognize voice input from a microphone.
- Use `lazyllm.OnlineChatModule` to invoke a large language model for natural language responses.
- Use `pyttsx3` to convert text to speech for spoken output.

## Project Dependencies

Ensure the following dependencies are installed:

```bash
pip install lazyllm pyttsx3 speechrecognition soundfile
```

```python
import speech_recognition as sr
import pyttsx3
import lazyllm
```

## Step-by-Step Breakdown

### Step 1: Initialize the LLM and Text-to-Speech Engine

```python
chat = lazyllm.OnlineChatModule()
engine = pyttsx3.init()
```

> When using online models, you need to configure an `API_KEY`.
> See the [LazyLLM official documentation (Supported Platforms section)](https://docs.lazyllm.ai/en/stable/#supported-platforms) for details.
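For reference, LazyLLM reads platform credentials from environment variables. A minimal sketch for the default `sensenova` backend follows; the variable names are assumptions, so verify them against the platform table in the linked documentation before use:

```shell
# Assumed variable names for the SenseNova backend -- check the
# "Supported Platforms" table in the LazyLLM docs for the exact names.
export LAZYLLM_SENSENOVA_API_KEY=<your_api_key>
export LAZYLLM_SENSENOVA_SECRET_KEY=<your_secret_key>
```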

**Function Description:**

- `chat`: LazyLLM's online chat module (defaults to the `sensenova` backend)
    - Supports switching between different LLM backends
    - Automatically manages conversation context
- `engine`: The local text-to-speech engine, initialized via `pyttsx3`
    - Cross-platform speech output
    - Supports adjusting speech rate, volume, and other parameters

### Step 2: Build the Voice Assistant Main Logic

```python
def listen(chat):
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Calibrating...")
        r.adjust_for_ambient_noise(source, duration=5)
        print("Okay, go!")
        while True:
            text = ""
            print("listening now...")
            try:
                audio = r.listen(source, timeout=5, phrase_time_limit=30)
                print("Recognizing...")
                # With show_dict=True, recognize_whisper returns the full
                # Whisper result dict; only the transcribed text is needed.
                text = r.recognize_whisper(
                    audio,
                    model="medium.en",
                    show_dict=True,
                )["text"]
            except Exception as e:
                text = f"Sorry, I didn't catch that. Exception was: {e}"
            print(text)
            response_text = chat(text)
            print(response_text)
            # `engine` is the pyttsx3 engine initialized in Step 1.
            engine.say(response_text)
            engine.runAndWait()
```
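Because `chat` is just a callable that maps text to text, the respond step can be exercised without a microphone or API key by passing in any stand-in function. A sketch of that decoupling (the `respond` helper is hypothetical, not part of the original script):

```python
def respond(chat, text, speak=None):
    """Send recognized text to the LLM and optionally speak the reply."""
    response_text = chat(text)
    print(response_text)
    if speak is not None:
        # e.g. speak=lambda t: (engine.say(t), engine.runAndWait())
        speak(response_text)
    return response_text

# Example with a stand-in "LLM" that simply echoes its input:
print(respond(lambda t: f"echo: {t}", "hello"))
```

This separation also makes the LLM backend easy to swap or unit-test independently of the audio I/O.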

### Sample Output

**You say:**
"What is the capital of France?"

**Console output:**

```bash
Calibrating...
Okay, go!
listening now...
Recognizing...
What is the capital of France?
The capital of France is Paris.
```

**System speech output:**
"The capital of France is Paris."

### Notes

This script requires a machine **with a working microphone**.

If running on a **remote server** or **virtual machine**, please ensure that:

1. The server or host machine has an available microphone;
2. The runtime environment has **microphone access permissions** enabled.

> Otherwise, the program may encounter initialization errors when calling `sr.Microphone()`.
> When **no sound input** is detected, the `whisper` model may mistakenly recognize silence as common phrases (such as “Thank you”, etc.).
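One way to mitigate both issues — recognition failures and silence misrecognized as stock phrases — is to screen the transcribed text before sending it to the LLM. A minimal sketch (the `should_respond` helper and its phrase list are assumptions, not part of the original script):

```python
def should_respond(text: str) -> bool:
    """Return True only for text worth sending to the LLM."""
    if not text or not text.strip():
        return False
    # Skip the placeholder produced when recognition raised an exception.
    if text.startswith("Sorry, I didn't catch that"):
        return False
    # Whisper sometimes transcribes pure silence as a short stock phrase
    # such as "Thank you."; skip those suspicious results.
    if text.strip().lower().strip(".!") in {"thank you", "you"}:
        return False
    return True
```

In the main loop, guarding the `chat(text)` call with `if should_respond(text):` keeps the assistant from answering its own error messages.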
1 change: 1 addition & 0 deletions docs/nav_en.yml
@@ -10,6 +10,7 @@
- Great Writer: Cookbook/great_writer.md
- RAG: Cookbook/rag.md
- Streaming: Cookbook/streaming.md
- Voice Assistant: Cookbook/voice_agent.md
- Best Practice:
- Flow: Best Practice/flow.md
- Flowapp: Best Practice/flowapp.md
1 change: 1 addition & 0 deletions docs/nav_zh.yml
@@ -10,6 +10,7 @@
- 写作大师: Cookbook/great_writer.md
- 检索增强: Cookbook/rag.md
- 流式输出: Cookbook/streaming.md
- 语音对话: Cookbook/voice_agent.md
- 最佳实践:
- 工作流: Best Practice/flow.md
- 流程应用: Best Practice/flowapp.md
106 changes: 106 additions & 0 deletions docs/zh/Cookbook/voice_agent.md
@@ -0,0 +1,106 @@
# Voice Dialogue Agent

> **Review comment (Contributor):** Add a section demonstrating how to use the TTS and STT modules built into LazyLLM to implement the functionality here.
>
> **Review comment (Contributor):** Reference:
This project demonstrates how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant system that supports voice input and spoken playback: it receives the user's voice commands through a microphone, recognizes the speech as text, calls a large language model to generate an answer, and reads the answer back aloud.

!!! abstract "In this section, you will learn how to:"

- Use `speech_recognition` to capture and recognize voice input from a microphone.
- Use `lazyllm.OnlineChatModule` to invoke a large language model for natural language responses.
- Use `pyttsx3` to convert text to speech for spoken output.

## Project Dependencies

Ensure the following dependencies are installed:

```bash
pip install lazyllm pyttsx3 speechrecognition soundfile
```

```python
import speech_recognition as sr
import pyttsx3
import lazyllm
```

## Step-by-Step Breakdown

### Step 1: Initialize the LLM and Text-to-Speech Engine

```python
chat = lazyllm.OnlineChatModule()
engine = pyttsx3.init()
```

> When using online models, you need to configure an `API_KEY`. See the [LazyLLM official documentation (Supported Platforms section)](https://docs.lazyllm.ai/en/stable/#supported-platforms) for details.

**Function Description:**

- `chat`: LazyLLM's online chat module (defaults to the `sensenova` backend)
    - Supports switching between different LLM backends
    - Automatically manages conversation context
- `engine`: The local text-to-speech engine, initialized via `pyttsx3`
    - Cross-platform text-to-speech output
    - Supports adjusting speech rate, volume, and other parameters

### Step 2: Build the Voice Assistant Main Logic

```python
def listen(chat):
r = sr.Recognizer()
with sr.Microphone() as source:
print("Calibrating...")
r.adjust_for_ambient_noise(source, duration=5)
print("Okay, go!")
        while True:
text = ""
print("listening now...")
try:
audio = r.listen(source, timeout=5, phrase_time_limit=30)
print("Recognizing...")
text = r.recognize_whisper(
audio,
model="medium.en",
show_dict=True,
)["text"]
            except Exception as e:
                text = f"Sorry, I didn't catch that. Exception was: {e}"
print(text)
response_text = chat(text)
print(response_text)
engine.say(response_text)
engine.runAndWait()
```

### Sample Output

**You say:**
"What is the capital of France?"

**Console output:**

```bash
Calibrating...
Okay, go!
listening now...
Recognizing...
What is the capital of France?
The capital of France is Paris.
```

**System speech output:**
"The capital of France is Paris."

### Notes

This script must run on a machine **with a working microphone**.

If running on a **remote server** or **virtual machine**, make sure that:

1. The server or host machine has an available microphone;
2. The runtime environment has **microphone access permissions** enabled.

> Otherwise, the program may raise a device-initialization error when calling `sr.Microphone()`.
> When **no sound input** is detected, the `whisper` model may misrecognize silence as common phrases (such as "Thank you").