107 changes: 107 additions & 0 deletions docs/en/Cookbook/voice_agent.md
@@ -0,0 +1,107 @@
# Voice Dialogue Agent

This project demonstrates how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant system that supports speech input and audio output. It captures voice input through a microphone, transcribes it into text, generates a response using a large language model, and speaks the result aloud.

!!! abstract "In this section, you will learn how to:"

- Use `speech_recognition` to capture and recognize voice input from a microphone.
- Use `lazyllm.OnlineChatModule` to invoke a large language model for natural language responses.
- Use `pyttsx3` to convert text to speech for spoken output.

## Project Dependencies

Ensure the following dependencies are installed:

```bash
pip install lazyllm pyttsx3 speechrecognition soundfile
```

```python
import speech_recognition as sr
import pyttsx3
import lazyllm
```

## Step-by-Step Breakdown

### Step 1: Initialize the LLM and Text-to-Speech Engine

```python
chat = lazyllm.OnlineChatModule()
engine = pyttsx3.init()
```

> When using online models, you need to configure an `API_KEY`.
> See the [LazyLLM official documentation (Supported Platforms section)](https://docs.lazyllm.ai/en/stable/#supported-platforms) for details.
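For reference, LazyLLM reads platform credentials from environment variables. A minimal sketch for the default `sensenova` backend follows; the variable names are assumptions, so verify them against the platform table in the linked documentation before use:

```shell
# Assumed variable names for the SenseNova backend -- check the
# "Supported Platforms" table in the LazyLLM docs for the exact names.
export LAZYLLM_SENSENOVA_API_KEY=<your_api_key>
export LAZYLLM_SENSENOVA_SECRET_KEY=<your_secret_key>
```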

**Function Description:**

- `chat`: LazyLLM's online chat module (defaults to the `sensenova` backend)
    - Supports switching between different LLM backends
    - Automatically manages conversation context
- `engine`: The local text-to-speech engine, initialized via `pyttsx3`
    - Cross-platform speech output
    - Supports adjusting speech rate, volume, and other parameters

### Step 2: Build the Voice Assistant Main Logic

```python
def listen(chat):
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Calibrating...")
        r.adjust_for_ambient_noise(source, duration=5)
        print("Okay, go!")
        while True:
            text = ""
            print("listening now...")
            try:
                audio = r.listen(source, timeout=5, phrase_time_limit=30)
                print("Recognizing...")
                # With show_dict=True, recognize_whisper returns the full
                # Whisper result dict; only the transcribed text is needed.
                text = r.recognize_whisper(
                    audio,
                    model="medium.en",
                    show_dict=True,
                )["text"]
            except Exception as e:
                text = f"Sorry, I didn't catch that. Exception was: {e}"
            print(text)
            response_text = chat(text)
            print(response_text)
            # `engine` is the pyttsx3 engine initialized in Step 1.
            engine.say(response_text)
            engine.runAndWait()
```
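Because `chat` is just a callable that maps text to text, the respond step can be exercised without a microphone or API key by passing in any stand-in function. A sketch of that decoupling (the `respond` helper is hypothetical, not part of the original script):

```python
def respond(chat, text, speak=None):
    """Send recognized text to the LLM and optionally speak the reply."""
    response_text = chat(text)
    print(response_text)
    if speak is not None:
        # e.g. speak=lambda t: (engine.say(t), engine.runAndWait())
        speak(response_text)
    return response_text

# Example with a stand-in "LLM" that simply echoes its input:
print(respond(lambda t: f"echo: {t}", "hello"))
```

This separation also makes the LLM backend easy to swap or unit-test independently of the audio I/O.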

### Sample Output

**You say:**
"What is the capital of France?"

**Console output:**

```bash
Calibrating...
Okay, go!
listening now...
Recognizing...
What is the capital of France?
The capital of France is Paris.
```

**System speech output:**
"The capital of France is Paris."

### Notes

This script requires a machine **with a working microphone**.

If running on a **remote server** or **virtual machine**, please ensure that:

1. The server or host machine has an available microphone;
2. The runtime environment has **microphone access permissions** enabled.

> Otherwise, the program may encounter initialization errors when calling `sr.Microphone()`.
> When **no sound input** is detected, the `whisper` model may mistakenly recognize silence as common phrases (such as “Thank you”, etc.).
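One way to mitigate both issues — recognition failures and silence misrecognized as stock phrases — is to screen the transcribed text before sending it to the LLM. A minimal sketch (the `should_respond` helper and its phrase list are assumptions, not part of the original script):

```python
def should_respond(text: str) -> bool:
    """Return True only for text worth sending to the LLM."""
    if not text or not text.strip():
        return False
    # Skip the placeholder produced when recognition raised an exception.
    if text.startswith("Sorry, I didn't catch that"):
        return False
    # Whisper sometimes transcribes pure silence as a short stock phrase
    # such as "Thank you."; skip those suspicious results.
    if text.strip().lower().strip(".!") in {"thank you", "you"}:
        return False
    return True
```

In the main loop, guarding the `chat(text)` call with `if should_respond(text):` keeps the assistant from answering its own error messages.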
1 change: 1 addition & 0 deletions docs/nav_en.yml
@@ -10,6 +10,7 @@
- Great Writer: Cookbook/great_writer.md
- RAG: Cookbook/rag.md
- Streaming: Cookbook/streaming.md
- Voice Assistant: Cookbook/voice_agent.md
- Best Practice:
- Flow: Best Practice/flow.md
- Flowapp: Best Practice/flowapp.md
1 change: 1 addition & 0 deletions docs/nav_zh.yml
@@ -10,6 +10,7 @@
- 写作大师: Cookbook/great_writer.md
- 检索增强: Cookbook/rag.md
- 流式输出: Cookbook/streaming.md
- 语音对话: Cookbook/voice_agent.md
- 最佳实践:
- 工作流: Best Practice/flow.md
- 流程应用: Best Practice/flowapp.md
106 changes: 106 additions & 0 deletions docs/zh/Cookbook/voice_agent.md
@@ -0,0 +1,106 @@
# Voice Dialogue Agent

> **Review comment (Contributor):** Add a section demonstrating how to use the TTS and STT modules built into LazyLLM to implement the functionality here.
>
> **Review comment (Contributor):** Reference:
This project demonstrates how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant system that supports voice input and spoken playback: it receives the user's voice commands through a microphone, recognizes the speech as text, calls a large language model to generate an answer, and reads the answer back aloud.

!!! abstract "In this section, you will learn how to:"

- Use `speech_recognition` to capture and recognize voice input from a microphone.
- Use `lazyllm.OnlineChatModule` to invoke a large language model for natural language responses.
- Use `pyttsx3` to convert text to speech for spoken output.

## Project Dependencies

Ensure the following dependencies are installed:

```bash
pip install lazyllm pyttsx3 speechrecognition soundfile
```

```python
import speech_recognition as sr
import pyttsx3
import lazyllm
```

## Step-by-Step Breakdown

### Step 1: Initialize the LLM and Text-to-Speech Engine

```python
chat = lazyllm.OnlineChatModule()
engine = pyttsx3.init()
```

> When using online models, you need to configure an `API_KEY`. See the [LazyLLM official documentation (Supported Platforms section)](https://docs.lazyllm.ai/en/stable/#supported-platforms) for details.

**Function Description:**

- `chat`: LazyLLM's online chat module (defaults to the `sensenova` backend)
    - Supports switching between different LLM backends
    - Automatically manages conversation context
- `engine`: The local text-to-speech engine, initialized via `pyttsx3`
    - Cross-platform text-to-speech output
    - Supports adjusting speech rate, volume, and other parameters

### Step 2: Build the Voice Assistant Main Logic

```python
def listen(chat):
r = sr.Recognizer()
with sr.Microphone() as source:
print("Calibrating...")
r.adjust_for_ambient_noise(source, duration=5)
print("Okay, go!")
        while True:
text = ""
print("listening now...")
try:
audio = r.listen(source, timeout=5, phrase_time_limit=30)
print("Recognizing...")
text = r.recognize_whisper(
audio,
model="medium.en",
show_dict=True,
)["text"]
            except Exception as e:
                text = f"Sorry, I didn't catch that. Exception was: {e}"
print(text)
response_text = chat(text)
print(response_text)
engine.say(response_text)
engine.runAndWait()
```

### Sample Output

**You say:**
"What is the capital of France?"

**Console output:**

```bash
Calibrating...
Okay, go!
listening now...
Recognizing...
What is the capital of France?
The capital of France is Paris.
```

**System speech output:**
"The capital of France is Paris."

### Notes

This script must run on a machine **with a working microphone**.

If running on a **remote server** or **virtual machine**, make sure that:

1. The server or host machine has an available microphone;
2. The runtime environment has **microphone access permissions** enabled.

> Otherwise, the program may raise a device-initialization error when calling `sr.Microphone()`.
> When **no sound input** is detected, the `whisper` model may misrecognize silence as common phrases (such as "Thank you").