LazyAGI · jackyzzzk · Jul 25, 2025 · Jul 29, 2025 · Jul 29, 2025 · Jul 29, 2025
diff --git a/docs/en/Cookbook/voice_agent.md b/docs/en/Cookbook/voice_agent.md
@@ -0,0 +1,86 @@
+# Voice Dialogue Agent
+This project demonstrates how to use LazyLLM to build a voice assistant that supports speech input and audio output. It captures speech from a microphone, transcribes it to text, generates a response with a large language model, and speaks the result aloud.
+
+!!! abstract "In this section, you will learn how to:"
+    - Use `speech_recognition` to capture and recognize voice input from a microphone.
+    - Use `lazyllm.OnlineChatModule` to invoke a large language model for natural language responses.
+    - Use `pyttsx3` to convert text to speech for spoken output.
+
+# Project Dependencies
+Ensure the following dependencies are installed:
+```bash
+pip install lazyllm pyttsx3 SpeechRecognition PyAudio openai-whisper soundfile
+```
+```python
+import speech_recognition as sr
+import pyttsx3
+import lazyllm
+```
+
+# Step-by-Step Breakdown
+## Step 1: Initialize the LLM and Text-to-Speech Engine
+
+```python
+chat = lazyllm.OnlineChatModule()
+engine = pyttsx3.init()
+```
+
+**Function Description:**
+- `chat`: LazyLLM's online chat module (uses the sensenova API by default)
+  - Supports switching between different LLM backends
+  - Automatically manages conversation context
+- `engine`: Local text-to-speech engine (pyttsx3)
+  - Cross-platform speech output
+  - Supports adjusting speech rate, volume, and other parameters
+
+## Step 2: Build Voice Assistant Main Logic
+
+```python
+def listen(chat):
+    r = sr.Recognizer()
+    with sr.Microphone() as source:
+        print("Calibrating...")
+        r.adjust_for_ambient_noise(source, duration=5)
+        print("Okay, go!")
+        while True:
+            print("listening now...")
+            try:
+                audio = r.listen(source, timeout=5, phrase_time_limit=30)
+                print("Recognizing...")
+                # show_dict=True returns Whisper's full result dict; keep only the text
+                result = r.recognize_whisper(audio, model="medium.en", show_dict=True)
+                text = result["text"].strip()
+            except Exception as e:
+                # Do not forward the apology to the LLM; just listen again
+                print(f"Sorry, I didn't catch that. Exception was: {e}")
+                continue
+            print(f"You said: {text}")
+            response_text = chat(text)
+            print(response_text)
+            engine.say(response_text)
+            engine.runAndWait()
+
+
+# Start the assistant (press Ctrl+C to stop)
+listen(chat)
+```
+
+## Sample Output
+
+### Example Scenario:
+
+**You say:**  
+"What is the capital of France?"
+
+**Console output:**
+```
+Calibrating...
+Okay, go!
+listening now...
+Recognizing...
+You said: What is the capital of France?
+The capital of France is Paris.
+```
+
+**System speech output:**  
+"The capital of France is Paris."
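A possible refinement of the `listen` loop in the diff above, offered as a minimal sketch rather than the author's implementation: one turn of the loop is factored into a function that receives its collaborators as arguments, so a failed or empty recognition never reaches the LLM and the logic can be exercised without a microphone. The name `handle_turn` and its three callable parameters are hypothetical and do not come from LazyLLM, SpeechRecognition, or pyttsx3.

```python
from typing import Callable, Optional


def handle_turn(
    transcribe: Callable[[], str],
    chat: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one listen -> LLM -> speak turn.

    Returns the model's reply, or None when nothing usable was heard.
    All three parameters are hypothetical stand-ins for the real objects.
    """
    try:
        text = transcribe().strip()
    except Exception as exc:  # e.g. a listen timeout or a failed recognition
        print(f"Sorry, I didn't catch that. Exception was: {exc}")
        return None
    if not text:
        # Silence or an empty transcript: skip the LLM call entirely.
        return None
    print(f"You said: {text}")
    reply = chat(text)
    print(reply)
    speak(reply)
    return reply
```

In the document's setting, `transcribe` would wrap the `r.listen(...)` and `r.recognize_whisper(...)` calls, `chat` would be the `lazyllm.OnlineChatModule` instance, and `speak` would call `engine.say` followed by `engine.runAndWait`; the `while` loop then reduces to calling `handle_turn` repeatedly.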
diff --git a/docs/mkdocs.template.yml b/docs/mkdocs.template.yml
@@ -15,6 +15,7 @@ nav:
   - Great Writer: Cookbook/great_writer.md
   - RAG: Cookbook/rag.md
   - Streaming: Cookbook/streaming.md
+  - Voice Assistant: Cookbook/voice_agent.md
 - Best Practice:
   - Flow: Best Practice/flow.md
   - Flowapp: Best Practice/flowapp.md

diff --git a/docs/zh/Cookbook/voice_agent.md b/docs/zh/Cookbook/voice_agent.md
@@ -0,0 +1,90 @@
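+# Voice Dialogue Agent
+
+This project demonstrates how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant that supports speech input and spoken output. It receives the user's voice commands through a microphone, transcribes the speech to text, calls a large language model to generate an answer, and reads the answer aloud.
+
+!!! abstract "In this section, you will learn:"
+    - How to use `speech_recognition` to capture and recognize microphone speech.
+    - How to use `lazyllm.OnlineChatModule` to call a large language model for natural language answers.
+    - How to use `pyttsx3` to convert text to speech for spoken output.
+
+
+# Project Dependencies
+
+Make sure the following dependencies are installed: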
+
+```bash
+pip install lazyllm pyttsx3 SpeechRecognition PyAudio openai-whisper soundfile
+```
+```python
+import speech_recognition as sr
+import pyttsx3
+import lazyllm
+```
+# Step-by-Step Breakdown
+
+## Step 1: Initialize the LLM and Text-to-Speech Engine
+
+```python
+chat = lazyllm.OnlineChatModule()
+engine = pyttsx3.init()
+```
+
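+**Function Description:**
+- `chat`: LazyLLM's online chat module (calls the sensenova API by default)
+  - Supports switching between different LLM backends
+  - Automatically manages conversation context
+- `engine`: Local speech synthesis engine (pyttsx3)
+  - Cross-platform text-to-speech output
+  - Supports adjusting speech rate, volume, and other parameters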
+
+## Step 2: Build the Voice Assistant Main Logic
+
+```python
+def listen(chat):
+    r = sr.Recognizer()
+    with sr.Microphone() as source:
+        print("Calibrating...")
+        r.adjust_for_ambient_noise(source, duration=5)
+        print("Okay, go!")
+        while True:
+            print("listening now...")
+            try:
+                audio = r.listen(source, timeout=5, phrase_time_limit=30)
+                print("Recognizing...")
+                # show_dict=True returns Whisper's full result dict; keep only the text
+                result = r.recognize_whisper(audio, model="medium.en", show_dict=True)
+                text = result["text"].strip()
+            except Exception as e:
+                # Do not forward the apology to the LLM; just listen again
+                print(f"Sorry, I didn't catch that. Exception was: {e}")
+                continue
+            print(f"You said: {text}")
+            response_text = chat(text)
+            print(response_text)
+            engine.say(response_text)
+            engine.runAndWait()
+
+
+# Start the assistant (press Ctrl+C to stop)
+listen(chat)
+```
+
+## Sample Output
+
+#### Example Scenario:
+
+**You say:**  
+"What is the capital of France?"
+
+**Console output:**
+```
+Calibrating...
+Okay, go!
+listening now...
+Recognizing...
+You said: What is the capital of France?
+The capital of France is Paris.
+```
+
+**System speech output:**  
+"The capital of France is Paris."
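Step 1 of both documents notes that the pyttsx3 engine supports adjusting speech rate and volume, but neither shows how. The following is a small sketch of one way to do it, assuming pyttsx3's `setProperty` with the `"rate"` (words per minute) and `"volume"` (0.0 to 1.0) properties; the helper name `configure_voice` and its default values are hypothetical and chosen only for illustration.

```python
def configure_voice(engine, rate=150, volume=0.8):
    """Apply a speech rate (words per minute) and a volume (0.0 to 1.0).

    `engine` is expected to behave like the object returned by pyttsx3.init();
    this helper itself is a hypothetical illustration, not part of pyttsx3.
    """
    # Clamp the volume into the range pyttsx3 accepts.
    volume = max(0.0, min(1.0, float(volume)))
    engine.setProperty("rate", int(rate))
    engine.setProperty("volume", volume)
    return engine
```

With the real library this would be used as `engine = configure_voice(pyttsx3.init(), rate=160)` before the first call to `engine.say`.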