diff --git a/docs/en/Cookbook/voice_agent.md b/docs/en/Cookbook/voice_agent.md
new file mode 100644
index 000000000..ef383488a
--- /dev/null
+++ b/docs/en/Cookbook/voice_agent.md
@@ -0,0 +1,107 @@
+# Voice Dialogue Agent
+
+This project demonstrates how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant that supports speech input and spoken output. It captures voice input through a microphone, transcribes it into text, generates a response with a large language model, and speaks the result aloud.
+
+!!! abstract "In this section, you will learn how to:"
+
+    - Use `speech_recognition` to capture and recognize voice input from a microphone.
+    - Use `lazyllm.OnlineChatModule` to invoke a large language model for natural-language responses.
+    - Use `pyttsx3` to convert text to speech for spoken output.
+
+## Project Dependencies
+
+Ensure the following dependencies are installed. `sr.Microphone` requires PyAudio, and `recognize_whisper` depends on the `openai-whisper` package in addition to `soundfile`:
+
+```bash
+pip install lazyllm pyttsx3 speechrecognition soundfile openai-whisper pyaudio
+```
+
+```python
+import speech_recognition as sr
+import pyttsx3
+import lazyllm
+```
+
+## Step-by-Step Breakdown
+
+### Step 1: Initialize the LLM and Text-to-Speech Engine
+
+```python
+chat = lazyllm.OnlineChatModule()
+engine = pyttsx3.init()
+```
+
+> When using online models, you need to configure an `API_KEY`.
+> See the [LazyLLM official documentation (Supported Platforms section)](https://docs.lazyllm.ai/en/stable/#supported-platforms) for details.
+
+**Function Description:**
+
+- `chat`: Uses LazyLLM's online chat module (the `sensenova` API by default)
+    - Supports switching to different LLM backends (see the sketch at the end of this page)
+    - Automatically manages conversation context
+- `engine`: Initializes the local text-to-speech engine (pyttsx3)
+    - Cross-platform speech output
+    - Supports adjusting speech rate, volume, and other parameters (see the sketch at the end of this page)
+
+### Step 2: Build the Voice Assistant's Main Logic
+
+The `listen` function runs a continuous capture → transcribe → respond → speak loop:
+
+```python
+def listen(chat):
+    r = sr.Recognizer()
+    with sr.Microphone() as source:
+        print("Calibrating...")
+        # Sample 5 seconds of ambient noise so the recognizer can set
+        # its energy threshold before listening.
+        r.adjust_for_ambient_noise(source, duration=5)
+        print("Okay, go!")
+        while True:
+            print("listening now...")
+            try:
+                audio = r.listen(source, timeout=5, phrase_time_limit=30)
+                print("Recognizing...")
+                # With show_dict=True, recognize_whisper returns Whisper's
+                # full result dict; the transcription is under "text".
+                text = r.recognize_whisper(
+                    audio,
+                    model="medium.en",
+                    show_dict=True,
+                )["text"]
+                print(f"You said: {text}")
+            except Exception as e:
+                text = f"Sorry, I didn't catch that. Exception was: {e}"
+                print(text)
+            # Send the text to the LLM and speak the reply with the
+            # pyttsx3 engine initialized in Step 1.
+            response_text = chat(text)
+            print(response_text)
+            engine.say(response_text)
+            engine.runAndWait()
+```
+
+A minimal snippet for launching this loop appears at the end of this page.
+
+### Sample Output
+
+**You say:**
+"What is the capital of France?"
+
+**Console output:**
+
+```bash
+Calibrating...
+Okay, go!
+listening now...
+Recognizing...
+You said: What is the capital of France?
+The capital of France is Paris.
+```
+
+**System speech output:**
+"The capital of France is Paris."
+
+### Notes
+
+This script requires a machine **with a working microphone**.
+
+If running on a **remote server** or **virtual machine**, please ensure that:
+
+1. The server or host machine has an available microphone (see the device-listing sketch at the end of this page);
+2. The runtime environment has **microphone access permissions** enabled.
+
+> Otherwise, the program may raise an initialization error when calling `sr.Microphone()`.
+> When **no sound input** is detected, the `whisper` model may mistakenly transcribe silence as common phrases (such as "Thank you").
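+
+### Optional: Switching Backends and Tuning the Voice
+
+The following is a minimal sketch of the two knobs mentioned in Step 1. The `source` and `model` values are illustrative assumptions only — use the platform you actually configured an `API_KEY` for, and check the Supported Platforms documentation for valid names:
+
+```python
+import pyttsx3
+import lazyllm
+
+# Select a specific online backend instead of the default.
+# ("openai" / "gpt-4o" are hypothetical example values.)
+chat = lazyllm.OnlineChatModule(source="openai", model="gpt-4o")
+
+# Tune the local text-to-speech engine.
+engine = pyttsx3.init()
+engine.setProperty("rate", 160)    # speaking rate, in words per minute
+engine.setProperty("volume", 0.9)  # volume, from 0.0 to 1.0
+```
+
+### Running the Assistant
+
+With `chat` and `engine` initialized as in Step 1 and `listen` defined as in Step 2, starting the assistant is a single call; the loop runs until interrupted (e.g. with Ctrl+C):
+
+```python
+if __name__ == "__main__":
+    listen(chat)
+```
+
+### Troubleshooting: No Microphone Found
+
+If `sr.Microphone()` raises an error, you can list the input devices that `speech_recognition` can see; an empty list means the environment has no usable microphone:
+
+```python
+import speech_recognition as sr
+
+# Print the index and name of every audio input device visible to PyAudio.
+for i, name in enumerate(sr.Microphone.list_microphone_names()):
+    print(i, name)
+```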
diff --git a/docs/nav_en.yml b/docs/nav_en.yml
index 8e885de6b..a017d30f1 100644
--- a/docs/nav_en.yml
+++ b/docs/nav_en.yml
@@ -10,6 +10,7 @@
     - Great Writer: Cookbook/great_writer.md
     - RAG: Cookbook/rag.md
     - Streaming: Cookbook/streaming.md
+    - Voice Assistant: Cookbook/voice_agent.md
 - Best Practice:
     - Flow: Best Practice/flow.md
     - Flowapp: Best Practice/flowapp.md
diff --git a/docs/nav_zh.yml b/docs/nav_zh.yml
index 6600a33da..a2ffc8225 100644
--- a/docs/nav_zh.yml
+++ b/docs/nav_zh.yml
@@ -10,6 +10,7 @@
     - Great Writer: Cookbook/great_writer.md
     - RAG: Cookbook/rag.md
    - Streaming: Cookbook/streaming.md
+    - Voice Dialogue: Cookbook/voice_agent.md
 - Best Practice:
     - Flow: Best Practice/flow.md
     - Flowapp: Best Practice/flowapp.md
diff --git a/docs/zh/Cookbook/voice_agent.md b/docs/zh/Cookbook/voice_agent.md
new file mode 100644
index 000000000..9f37b0a0a
--- /dev/null
+++ b/docs/zh/Cookbook/voice_agent.md
@@ -0,0 +1,106 @@
+# Voice Dialogue Agent
+
+This project shows how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant that supports speech input and spoken playback: it receives voice commands from a microphone, transcribes them into text, calls a large language model to generate an answer, and reads the answer back aloud.
+
+!!! abstract "In this section, you will learn:"
+
+    - How to use `speech_recognition` to capture and recognize microphone speech.
+    - How to use `lazyllm.OnlineChatModule` to call a large language model for natural-language answers.
+    - How to use `pyttsx3` to convert text to speech for playback.
+
+## Project Dependencies
+
+Ensure the following dependencies are installed. `sr.Microphone` requires PyAudio, and `recognize_whisper` depends on the `openai-whisper` package in addition to `soundfile`:
+
+```bash
+pip install lazyllm pyttsx3 speechrecognition soundfile openai-whisper pyaudio
+```
+
+```python
+import speech_recognition as sr
+import pyttsx3
+import lazyllm
+```
+
+## Step-by-Step Breakdown
+
+### Step 1: Initialize the LLM and the Text-to-Speech Engine
+
+```python
+chat = lazyllm.OnlineChatModule()
+engine = pyttsx3.init()
+```
+
+> When using online models, you need to configure an `API_KEY`; see the [LazyLLM official documentation (Supported Platforms section)](https://docs.lazyllm.ai/en/stable/#supported-platforms) for details.
+
+**Function Description:**
+
+- `chat`: Uses LazyLLM's online chat module (calls the `sensenova` API by default)
+    - Supports switching to different LLM backends (see the sketch at the end of this page)
+    - Handles conversation context automatically
+- `engine`: Initializes the local speech-synthesis engine (pyttsx3)
+    - Cross-platform text-to-speech output
+    - Supports adjusting speech rate, volume, and other parameters (see the sketch at the end of this page)
+
+### Step 2: Build the Voice Assistant's Main Logic
+
+The `listen` function runs a continuous capture → transcribe → respond → speak loop:
+
+```python
+def listen(chat):
+    r = sr.Recognizer()
+    with sr.Microphone() as source:
+        print("Calibrating...")
+        # Sample 5 seconds of ambient noise so the recognizer can set
+        # its energy threshold before listening.
+        r.adjust_for_ambient_noise(source, duration=5)
+        print("Okay, go!")
+        while True:
+            print("listening now...")
+            try:
+                audio = r.listen(source, timeout=5, phrase_time_limit=30)
+                print("Recognizing...")
+                # With show_dict=True, recognize_whisper returns Whisper's
+                # full result dict; the transcription is under "text".
+                text = r.recognize_whisper(
+                    audio,
+                    model="medium.en",
+                    show_dict=True,
+                )["text"]
+                print(f"You said: {text}")
+            except Exception as e:
+                text = f"Sorry, I didn't catch that. Exception was: {e}"
+                print(text)
+            # Send the text to the LLM and speak the reply with the
+            # pyttsx3 engine initialized in Step 1.
+            response_text = chat(text)
+            print(response_text)
+            engine.say(response_text)
+            engine.runAndWait()
+```
+
+A minimal snippet for launching this loop appears at the end of this page.
+
+### Sample Output
+
+**You say:**
+"What is the capital of France?"
+
+**Console output:**
+
+```bash
+Calibrating...
+Okay, go!
+listening now...
+Recognizing...
+You said: What is the capital of France?
+The capital of France is Paris.
+```
+
+**System speech output:**
+"The capital of France is Paris."
+
+### Notes
+
+This script must run on a machine **with a working microphone**.
+
+If running on a **remote server** or **virtual machine**, please ensure that:
+
+1. The server or host machine has an available microphone (see the device-listing sketch at the end of this page);
+2. The runtime environment has **microphone access permissions** enabled.
+
+> Otherwise, the program may raise a device initialization error when calling `sr.Microphone()`.
+> When **no sound input** is detected, the `whisper` model may mistakenly transcribe silence as common phrases (such as "Thank you").
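+
+### Optional: Switching Backends and Tuning the Voice
+
+The following is a minimal sketch of the two knobs mentioned in Step 1. The `source` and `model` values are illustrative assumptions only — use the platform you actually configured an `API_KEY` for, and check the Supported Platforms documentation for valid names:
+
+```python
+import pyttsx3
+import lazyllm
+
+# Select a specific online backend instead of the default.
+# ("openai" / "gpt-4o" are hypothetical example values.)
+chat = lazyllm.OnlineChatModule(source="openai", model="gpt-4o")
+
+# Tune the local text-to-speech engine.
+engine = pyttsx3.init()
+engine.setProperty("rate", 160)    # speaking rate, in words per minute
+engine.setProperty("volume", 0.9)  # volume, from 0.0 to 1.0
+```
+
+### Running the Assistant
+
+With `chat` and `engine` initialized as in Step 1 and `listen` defined as in Step 2, starting the assistant is a single call; the loop runs until interrupted (e.g. with Ctrl+C):
+
+```python
+if __name__ == "__main__":
+    listen(chat)
+```
+
+### Troubleshooting: No Microphone Found
+
+If `sr.Microphone()` raises an error, you can list the input devices that `speech_recognition` can see; an empty list means the environment has no usable microphone:
+
+```python
+import speech_recognition as sr
+
+# Print the index and name of every audio input device visible to PyAudio.
+for i, name in enumerate(sr.Microphone.list_microphone_names()):
+    print(i, name)
+```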