diff --git a/docs/en/Cookbook/voice_agent.md b/docs/en/Cookbook/voice_agent.md
new file mode 100644
index 000000000..078d2cb45
--- /dev/null
+++ b/docs/en/Cookbook/voice_agent.md
@@ -0,0 +1,86 @@

# Voice Dialogue Agent

This project demonstrates how to use LazyLLM to build a voice assistant system that supports speech input and audio output. It captures voice input through a microphone, transcribes it into text, generates a response using a large language model, and speaks the result aloud.

!!! abstract "In this section, you will learn how to:"

    - Use `speech_recognition` to capture and recognize voice input from a microphone.
    - Use `lazyllm.OnlineChatModule` to invoke a large language model for natural language responses.
    - Use `pyttsx3` to convert text to speech for spoken output.

# Project Dependencies

Ensure the following dependencies are installed:

```bash
pip install lazyllm pyttsx3 speechrecognition
```

Note that `sr.Microphone` additionally requires PyAudio, and `recognize_whisper` requires the `openai-whisper` package; install them if they are not already present on your system.

```python
import speech_recognition as sr
import pyttsx3
import lazyllm
```

# Step-by-Step Breakdown

## Step 1: Initialize the LLM and Text-to-Speech Engine

```python
chat = lazyllm.OnlineChatModule()
engine = pyttsx3.init()
```

**Function Description:**

- `chat`: uses LazyLLM's online chat module (defaults to the sensenova API)
    - Supports switching between different LLM backends
    - Automatically manages conversation context
- `engine`: initializes the local text-to-speech engine (pyttsx3)
    - Cross-platform speech output
    - Supports adjusting speech rate, volume, and other parameters

## Step 2: Build the Voice Assistant Main Logic

```python
def listen(chat):
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Calibrating...")
        # Sample ambient noise for 5 seconds to set the energy threshold
        r.adjust_for_ambient_noise(source, duration=5)
        print("Okay, go!")
        while True:
            text = ""
            print("listening now...")
            try:
                audio = r.listen(source, timeout=5, phrase_time_limit=30)
                print("Recognizing...")
                # Local Whisper transcription; with show_dict=True the
                # result is a dict whose "text" field holds the transcript
                text = r.recognize_whisper(
                    audio,
                    model="medium.en",
                    show_dict=True,
                )["text"]
            except Exception as e:
                text = f"Sorry, I didn't catch that. Exception was: {e}"
            print(text)
            response_text = chat(text)
            print(response_text)
            # Speak the reply through the module-level pyttsx3 engine
            engine.say(response_text)
            engine.runAndWait()
```

## Sample Output

#### Example Scenario:

**You say:**
"What is the capital of France?"

**Console output:**

```
Calibrating...
Okay, go!
listening now...
Recognizing...
What is the capital of France?
The capital of France is Paris.
```

**System speech output:**
"The capital of France is Paris."

\ No newline at end of file
diff --git a/docs/mkdocs.template.yml b/docs/mkdocs.template.yml
index 806aeee06..82f763d16 100644
--- a/docs/mkdocs.template.yml
+++ b/docs/mkdocs.template.yml
@@ -15,6 +15,7 @@ nav:
     - Great Writer: Cookbook/great_writer.md
     - RAG: Cookbook/rag.md
     - Streaming: Cookbook/streaming.md
+    - Voice Assistant: Cookbook/voice_agent.md
   - Best Practice:
     - Flow: Best Practice/flow.md
     - Flowapp: Best Practice/flowapp.md
diff --git a/docs/zh/Cookbook/voice_agent.md b/docs/zh/Cookbook/voice_agent.md
new file mode 100644
index 000000000..d922927b6
--- /dev/null
+++ b/docs/zh/Cookbook/voice_agent.md
@@ -0,0 +1,90 @@

# Voice Dialogue Agent

This project demonstrates how to use [LazyLLM](https://github.com/LazyAGI/LazyLLM) to build a voice assistant system that supports voice input and spoken playback: it receives the user's voice commands through a microphone, recognizes the speech as text, calls a large language model to generate an answer, and plays the reply back as speech.
!!! abstract "In this section, you will learn:"

    - How to use `speech_recognition` to receive and recognize microphone speech.
    - How to use `lazyllm.OnlineChatModule` to call a large language model for natural language answers.
    - How to use `pyttsx3` to convert text to speech for spoken playback.

# Project Dependencies

Ensure the following dependencies are installed:

```bash
pip install lazyllm pyttsx3 speechrecognition
```

Note that `sr.Microphone` additionally requires PyAudio, and `recognize_whisper` requires the `openai-whisper` package; install them if they are not already present on your system.

```python
import speech_recognition as sr
import pyttsx3
import lazyllm
```

# Step-by-Step Breakdown

## Step 1: Initialize the LLM and Text-to-Speech Engine

```python
chat = lazyllm.OnlineChatModule()
engine = pyttsx3.init()
```

**Function Description:**

- `chat`: uses LazyLLM's online chat module (defaults to the sensenova API)
    - Supports switching between different LLM backends
    - Automatically manages conversation context
- `engine`: initializes the local text-to-speech engine (pyttsx3)
    - Cross-platform text-to-speech output
    - Supports adjusting speech rate, volume, and other parameters

## Step 2: Build the Voice Assistant Main Logic

```python
def listen(chat):
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Calibrating...")
        # Sample ambient noise for 5 seconds to set the energy threshold
        r.adjust_for_ambient_noise(source, duration=5)
        print("Okay, go!")
        while True:
            text = ""
            print("listening now...")
            try:
                audio = r.listen(source, timeout=5, phrase_time_limit=30)
                print("Recognizing...")
                # Local Whisper transcription; with show_dict=True the
                # result is a dict whose "text" field holds the transcript
                text = r.recognize_whisper(
                    audio,
                    model="medium.en",
                    show_dict=True,
                )["text"]
            except Exception as e:
                text = f"Sorry, I didn't catch that. Exception was: {e}"
            print(text)
            response_text = chat(text)
            print(response_text)
            # Speak the reply through the module-level pyttsx3 engine
            engine.say(response_text)
            engine.runAndWait()
```

## Sample Output

#### Example Scenario:

**You say:**
"What is the capital of France?"

**Console output:**

```
Calibrating...
Okay, go!
listening now...
Recognizing...
What is the capital of France?
The capital of France is Paris.
```

**System speech output:**
"The capital of France is Paris."
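
Because the `listen` loop depends on a live microphone and speakers, it is hard to exercise without hardware. One possible refactor (a sketch only; `dialogue_turn` and its parameter names are illustrative, not LazyLLM APIs) pulls the per-turn recognize/respond/speak logic into a pure function that can be tested with stand-ins:

```python
def dialogue_turn(recognize, chat, speak):
    """Run one recognize -> respond -> speak turn of the assistant.

    The recognizer, LLM, and TTS engine are passed in as callables, so the
    loop body of `listen` can be exercised without a microphone or audio
    device. The fallback message mirrors the one used in `listen`.
    """
    try:
        text = recognize()
    except Exception as e:
        text = f"Sorry, I didn't catch that. Exception was: {e}"
    response = chat(text)
    speak(response)
    return text, response

# Exercise one turn with stand-ins for the real components.
spoken = []
text, response = dialogue_turn(
    lambda: "What is the capital of France?",   # fake recognizer
    lambda t: f"You asked: {t}",                # fake LLM
    spoken.append,                              # fake TTS engine
)
print(response)  # -> You asked: What is the capital of France?
```

In the real application, the three arguments would be the `r.recognize_whisper(...)` call, the LazyLLM `chat` module, and a wrapper around `engine.say`/`engine.runAndWait`.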