-
Notifications
You must be signed in to change notification settings - Fork 296
【Hackathon 9th Sprint No.60】【RFC】为 FastDeploy 新增支持 DeepSeek 模型的 Reasoning Parser & Tool Parser #1185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
【Hackathon 9th Sprint No.60】【RFC】为 FastDeploy 新增支持 DeepSeek 模型的 Reasoning Parser & Tool Parser #1185
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,235 @@ | ||
| # FastDeploy 新增支持 DeepSeek 模型的 Reasoning Parser & Tool Parser | ||
|
|
||
| | 任务名称 | 为 FastDeploy 新增支持 DeepSeek 模型的 Reasoning Parser & Tool Parser | | ||
| |------|------| | ||
| | 提交作者 | fgeygfe | | ||
| | 提交时间 | 2025-11-15 | | ||
| | 版本号 | V1.0 | | ||
| | 文件名 | 20251115_add_deepseek_reasoning_tool_parser_support.md | | ||
|
|
||
| # 一、概述 | ||
|
|
||
| ## 1、相关背景 | ||
|
|
||
| FastDeploy 作为高性能推理引擎,目前已经支持多种大语言模型的推理和服务。随着 DeepSeek 系列模型(包括 DeepSeek-V3、DeepSeek-V3.1、DeepSeek-R1 等)的推出,这些模型在推理能力和工具调用方面表现出色。为了更好地支持这些模型的特性,需要为 FastDeploy 新增专门针对 DeepSeek 模型的 Reasoning Parser 和 Tool Parser。 | ||
|
|
||
| ## 2、功能目标 | ||
|
|
||
| * 实现 DeepSeek Reasoning Parser:解析模型输出中的思考内容(reasoning_content)和回复内容(content) | ||
| * 实现 DeepSeek Tool Parser:解析模型输出中的工具调用(tool_calls)信息 | ||
| * 支持流式和非流式两种场景 | ||
| * 支持并行工具调用 | ||
| * 提供完整的单元测试和文档 | ||
|
|
||
| ## 3、意义 | ||
|
|
||
| 完善 FastDeploy 对 DeepSeek 系列模型的支持,为用户提供更强大的推理和工具调用能力,提升模型部署的便利性和性能。 | ||
|
|
||
| # 二、DeepSeek 模型输出协议分析 | ||
|
|
||
| ## 1、Reasoning 输出格式 | ||
|
|
||
| DeepSeek 模型使用 `<think>...</think>` 标记包裹思考内容: | ||
|
|
||
| ```xml | ||
| <think> | ||
| 这里是模型的思考过程... | ||
| 用户询问了关于天气的问题,我需要调用天气工具 | ||
| </think> | ||
|
|
||
| 这是最终的回复内容 | ||
| ``` | ||
|
|
||
| **特点**: | ||
| - 使用 `<think>...</think>` 标记包裹思考内容 | ||
| - 思考内容在前,回复内容在后 | ||
| - 需要容错处理不完整的标记 | ||
|
|
||
| ## 2、Tool Calling 输出格式 | ||
|
|
||
| DeepSeek 模型的工具调用格式遵循类似 OpenAI 的 JSON 协议: | ||
|
|
||
| ```xml | ||
| <think> | ||
| 需要查询天气信息 | ||
| </think> | ||
|
|
||
| <tool_call> | ||
| {"name": "get_weather", "arguments": {"location": "北京", "unit": "c"}} | ||
| </tool_call> | ||
| ``` | ||
|
|
||
| **支持并行工具调用**: | ||
|
|
||
| ```xml | ||
| <tool_call> | ||
| {"name": "get_weather", "arguments": {"location": "北京"}} | ||
| </tool_call> | ||
|
|
||
| <tool_call> | ||
| {"name": "get_weather", "arguments": {"location": "上海"}} | ||
| </tool_call> | ||
| ``` | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 下面文档有提到 <|tool▁calls▁begin|> , 可以类似上面的思考输出, 再做进一步调研, 确认不同模型的工具输出协议是不是有差异,以做不同的解析方案。 |
||
|
|
||
| # 三、设计思路与实现方案 | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 基于现在FD 架构,下面的实现方案是没有问题的。但此处即将会有个思考机制架构调整,需要和您同步一下:
|
||
|
|
||
| ## 1、Reasoning Parser 设计 | ||
|
|
||
| ### 类设计 | ||
| ```python | ||
| @ReasoningParserManager.register_module(["deepseek", "deepseek-r1", "deepseek-v3"]) | ||
| class DeepSeekReasoningParser(ReasoningParser): | ||
| """DeepSeek 系列模型的推理解析器""" | ||
| ``` | ||
|
|
||
| ### 核心方法 | ||
| - `__init__(tokenizer)`: 初始化,获取特殊标记 `<think>`、`</think>` 的 token ID | ||
| - `is_reasoning_end(input_ids)`: 检查推理内容是否结束(检查 `</think>` 标记) | ||
| - `extract_content_ids(input_ids)`: 提取 `</think>` 之后的内容 token IDs | ||
| - `extract_reasoning_content(model_output, request)`: 非流式场景提取推理内容和回复内容 | ||
| - `extract_reasoning_content_streaming(...)`: 流式场景逐步解析推理内容 | ||
|
|
||
| ### 边界情况处理 | ||
| | 情况 | 处理策略 | | ||
| |-----|---------| | ||
| | 完全没有推理标记 | 将整个输出作为 content | | ||
| | 只有 `</think>` 没有 `<think>` | 将结束标记前的内容作为 reasoning | | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 根据这两句的判断, 在流式情况, 如果收到的前几个 token (没有 的情况), 是没有办法判断这些 token是content 还是 reasoning_conten 的哦 |
||
| | `</think>` 后立即是 `<tool_call>` | content 为空,由 Tool Parser 处理 | | ||
| | 多个 `</think>` 标记 | 只识别第一个 | | ||
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 对于具体执行方案,给一个以下建议: 首先,列举模型,正常返回是的协议情况, 例如(不一定是 Deepseek 模型的示例) 其次,制定在不同输出场景的处理方案, 例如:
还可以进一步考虑非思考模式下,思考和工具的解析方案。 |
||
| ## 2、Tool Parser 设计 | ||
|
|
||
| ### 类设计 | ||
| ```python | ||
| @ToolParserManager.register_module(["deepseek", "deepseek-r1", "deepseek-v3"]) | ||
| class DeepSeekToolParser(ToolParser): | ||
| """DeepSeek 系列模型的工具调用解析器""" | ||
| ``` | ||
|
|
||
| ### 核心方法 | ||
| - `__init__(tokenizer)`: 初始化,获取 `<tool_call>`、`</tool_call>` 的 token ID | ||
| - `extract_tool_calls(model_output, request)`: 非流式场景提取所有工具调用 | ||
| - `extract_tool_calls_streaming(...)`: 流式场景逐步解析工具调用 | ||
| - `_parse_tool_json(tool_json)`: 解析工具调用的 JSON(支持 partial JSON) | ||
|
|
||
| ### JSON 解析策略 | ||
| 1. 首先尝试标准 `json.loads` | ||
| 2. 如果失败,使用 `partial_json_parser` 处理不完整 JSON | ||
| 3. 如果仍然失败,使用正则提取 name 和 arguments 字段 | ||
|
|
||
| ### 并行工具调用支持 | ||
| - 正确识别多个 `<tool_call>` 块 | ||
| - 为每个工具调用分配唯一的 ID | ||
| - 在流式场景下使用 `current_tool_id` 索引维护多个工具的状态 | ||
|
|
||
| ## 3、文件结构 | ||
|
|
||
| | 文件路径 | 说明 | | ||
| |---------|------| | ||
| | `fastdeploy/reasoning/deepseek_reasoning_parser.py` | Reasoning Parser 实现 | | ||
| | `fastdeploy/entrypoints/openai/tool_parsers/deepseek_tool_parser.py` | Tool Parser 实现 | | ||
| | `tests/reasoning/test_deepseek_reasoning_parser.py` | Reasoning Parser 单元测试 | | ||
| | `tests/entrypoints/openai/tool_parsers/test_deepseek_tool_parser.py` | Tool Parser 单元测试 | | ||
|
|
||
| # 四、测试和验收的考量 | ||
|
|
||
| ## 1、Reasoning Parser 测试 | ||
|
|
||
| - **非流式测试**:标准格式、缺少开始标记、无推理内容、仅推理内容、推理+工具调用等场景 | ||
| - **流式测试**:逐字符推理、推理结束、跨 delta 结束等场景 | ||
| - 测试覆盖率 > 90% | ||
|
|
||
| ## 2、Tool Parser 测试 | ||
|
|
||
| - **非流式测试**:标准单工具、并行工具、不完整 JSON、缺少字段、无工具调用等场景 | ||
| - **流式测试**:逐步解析 name、参数流式输出、完整工具调用等场景 | ||
| - 测试覆盖率 > 90% | ||
|
|
||
| ## 3、集成测试 | ||
|
|
||
| - 启动 FastDeploy 服务,配置 `--reasoning-parser deepseek` 和 `--tool-call-parser deepseek` | ||
| - 发送包含工具的请求,验证返回的 reasoning_content 和 tool_calls | ||
| - 测试 DeepSeek-R1, V3, V3.1 等不同模型 | ||
| - 性能测试:测试大量并发请求下的解析性能 | ||
|
|
||
| # 五、可行性分析和排期规划 | ||
|
|
||
| ## 1、参考实现 | ||
|
|
||
| 可参考现有的 Qwen、ERNIE 系列的 Reasoning Parser 和 Tool Parser 实现: | ||
| - `fastdeploy/reasoning/qwen3_reasoning_parsers.py` | ||
| - `fastdeploy/reasoning/ernie_x1_reasoning_parsers.py` | ||
| - `fastdeploy/entrypoints/openai/tool_parsers/ernie_x1_tool_parser.py` | ||
|
|
||
| ## 2、排期规划 | ||
|
|
||
| | 阶段 | 任务 | 预计时间 | | ||
| |-----|------|---------| | ||
| | Week 1 | Reasoning Parser 实现 | 2天 | | ||
| | Week 1 | Tool Parser 实现 | 2天 | | ||
| | Week 1 | 单元测试开发 | 2天 | | ||
| | Week 2 | 集成测试 | 1天 | | ||
| | Week 2 | 文档更新 | 1天 | | ||
| | Week 2 | Code Review & 修复 | 2天 | | ||
|
|
||
| **总计**:约 10 个工作日 | ||
|
|
||
| # 六、影响面 | ||
|
|
||
| ## 1、新增模块 | ||
|
|
||
| 在 `fastdeploy/reasoning` 和 `fastdeploy/entrypoints/openai/tool_parsers` 中新增类,不影响 FastDeploy 已有功能。 | ||
|
|
||
| ## 2、文档更新 | ||
|
|
||
| 需要更新以下文档: | ||
| - `docs/features/reasoning_output.md`:添加 DeepSeek 模型支持说明 | ||
| - `docs/zh/features/reasoning_output.md`:中文版本 | ||
| - `docs/features/tool_calling.md`:添加 DeepSeek 工具调用说明 | ||
| - `docs/zh/features/tool_calling.md`:中文版本 | ||
|
|
||
| ## 3、使用示例 | ||
|
|
||
| 启动服务: | ||
| ```bash | ||
| python -m fastdeploy.entrypoints.openai.api_server \ | ||
| --model unsloth/DeepSeek-R1-BF16 \ | ||
| --port 8192 \ | ||
| --reasoning-parser deepseek \ | ||
| --tool-call-parser deepseek | ||
| ``` | ||
|
|
||
| 调用 API: | ||
| ```python | ||
| from openai import OpenAI | ||
|
|
||
| client = OpenAI(api_key="EMPTY", base_url="http://localhost:8192/v1") | ||
|
|
||
| response = client.chat.completions.create( | ||
| model="deepseek", | ||
| messages=[{"role": "user", "content": "What's the weather in Beijing?"}], | ||
| tools=[{ | ||
| "type": "function", | ||
| "function": { | ||
| "name": "get_weather", | ||
| "parameters": { | ||
| "type": "object", | ||
| "properties": { | ||
| "location": {"type": "string"}, | ||
| "unit": {"type": "string", "enum": ["c", "f"]} | ||
| } | ||
| } | ||
| } | ||
| }] | ||
| ) | ||
|
|
||
| print(response.choices[0].message.reasoning_content) | ||
| print(response.choices[0].message.tool_calls) | ||
| ``` | ||
|
|
||
| # 七、参考资料 | ||
|
|
||
| 1. FastDeploy Reasoning Output: `docs/features/reasoning_output.md` | ||
| 2. FastDeploy Tool Calling: `docs/features/tool_calling.md` | ||
| 3. OpenAI Tool Calling 规范 | ||
| 4. `partial_json_parser` 库文档 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以再进一步调研 DeepSeek 在不同模型版本上的的思考输出方式, 总结成一段结论。
例如,如果模型支持思考开关, 开思考时模型可能不会再输出
<think>了; 而在不支持思考开关的模型,可能会输出<think>..</think>