diff --git a/README.md b/README.md
index 158f78bb..aff19b6d 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,7 @@ If you find this GitHub repository useful, please consider giving it a free star
 - [x] Support Cross-Region Inference
 - [x] Support Application Inference Profiles (**new**)
 - [x] Support Reasoning (**new**)
+- [x] Support Interleaved thinking (**new**)
 
 Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.
 
diff --git a/docs/Usage.md b/docs/Usage.md
index 822a870c..872cbc6b 100644
--- a/docs/Usage.md
+++ b/docs/Usage.md
@@ -15,6 +15,7 @@ export OPENAI_BASE_URL=
 - [Multimodal API](#multimodal-api)
 - [Tool Call](#tool-call)
 - [Reasoning](#reasoning)
+- [Interleaved thinking (beta)](#interleaved-thinking-beta)
 
 ## Models API
 
@@ -135,6 +136,7 @@ print(doc_result[0][:5])
 **Example Request**
 
 ```bash
+curl $OPENAI_BASE_URL/chat/completions \
 curl $OPENAI_BASE_URL/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $OPENAI_API_KEY" \
@@ -340,7 +342,6 @@ curl $OPENAI_BASE_URL/chat/completions \
 -d '{
 "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
 "messages": [
- {
 "role": "user",
 "content": "which one is bigger, 3.9 or 3.11?"
 }
@@ -441,4 +442,55 @@ for chunk in response:
         reasoning_content += chunk.choices[0].delta.reasoning_content
     elif chunk.choices[0].delta.content:
         content += chunk.choices[0].delta.content
-```
\ No newline at end of file
+```
+
+## Interleaved thinking (beta)
+
+**Important Notice**: Please review the following points carefully before using reasoning mode with the Chat Completions API.
+
+When extended thinking is combined with tool use, Claude 4 models support [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved). The model can think between tool calls and reason in more depth after receiving tool results,
which is helpful for more complex agentic interactions.
+With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.
+
+
+**Example Request**
+
+- Non-Streaming
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
+}],
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
+
+- Streaming
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
+}],
+"stream": true,
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
diff --git a/docs/Usage_CN.md b/docs/Usage_CN.md
index c541e195..4ce3d40f 100644
--- a/docs/Usage_CN.md
+++ b/docs/Usage_CN.md
@@ -15,6 +15,8 @@ export OPENAI_BASE_URL=
 - [Multimodal API](#multimodal-api)
 - [Tool Call](#tool-call)
 - [Reasoning](#reasoning)
+- [Interleaved thinking (beta)](#interleaved-thinking-beta)
+
 
 ## Models API
 
@@ -440,4 +442,56 @@ for chunk in response:
         reasoning_content += chunk.choices[0].delta.reasoning_content
     elif chunk.choices[0].delta.content:
         content += chunk.choices[0].delta.content
-```
\ No newline at end of file
+```
+
+## Interleaved thinking (beta)
+
+**重要提示**:在使用 Chat Completion API 的推理模式(reasoning mode)前,请务必仔细阅读以下内容。
+
+Claude 4 模型支持借助工具使用的扩展思维功能(Extended Thinking),其中包含交错思考([interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved))。该功能使 Claude 4 可以在多次调用工具之间进行思考,并在收到工具结果后执行更复杂的推理,这对处理更复杂的 Agentic AI 交互非常有帮助。
+
+在交错思考模式下,`budget_tokens` 可以超过 `max_tokens` 参数,因为它代表一次助手回合中所有思考块的总 Token 预算。
+
+
+**Request 示例**
+
+- Non-Streaming
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
+}],
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
+
+- Streaming
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
+}],
+"stream": true,
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
diff --git a/src/api/models/bedrock.py b/src/api/models/bedrock.py
index 9a8fd3c5..4977795a 100644
--- a/src/api/models/bedrock.py
+++ b/src/api/models/bedrock.py
@@ -567,6 +567,10 @@ def _parse_request(self, chat_request: ChatRequest) -> dict:
                     assert "function" in chat_request.tool_choice
                     tool_config["toolChoice"] = {"tool": {"name": chat_request.tool_choice["function"].get("name", "")}}
             args["toolConfig"] = tool_config
+        # Add additional fields to enable extended thinking
+        if chat_request.extra_body:
+            # reasoning_config will not be used
+            args["additionalModelRequestFields"] = chat_request.extra_body
         return args
 
     def _create_response(
diff --git a/src/api/schema.py b/src/api/schema.py
index b6b8c158..233e1139 100644
--- a/src/api/schema.py
+++ b/src/api/schema.py
@@ -107,6 +107,7 @@ class ChatRequest(BaseModel):
     tools: list[Tool] | None = None
     tool_choice: str | object = "auto"
     stop: list[str] | str | None = None
+    extra_body: dict | None = None
 
 
 class Usage(BaseModel):
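
The pass-through added to `_parse_request` can be illustrated in isolation. The following is a minimal sketch, not the gateway's actual code: `ChatRequestSketch` and `build_converse_args` are hypothetical stand-ins (the real `ChatRequest` is a pydantic model with more fields), and the `modelId`, `messages`, and `inferenceConfig` keys are assumed here only to give the dictionary a plausible shape. It shows that a non-empty `extra_body` is forwarded unchanged as `additionalModelRequestFields`, and that the key is omitted otherwise.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChatRequestSketch:
    """Hypothetical stand-in for ChatRequest, reduced to the fields used here."""

    model: str
    messages: list[dict]
    max_tokens: int = 2048
    extra_body: dict | None = None


def build_converse_args(req: ChatRequestSketch) -> dict:
    """Simplified, assumed version of the argument building in _parse_request."""
    args = {
        "modelId": req.model,
        "messages": req.messages,
        "inferenceConfig": {"maxTokens": req.max_tokens},
    }
    # Same pass-through as the patch: forward extra_body without modification.
    if req.extra_body:
        args["additionalModelRequestFields"] = req.extra_body
    return args


request = ChatRequestSketch(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": "hello"}]}],
    extra_body={
        "anthropic_beta": ["interleaved-thinking-2025-05-14"],
        "thinking": {"type": "enabled", "budget_tokens": 4096},
    },
)
args = build_converse_args(request)
print(args["additionalModelRequestFields"]["thinking"]["budget_tokens"])  # 4096
```

Because the dictionary is forwarded as-is, any validation of `anthropic_beta` or `thinking` is left to the model provider. If the gateway is called through the OpenAI Python SDK rather than curl, note that the SDK's own `extra_body` argument is merged into the top level of the request JSON, so the field shown in the curl examples would likely need to be nested as `extra_body={"extra_body": {...}}`; treat this as an assumption to verify against your SDK version.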