support Claude 4 Interleaved thinking (beta) #164

Merged
merged 1 commit into from
Jul 21, 2025
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ If you find this GitHub repository useful, please consider giving it a free star
- [x] Support Cross-Region Inference
- [x] Support Application Inference Profiles (**new**)
- [x] Support Reasoning (**new**)
- [x] Support Interleaved thinking (**new**)

Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.

Expand Down
56 changes: 54 additions & 2 deletions docs/Usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ export OPENAI_BASE_URL=<API base url>
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)

## Models API

Expand Down Expand Up @@ -135,6 +136,7 @@ print(doc_result[0][:5])
**Example Request**

```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
Expand Down Expand Up @@ -340,7 +342,6 @@ curl $OPENAI_BASE_URL/chat/completions \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
Expand Down Expand Up @@ -441,4 +442,55 @@ for chunk in response:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
```

## Interleaved thinking (beta)

**Important Notice**: Please review the following points carefully before using reasoning mode with the Chat Completions API.

Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which lets Claude 4 models think between tool calls and reason in more depth after receiving tool results. This is helpful for more complex agentic interactions.
With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.


**Example Request**

- Non-Streaming

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
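-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "One day, a girl scored only 38 on a math exam. Afraid that her father would punish her, she secretly changed the score to 88. When her father saw the paper, he flew into a rage, slapped her hard, and roared: “Why is this 8 half green and half red? Do you think I am a fool?” The girl cried in distress after being hit and said nothing. A little while later, the father suddenly broke down. Why did the father break down a little while later?"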
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```

- Streaming

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
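-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "One day, a girl scored only 38 on a math exam. Afraid that her father would punish her, she secretly changed the score to 88. When her father saw the paper, he flew into a rage, slapped her hard, and roared: “Why is this 8 half green and half red? Do you think I am a fool?” The girl cried in distress after being hit and said nothing. A little while later, the father suddenly broke down. Why did the father break down a little while later?"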
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
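
The request body used in the curl examples can also be assembled programmatically. The following is a minimal Python sketch and not part of this PR: the helper name `build_interleaved_request` and its default values are hypothetical, while the model ID, the `anthropic_beta` flag, and the `thinking` object are copied from the examples above. It also shows that `budget_tokens` (4096) may be larger than `max_tokens` (2048).

```python
import json


def build_interleaved_request(prompt, max_tokens=2048, budget_tokens=4096, stream=False):
    """Build a chat completion payload with interleaved thinking enabled.

    Hypothetical helper for illustration; field names follow the curl examples.
    """
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    payload = {
        "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Sent as a top-level "extra_body" field, exactly as in the curl examples.
        "extra_body": {
            "anthropic_beta": ["interleaved-thinking-2025-05-14"],
            # budget_tokens covers all thinking blocks in one assistant turn,
            # so it is allowed to exceed max_tokens.
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        },
    }
    if stream:
        payload["stream"] = True
    return payload


if __name__ == "__main__":
    body = build_interleaved_request("Which one is bigger, 3.9 or 3.11?", stream=True)
    # The printed JSON can be passed to curl with -d.
    print(json.dumps(body, indent=2))
```

The payload is kept as a plain dictionary so it can be serialized for curl or posted with any HTTP client.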
56 changes: 55 additions & 1 deletion docs/Usage_CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ export OPENAI_BASE_URL=<API base url>
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)


## Models API

Expand Down Expand Up @@ -440,4 +442,56 @@ for chunk in response:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
```

## Interleaved thinking (beta)

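**Important Notice**: Before using reasoning mode with the Chat Completions API, please read the following carefully.

Claude 4 models support extended thinking with tool use, which includes [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved). This feature lets Claude 4 think between tool calls and perform more sophisticated reasoning after receiving tool results, which is very helpful for more complex agentic AI interactions.

With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total token budget across all thinking blocks within one assistant turn.


**Example Request**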

- Non-Streaming

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
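-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "One day, a girl scored only 38 on a math exam. Afraid that her father would punish her, she secretly changed the score to 88. When her father saw the paper, he flew into a rage, slapped her hard, and roared: “Why is this 8 half green and half red? Do you think I am a fool?” The girl cried in distress after being hit and said nothing. A little while later, the father suddenly broke down. Why did the father break down a little while later?"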
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```

- Streaming

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
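-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "One day, a girl scored only 38 on a math exam. Afraid that her father would punish her, she secretly changed the score to 88. When her father saw the paper, he flew into a rage, slapped her hard, and roared: “Why is this 8 half green and half red? Do you think I am a fool?” The girl cried in distress after being hit and said nothing. A little while later, the father suddenly broke down. Why did the father break down a little while later?"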
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
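
When `"stream": true` is set, the reply arrives in chunks, and thinking text can be separated from answer text in the same way as the Reasoning example earlier in this guide. The sketch below is illustrative only: the function name `accumulate_stream` is hypothetical, and it assumes the gateway emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`) whose deltas carry `reasoning_content` and `content` fields.

```python
import json


def accumulate_stream(lines):
    """Split streamed deltas into reasoning text and answer text.

    Assumes OpenAI-style SSE lines; this framing is an assumption, not
    something confirmed by the PR itself.
    """
    reasoning_content, content = "", ""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Either field may be missing or null in a given chunk.
            reasoning_content += delta.get("reasoning_content") or ""
            content += delta.get("content") or ""
    return reasoning_content, content


if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"delta": {"reasoning_content": "Compare the decimals. "}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "3.9 is bigger."}}]}',
        "data: [DONE]",
    ]
    thinking, answer = accumulate_stream(sample)
    print("thinking:", thinking)
    print("answer:", answer)
```

With interleaved thinking, reasoning chunks may appear more than once within a turn, so the sketch simply concatenates them in arrival order.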
4 changes: 4 additions & 0 deletions src/api/models/bedrock.py
Original file line number Diff line number Diff line change
Expand Up @@ -567,6 +567,10 @@ def _parse_request(self, chat_request: ChatRequest) -> dict:
assert "function" in chat_request.tool_choice
tool_config["toolChoice"] = {"tool": {"name": chat_request.tool_choice["function"].get("name", "")}}
args["toolConfig"] = tool_config
# Forward extra_body as additional model request fields to enable extended thinking
if chat_request.extra_body:
    # reasoning_config will not be used when extra_body is set
    args["additionalModelRequestFields"] = chat_request.extra_body
return args

def _create_response(
Expand Down
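
The effect of the `_parse_request` change above can be sketched in isolation. This is a simplified stand-in, not the project's code: `apply_extra_body` is a hypothetical function operating on plain dictionaries, and the pre-existing `reasoning_config` entry in the usage example is an assumption based on the inline comment in the diff.

```python
def apply_extra_body(args, extra_body):
    """Return a copy of the request arguments with extra_body forwarded.

    Hypothetical helper mirroring the diff: a non-empty extra_body replaces
    whatever additionalModelRequestFields held before.
    """
    result = dict(args)
    if extra_body:
        result["additionalModelRequestFields"] = extra_body
    return result


if __name__ == "__main__":
    # Assumed earlier state: a reasoning_config set by the existing reasoning path.
    base = {
        "modelId": "us.anthropic.claude-sonnet-4-20250514-v1:0",
        "additionalModelRequestFields": {
            "reasoning_config": {"type": "enabled", "budget_tokens": 1024}
        },
    }
    extra = {
        "anthropic_beta": ["interleaved-thinking-2025-05-14"],
        "thinking": {"type": "enabled", "budget_tokens": 4096},
    }
    print(apply_extra_body(base, extra)["additionalModelRequestFields"])
```

Because the assignment replaces the whole field rather than merging into it, a caller that sends `extra_body` is responsible for including every additional model field it needs.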
1 change: 1 addition & 0 deletions src/api/schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ class ChatRequest(BaseModel):
tools: list[Tool] | None = None
tool_choice: str | object = "auto"
stop: list[str] | str | None = None
extra_body: dict | None = None


class Usage(BaseModel):
Expand Down