Add anthropic endpoint #21341
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Code Review
This pull request introduces support for the Anthropic Messages API by translating Anthropic's format to vLLM's native prompt format and reusing existing inference logic. However, the new endpoint implementation in `api_server.py` has several issues, including the use of undefined functions, incorrect imports, and unsafe request handling. It should be rewritten to use FastAPI's Pydantic validation and to leverage the existing completion logic in the server for robustness and correctness. Additionally, the Pydantic schemas and the prompt conversion utility can be made more robust to handle the full range of valid Anthropic message formats.
```diff
@@ -106,13 +106,54 @@
 from vllm.v1.metrics.prometheus import get_prometheus_registry
 from vllm.version import __version__ as VLLM_VERSION

+from fastapi import APIRouter, Request, HTTPException
+from uuid import uuid4
+from .schemas import AnthropicMessagesRequest, AnthropicMessagesResponse
```
The import path for the Anthropic schemas appears to be incorrect. The new schemas are defined in `vllm/entrypoints/openai/schemas_anthropic.py`, but you are importing from `.schemas`. This will likely cause an `ImportError` at runtime.
```diff
-from .schemas import AnthropicMessagesRequest, AnthropicMessagesResponse
+from .schemas_anthropic import AnthropicMessagesRequest, AnthropicMessagesResponse
```
```python
@router.post("/v1/messages")
async def anthropic_messages(request: Request):
    body = await request.json()

    # Validate Anthropic headers and fields
    api_key = request.headers.get("x-api-key")
    version = request.headers.get("anthropic-version")
    if not api_key or not version:
        raise HTTPException(status_code=400,
                            detail="Missing required Anthropic headers.")

    # Convert messages to prompt
    prompt = convert_messages_to_prompt(body["messages"])

    # Call existing vLLM generation logic
    llm_response = await vllm_generate(
        model=body["model"],
        prompt=prompt,
        max_tokens=body.get("max_tokens", 1024)
    )

    # Return response in Anthropic format
    output = {
        "id": f"msg_{uuid4().hex[:24]}",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": llm_response["text"]}],
        "model": body["model"],
        "stop_reason": llm_response.get("stop_reason", "end_turn"),
        "stop_sequence": None,
        "usage": {
            "input_tokens": llm_response["prompt_tokens"],
            "output_tokens": llm_response["completion_tokens"],
        }
    }
    return output
```
This endpoint implementation has several critical issues that will prevent it from working correctly:

- **Undefined Functions:** `convert_messages_to_prompt` and `vllm_generate` are not defined or imported in this file, which will cause a `NameError` at runtime.
- **Unsafe Request Handling:** The request body is parsed manually with `request.json()`, and dictionary keys are accessed directly (e.g., `body["messages"]`). This is unsafe and can lead to `KeyError` exceptions and 500 errors if the request is malformed. You should leverage FastAPI's Pydantic integration for automatic request validation and parsing.
- **Not Reusing Existing Logic:** The PR description mentions reusing existing logic, but the call to the undefined `vllm_generate` function doesn't do that. The server already has a robust completion generation pipeline that should be used here.
I suggest rewriting this function to address these points by using the `AnthropicMessagesRequest` Pydantic model for validation and calling the existing completion handler. This will make the implementation robust and consistent with the rest of the API server.
You will need to add the following imports at the top of the file:
```python
from vllm.entrypoints.openai.protocol import (CompletionRequest,
                                              CompletionResponse,
                                              ErrorResponse)
from vllm.entrypoints.openai.tool_parsers.utils import (
    convert_messages_to_prompt)
```
Here is the suggested implementation for the `anthropic_messages` function:
```python
@router.post("/v1/messages", response_model=AnthropicMessagesResponse)
async def anthropic_messages(anthropic_request: AnthropicMessagesRequest,
                             raw_request: Request):
    # Validate Anthropic headers
    api_key = raw_request.headers.get("x-api-key")
    version = raw_request.headers.get("anthropic-version")
    if not api_key or not version:
        raise HTTPException(status_code=400,
                            detail="Missing required Anthropic headers.")

    # Convert messages to prompt
    prompt = convert_messages_to_prompt(anthropic_request.messages)

    # Create a vLLM CompletionRequest
    completion_request = CompletionRequest(
        model=anthropic_request.model,
        prompt=prompt,
        max_tokens=anthropic_request.max_tokens,
        stream=False,  # This endpoint is non-streaming.
    )

    # Get the completion handler and call it
    completion_handler = completion(raw_request)
    if completion_handler is None:
        raise HTTPException(status_code=500,
                            detail="Completion handler is not available.")
    result = await completion_handler.create_completion(completion_request,
                                                        raw_request)
    if isinstance(result, ErrorResponse):
        raise HTTPException(status_code=result.code, detail=result.message)
    assert isinstance(result, CompletionResponse)

    # Return response in Anthropic format
    return AnthropicMessagesResponse(
        id=f"msg_{uuid4().hex[:24]}",
        type="message",
        role="assistant",
        content=[{
            "type": "text",
            "text": result.choices[0].text
        }],
        model=anthropic_request.model,
        stop_reason=result.choices[0].finish_reason,
        stop_sequence=None,
        usage={
            "input_tokens": result.usage.prompt_tokens,
            "output_tokens": result.usage.completion_tokens,
        })
```
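For context, a minimal client-side sketch like the following could exercise the rewritten endpoint end to end. This is not part of the PR; the base URL, API key value, and model name are placeholders:

```python
# Hypothetical smoke test for the /v1/messages endpoint; assumes a vLLM
# server on localhost:8000 serving a model named "my-model".
import requests

resp = requests.post(
    "http://localhost:8000/v1/messages",
    headers={
        # Both headers are required by the handler above; the key value
        # itself is not verified in the suggested implementation.
        "x-api-key": "dummy-key",
        "anthropic-version": "2023-06-01",
    },
    json={
        "model": "my-model",
        "max_tokens": 128,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["content"][0]["text"])
```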
```python
class AnthropicMessageBlock(BaseModel):
    role: str  # "user" | "assistant"
    content: Any
```
The type hints for `role` and `content` can be more specific to improve validation and robustness, making the schema safer and more aligned with the Anthropic API specification.

- `role` should be a `Literal["user", "assistant"]` to enforce valid roles at the type level.
- `content` can be a string or a list of content blocks according to the Anthropic API. Using `Any` bypasses type validation. A `Union[str, List[Dict[str, Any]]]` would be more accurate and safer.

You'll also need to update the imports on line 2 to include `Literal` and `Union` from `typing`.
```diff
 class AnthropicMessageBlock(BaseModel):
-    role: str  # "user" | "assistant"
-    content: Any
+    role: Literal["user", "assistant"]
+    content: Union[str, List[Dict[str, Any]]]
```
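As a quick illustration of what the stricter types buy (a standalone sketch, assuming only `pydantic` is installed), an invalid role now fails at parse time rather than deep in the prompt-conversion logic:

```python
from typing import Any, Dict, List, Literal, Union

from pydantic import BaseModel, ValidationError

class AnthropicMessageBlock(BaseModel):
    role: Literal["user", "assistant"]
    content: Union[str, List[Dict[str, Any]]]

# Both valid content shapes pass validation.
AnthropicMessageBlock(role="user", content="Hi")
AnthropicMessageBlock(role="assistant",
                      content=[{"type": "text", "text": "Hello"}])

# An unknown role is rejected immediately with a clear validation error.
try:
    AnthropicMessageBlock(role="system", content="Hi")
except ValidationError as exc:
    print(exc)
```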
```python
def convert_messages_to_prompt(messages):
    # Converts an Anthropic-style conversation to a plain prompt string.
    prompt = ""
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"Human: {msg['content']}\n"
        elif msg["role"] == "assistant":
            prompt += f"Assistant: {msg['content']}\n"
    return prompt
```
This function is not robust enough for the Anthropic API and has several issues:

- **No Type Hints:** The function signature lacks type hints, which makes it harder to understand and use correctly.
- **Unsafe Dictionary Access:** It uses direct dictionary access (e.g., `msg["role"]`), which is unsafe and will raise a `KeyError` if a message is malformed, leading to a 500 error.
- **Incorrect Content Handling:** It assumes `msg['content']` is always a string. However, the Anthropic API allows `content` to be a string or a list of content blocks (e.g., `[{"type": "text", "text": "..."}]`). Simply stringifying a list of blocks will result in an incorrect prompt.
I suggest a more robust implementation that handles these cases gracefully. You will need to add `from typing import Any, Dict, List` to the imports at the top of the file.
```python
def convert_messages_to_prompt(messages: List[Dict[str, Any]]) -> str:
    # Converts an Anthropic-style conversation to a plain prompt string.
    prompt = ""
    for msg in messages:
        role = msg.get("role")
        content = msg.get("content")
        if role == "user":
            role_str = "Human"
        elif role == "assistant":
            role_str = "Assistant"
        else:
            # Skip unknown roles
            continue
        text_content = ""
        if isinstance(content, str):
            text_content = content
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "text":
                    text_content += block.get("text", "")
        if text_content:
            prompt += f"{role_str}: {text_content}\n"
    return prompt
```
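To make the expected behavior concrete, here is a small usage sketch for the suggested helper, covering both content shapes plus an unknown role (the `tool` role here is just an illustrative example of input that gets skipped):

```python
messages = [
    {"role": "user", "content": "What is vLLM?"},
    {"role": "assistant",
     "content": [{"type": "text", "text": "A fast inference engine."}]},
    {"role": "tool", "content": "this entry is skipped"},  # unknown role
]
print(convert_messages_to_prompt(messages), end="")
# Human: What is vLLM?
# Assistant: A fast inference engine.
```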
Naming this file `protocol_anthropic.py` would be more consistent with where the OpenAI protocols live (`protocol.py`).
The docs changes seem unrelated, feel free to make them in a separate PR
This PR adds support for the Anthropic `/v1/messages` REST API endpoint to the vLLM FastAPI server, making it directly compatible with clients that expect the Anthropic Messages API, such as Claude Code. It addresses issue #21313.

Changes

- Adds a `/v1/messages` endpoint to the FastAPI server.
- Validates the required Anthropic headers (`x-api-key`, `anthropic-version`).

This PR enables vLLM to act as a drop-in backend for Anthropic API clients, broadening the use cases for vLLM deployments. I also made some small documentation grammar fixes. Please review and let me know if additional features or changes are needed.