
Add Anthropic endpoint #21341


Open · SriRangaTarun wants to merge 11 commits into main

Conversation

@SriRangaTarun commented on Jul 22, 2025

This PR adds support for the Anthropic /v1/messages REST API endpoint to the vLLM FastAPI server, making it directly compatible with clients that expect the Anthropic Messages API, such as Claude Code. It addresses issue #21313.

Changes

  • Added a new /v1/messages endpoint to the FastAPI server.
  • Implemented schema validation for Anthropic requests and responses using Pydantic.
  • Translated Anthropic-formatted message arrays into native vLLM prompts.
  • Reused existing inference logic, adding minimal new code for format conversion.
  • Added basic request header validation for required Anthropic fields (x-api-key, anthropic-version).

This PR enables vLLM to act as a drop-in backend for Anthropic API clients, broadening the use cases for vLLM deployments. I also made some small documentation grammar fixes. Please review and let me know if additional features or changes are needed.
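
For illustration, a client call against the new endpoint might look like the following. This is a sketch, not code from the PR: the server URL, API key value, and model name are placeholders, and the endpoint as implemented here is non-streaming.

import requests

# Hypothetical request to a locally running vLLM server exposing /v1/messages.
response = requests.post(
    "http://localhost:8000/v1/messages",
    headers={
        "x-api-key": "placeholder-key",     # validated for presence only
        "anthropic-version": "2023-06-01",  # validated for presence only
    },
    json={
        "model": "my-model",  # placeholder model name
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
# The response follows the Anthropic Messages shape:
# {"id": ..., "type": "message", "role": "assistant", "content": [...], ...}
print(response.json()["content"][0]["text"])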


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the documentation, frontend, v1, and tool-calling labels on Jul 22, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for the Anthropic Messages API by translating Anthropic's format to vLLM's native prompt format and reusing existing inference logic. However, the new endpoint implementation in api_server.py has several issues, including the use of undefined functions, incorrect imports, and unsafe request handling. It should be rewritten to use FastAPI's Pydantic validation and leverage the existing completion logic in the server for robustness and correctness. Additionally, the Pydantic schemas and the prompt conversion utility can be made more robust to handle the full range of valid Anthropic message formats.

@@ -106,13 +106,54 @@
from vllm.v1.metrics.prometheus import get_prometheus_registry
from vllm.version import __version__ as VLLM_VERSION

from fastapi import APIRouter, Request, HTTPException
from uuid import uuid4
from .schemas import AnthropicMessagesRequest, AnthropicMessagesResponse

critical

The import path for the Anthropic schemas appears to be incorrect. The new schemas are defined in vllm/entrypoints/openai/schemas_anthropic.py, but you are importing from .schemas. This will likely cause an ImportError at runtime.

Suggested change
from .schemas import AnthropicMessagesRequest, AnthropicMessagesResponse
from .schemas_anthropic import AnthropicMessagesRequest, AnthropicMessagesResponse

Comment on lines +122 to +155
@router.post("/v1/messages")
async def anthropic_messages(request: Request):
body = await request.json()
# Validate Anthropic headers and fields
api_key = request.headers.get("x-api-key")
version = request.headers.get("anthropic-version")
if not api_key or not version:
raise HTTPException(status_code=400, detail="Missing required Anthropic headers.")

# Convert messages to prompt
prompt = convert_messages_to_prompt(body["messages"])

# Call existing vLLM generation logic
llm_response = await vllm_generate(
model=body["model"],
prompt=prompt,
max_tokens=body.get("max_tokens", 1024)
)

# Return response in Anthropic format
output = {
"id": f"msg_{uuid4().hex[:24]}",
"type": "message",
"role": "assistant",
"content": [{"type": "text", "text": llm_response["text"]}],
"model": body["model"],
"stop_reason": llm_response.get("stop_reason", "end_turn"),
"stop_sequence": None,
"usage": {
"input_tokens": llm_response["prompt_tokens"],
"output_tokens": llm_response["completion_tokens"],
}
}
return output

critical

This endpoint implementation has several critical issues that will prevent it from working correctly:

  1. Undefined Functions: convert_messages_to_prompt and vllm_generate are not defined or imported in this file, which will cause a NameError at runtime.
  2. Unsafe Request Handling: The request body is parsed manually with request.json(), and dictionary keys are accessed directly (e.g., body["messages"]). This is unsafe and can lead to KeyError exceptions and 500 errors if the request is malformed. You should leverage FastAPI's Pydantic integration for automatic request validation and parsing.
  3. Not Reusing Existing Logic: The PR description mentions reusing existing logic, but the call to the undefined vllm_generate function doesn't do that. The server already has a robust completion generation pipeline that should be used here.

I suggest rewriting this function to address these points by using the AnthropicMessagesRequest Pydantic model for validation and calling the existing completion handler. This will make the implementation robust and consistent with the rest of the API server.

You will need to add the following imports at the top of the file:

from vllm.entrypoints.openai.protocol import (CompletionRequest,
                                              CompletionResponse, ErrorResponse)
from vllm.entrypoints.openai.tool_parsers.utils import (
    convert_messages_to_prompt)

Here is the suggested implementation for the anthropic_messages function:

@router.post("/v1/messages", response_model=AnthropicMessagesResponse)
async def anthropic_messages(anthropic_request: AnthropicMessagesRequest,
                                 raw_request: Request):
    # Validate Anthropic headers
    api_key = raw_request.headers.get("x-api-key")
    version = raw_request.headers.get("anthropic-version")
    if not api_key or not version:
        raise HTTPException(status_code=400,
                            detail="Missing required Anthropic headers.")

    # Convert messages to prompt
    prompt = convert_messages_to_prompt(anthropic_request.messages)

    # Create a vLLM CompletionRequest
    completion_request = CompletionRequest(
        model=anthropic_request.model,
        prompt=prompt,
        max_tokens=anthropic_request.max_tokens,
        stream=False,  # This endpoint is non-streaming.
    )

    # Get the completion handler and call it
    completion_handler = completion(raw_request)
    if completion_handler is None:
        raise HTTPException(status_code=500,
                            detail="Completion handler is not available.")

    result = await completion_handler.create_completion(completion_request,
                                                        raw_request)

    if isinstance(result, ErrorResponse):
        raise HTTPException(status_code=result.code, detail=result.message)

    assert isinstance(result, CompletionResponse)

    # Return response in Anthropic format
    return AnthropicMessagesResponse(
        id=f"msg_{uuid4().hex[:24]}",
        type="message",
        role="assistant",
        content=[
            {
                "type": "text",
                "text": result.choices[0].text
            }
        ],
        model=anthropic_request.model,
        stop_reason=result.choices[0].finish_reason,
        stop_sequence=None,
        usage={
            "input_tokens": result.usage.prompt_tokens,
            "output_tokens": result.usage.completion_tokens,
        })

Comment on lines +4 to +6
class AnthropicMessageBlock(BaseModel):
    role: str  # "user" | "assistant"
    content: Any

high

The type hints for role and content can be more specific to improve validation and robustness, making the schema safer and more aligned with the Anthropic API specification.

  • role should be a Literal["user", "assistant"] to enforce valid roles at the type level.
  • content can be a string or a list of content blocks according to the Anthropic API. Using Any bypasses type validation. A Union[str, List[Dict[str, Any]]] would be more accurate and safer.

You'll also need to update the imports on line 2 to include Literal and Union from typing.

Suggested change
class AnthropicMessageBlock(BaseModel):
    role: str  # "user" | "assistant"
    content: Any
class AnthropicMessageBlock(BaseModel):
    role: Literal["user", "assistant"]
    content: Union[str, List[Dict[str, Any]]]
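
For reference, the corresponding typing import would then look something like this (a sketch; the exact set of names already imported in the file may differ):

from typing import Any, Dict, List, Literal, Union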

Comment on lines +126 to +134
def convert_messages_to_prompt(messages):
    # Converts an Anthropic-style conversation to a plain prompt string.
    prompt = ""
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"Human: {msg['content']}\n"
        elif msg["role"] == "assistant":
            prompt += f"Assistant: {msg['content']}\n"
    return prompt

high

This function is not robust enough for the Anthropic API and has several issues:

  1. No Type Hints: The function signature lacks type hints, which makes it harder to understand and use correctly.
  2. Unsafe Dictionary Access: It uses direct dictionary access (e.g., msg["role"]), which is unsafe and will raise a KeyError if a message is malformed, leading to a 500 error.
  3. Incorrect Content Handling: It assumes msg['content'] is always a string. However, the Anthropic API allows content to be a string or a list of content blocks (e.g., [{"type": "text", "text": "..."}]). Simply stringifying a list of blocks will result in an incorrect prompt.

I suggest a more robust implementation that handles these cases gracefully. You will need to add from typing import Any, Dict, List to the imports at the top of the file.

def convert_messages_to_prompt(messages: List[Dict[str, Any]]) -> str:
    # Converts an Anthropic-style conversation to a plain prompt string.
    prompt = ""
    for msg in messages:
        role = msg.get("role")
        content = msg.get("content")

        if role == "user":
            role_str = "Human"
        elif role == "assistant":
            role_str = "Assistant"
        else:
            # Skip unknown roles
            continue

        text_content = ""
        if isinstance(content, str):
            text_content = content
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "text":
                    text_content += block.get("text", "")

        if text_content:
            prompt += f"{role_str}: {text_content}\n"
    return prompt
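
To illustrate the behavior of this suggested version (a hypothetical example, not part of the suggested change), both content shapes collapse to the same prompt text:

messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": [{"type": "text", "text": "Hello!"}]},
]
print(convert_messages_to_prompt(messages))
# Human: Hi
# Assistant: Hello!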

A project Member left a comment

Naming this file protocol_anthropic.py would be more consistent with where the OpenAI protocols live (protocol.py).

@hmellor (Member) left a comment

The docs changes seem unrelated; feel free to make them in a separate PR.
