Add anthropic endpoint #21341
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Code Review
This pull request introduces support for the Anthropic Messages API by translating Anthropic's format to vLLM's native prompt format and reusing existing inference logic. However, the new endpoint implementation in `api_server.py` has several issues, including the use of undefined functions, incorrect imports, and unsafe request handling. It should be rewritten to use FastAPI's Pydantic validation and to leverage the existing completion logic in the server for robustness and correctness. Additionally, the Pydantic schemas and the prompt conversion utility can be made more robust to handle the full range of valid Anthropic message formats.
```diff
@@ -106,13 +106,54 @@
 from vllm.v1.metrics.prometheus import get_prometheus_registry
 from vllm.version import __version__ as VLLM_VERSION

+from fastapi import APIRouter, Request, HTTPException
+from uuid import uuid4
+from .schemas import AnthropicMessagesRequest, AnthropicMessagesResponse
```
The import path for the Anthropic schemas appears to be incorrect. The new schemas are defined in `vllm/entrypoints/openai/schemas_anthropic.py`, but you are importing from `.schemas`. This will likely cause an `ImportError` at runtime.
```diff
-from .schemas import AnthropicMessagesRequest, AnthropicMessagesResponse
+from .schemas_anthropic import AnthropicMessagesRequest, AnthropicMessagesResponse
```
```python
@router.post("/v1/messages")
async def anthropic_messages(request: Request):
    body = await request.json()

    # Validate Anthropic headers and fields
    api_key = request.headers.get("x-api-key")
    version = request.headers.get("anthropic-version")
    if not api_key or not version:
        raise HTTPException(status_code=400,
                            detail="Missing required Anthropic headers.")

    # Convert messages to prompt
    prompt = convert_messages_to_prompt(body["messages"])

    # Call existing vLLM generation logic
    llm_response = await vllm_generate(
        model=body["model"],
        prompt=prompt,
        max_tokens=body.get("max_tokens", 1024)
    )

    # Return response in Anthropic format
    output = {
        "id": f"msg_{uuid4().hex[:24]}",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": llm_response["text"]}],
        "model": body["model"],
        "stop_reason": llm_response.get("stop_reason", "end_turn"),
        "stop_sequence": None,
        "usage": {
            "input_tokens": llm_response["prompt_tokens"],
            "output_tokens": llm_response["completion_tokens"],
        }
    }
    return output
```
This endpoint implementation has several critical issues that will prevent it from working correctly:

- **Undefined Functions:** `convert_messages_to_prompt` and `vllm_generate` are not defined or imported in this file, which will cause a `NameError` at runtime.
- **Unsafe Request Handling:** The request body is parsed manually with `request.json()`, and dictionary keys are accessed directly (e.g., `body["messages"]`). This is unsafe and can lead to `KeyError` exceptions and 500 errors if the request is malformed. You should leverage FastAPI's Pydantic integration for automatic request validation and parsing.
- **Not Reusing Existing Logic:** The PR description mentions reusing existing logic, but the call to the undefined `vllm_generate` function doesn't do that. The server already has a robust completion generation pipeline that should be used here.
I suggest rewriting this function to address these points by using the `AnthropicMessagesRequest` Pydantic model for validation and calling the existing completion handler. This will make the implementation robust and consistent with the rest of the API server.
You will need to add the following imports at the top of the file:
```python
from vllm.entrypoints.openai.protocol import (CompletionRequest,
                                              CompletionResponse,
                                              ErrorResponse)
from vllm.entrypoints.openai.tool_parsers.utils import (
    convert_messages_to_prompt)
```
Here is the suggested implementation for the `anthropic_messages` function:
```python
@router.post("/v1/messages", response_model=AnthropicMessagesResponse)
async def anthropic_messages(anthropic_request: AnthropicMessagesRequest,
                             raw_request: Request):
    # Validate Anthropic headers
    api_key = raw_request.headers.get("x-api-key")
    version = raw_request.headers.get("anthropic-version")
    if not api_key or not version:
        raise HTTPException(status_code=400,
                            detail="Missing required Anthropic headers.")

    # Convert messages to prompt
    prompt = convert_messages_to_prompt(anthropic_request.messages)

    # Create a vLLM CompletionRequest
    completion_request = CompletionRequest(
        model=anthropic_request.model,
        prompt=prompt,
        max_tokens=anthropic_request.max_tokens,
        stream=False,  # This endpoint is non-streaming.
    )

    # Get the completion handler and call it
    completion_handler = completion(raw_request)
    if completion_handler is None:
        raise HTTPException(status_code=500,
                            detail="Completion handler is not available.")
    result = await completion_handler.create_completion(completion_request,
                                                        raw_request)
    if isinstance(result, ErrorResponse):
        raise HTTPException(status_code=result.code, detail=result.message)
    assert isinstance(result, CompletionResponse)

    # Return response in Anthropic format
    return AnthropicMessagesResponse(
        id=f"msg_{uuid4().hex[:24]}",
        type="message",
        role="assistant",
        content=[{
            "type": "text",
            "text": result.choices[0].text
        }],
        model=anthropic_request.model,
        stop_reason=result.choices[0].finish_reason,
        stop_sequence=None,
        usage={
            "input_tokens": result.usage.prompt_tokens,
            "output_tokens": result.usage.completion_tokens,
        })
```
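For context, a minimal client-side sketch like the following could exercise the rewritten endpoint end to end. This is not part of the PR; the base URL, API key value, and model name are placeholders:

```python
# Hypothetical smoke test for the /v1/messages endpoint; assumes a vLLM
# server on localhost:8000 serving a model named "my-model".
import requests

resp = requests.post(
    "http://localhost:8000/v1/messages",
    headers={
        # Both headers are required by the handler above; the key value
        # itself is not verified in the suggested implementation.
        "x-api-key": "dummy-key",
        "anthropic-version": "2023-06-01",
    },
    json={
        "model": "my-model",
        "max_tokens": 128,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["content"][0]["text"])
```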
```python
class AnthropicMessageBlock(BaseModel):
    role: str  # "user" | "assistant"
    content: Any
```
The type hints for `role` and `content` can be more specific to improve validation and robustness, making the schema safer and more aligned with the Anthropic API specification.

- `role` should be a `Literal["user", "assistant"]` to enforce valid roles at the type level.
- `content` can be a string or a list of content blocks according to the Anthropic API. Using `Any` bypasses type validation. A `Union[str, List[Dict[str, Any]]]` would be more accurate and safer.

You'll also need to update the imports on line 2 to include `Literal` and `Union` from `typing`.
```diff
 class AnthropicMessageBlock(BaseModel):
-    role: str  # "user" | "assistant"
-    content: Any
+    role: Literal["user", "assistant"]
+    content: Union[str, List[Dict[str, Any]]]
```
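As a quick illustration of what the stricter types buy (a standalone sketch, assuming only `pydantic` is installed), an invalid role now fails at parse time rather than deep in the prompt-conversion logic:

```python
from typing import Any, Dict, List, Literal, Union

from pydantic import BaseModel, ValidationError

class AnthropicMessageBlock(BaseModel):
    role: Literal["user", "assistant"]
    content: Union[str, List[Dict[str, Any]]]

# Both valid content shapes pass validation.
AnthropicMessageBlock(role="user", content="Hi")
AnthropicMessageBlock(role="assistant",
                      content=[{"type": "text", "text": "Hello"}])

# An unknown role is rejected immediately with a clear validation error.
try:
    AnthropicMessageBlock(role="system", content="Hi")
except ValidationError as exc:
    print(exc)
```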
```python
def convert_messages_to_prompt(messages):
    # Converts an Anthropic-style conversation to a plain prompt string.
    prompt = ""
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"Human: {msg['content']}\n"
        elif msg["role"] == "assistant":
            prompt += f"Assistant: {msg['content']}\n"
    return prompt
```
This function is not robust enough for the Anthropic API and has several issues:

- **No Type Hints:** The function signature lacks type hints, which makes it harder to understand and use correctly.
- **Unsafe Dictionary Access:** It uses direct dictionary access (e.g., `msg["role"]`), which is unsafe and will raise a `KeyError` if a message is malformed, leading to a 500 error.
- **Incorrect Content Handling:** It assumes `msg['content']` is always a string. However, the Anthropic API allows `content` to be a string or a list of content blocks (e.g., `[{"type": "text", "text": "..."}]`). Simply stringifying a list of blocks will result in an incorrect prompt.
I suggest a more robust implementation that handles these cases gracefully. You will need to add `from typing import Any, Dict, List` to the imports at the top of the file.
```python
def convert_messages_to_prompt(messages: List[Dict[str, Any]]) -> str:
    # Converts an Anthropic-style conversation to a plain prompt string.
    prompt = ""
    for msg in messages:
        role = msg.get("role")
        content = msg.get("content")
        if role == "user":
            role_str = "Human"
        elif role == "assistant":
            role_str = "Assistant"
        else:
            # Skip unknown roles
            continue
        text_content = ""
        if isinstance(content, str):
            text_content = content
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "text":
                    text_content += block.get("text", "")
        if text_content:
            prompt += f"{role_str}: {text_content}\n"
    return prompt
```
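To make the expected behavior concrete, here is a small usage sketch for the suggested helper, covering both content shapes plus an unknown role (the `tool` role here is just an illustrative example of input that gets skipped):

```python
messages = [
    {"role": "user", "content": "What is vLLM?"},
    {"role": "assistant",
     "content": [{"type": "text", "text": "A fast inference engine."}]},
    {"role": "tool", "content": "this entry is skipped"},  # unknown role
]
print(convert_messages_to_prompt(messages), end="")
# Human: What is vLLM?
# Assistant: A fast inference engine.
```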
Naming this file `protocol_anthropic.py` would be more consistent with where the OpenAI protocols live (`protocol.py`).
The docs changes seem unrelated, feel free to make them in a separate PR
This PR adds support for the Anthropic `/v1/messages` REST API endpoint to the vLLM FastAPI server, making it directly compatible with clients that expect the Anthropic Messages API, such as Claude Code. It addresses issue #21313.

Changes

- Adds a `/v1/messages` endpoint to the FastAPI server.
- Validates the required Anthropic headers (`x-api-key`, `anthropic-version`).

This PR enables vLLM to act as a drop-in backend for Anthropic API clients, broadening the use cases for vLLM deployments. I also made some small documentation grammar fixes. Please review and let me know if additional features or changes are needed.