feat: vllm=0.16.0, LMCache, uv installer, add messages and responses endpoints.#277

Open
velaraptor-runpod wants to merge 7 commits into main from feat/lmcache

Conversation

@velaraptor-runpod
Contributor

  • Add BUILD_ARG for LMCache (https://docs.lmcache.ai/)
  • Update vllm to 0.16.0
  • Use uv instead of pip
  • Add responses and messages endpoints (note: these will not be exposed with the normal queue delay)
  • Auto-fix: if LMCache is enabled, require HMA to be disabled
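The auto-fix in the last bullet could look roughly like the sketch below. This is only an illustration of the idea; the env var name ENABLE_LMCACHE and the engine flag disable_hybrid_kv_cache_manager are assumed names and may not match what the PR actually uses.

```python
import os

def lmcache_compat(engine_kwargs: dict) -> dict:
    """Force the incompatible allocator off whenever LMCache is enabled.

    ENABLE_LMCACHE and disable_hybrid_kv_cache_manager are illustrative
    names; the PR's actual env var and engine flag may differ.
    """
    if os.getenv("ENABLE_LMCACHE", "0").lower() in ("1", "true"):
        # LMCache requires the hybrid allocator to be off, so we flip the
        # flag automatically instead of failing at engine startup.
        engine_kwargs["disable_hybrid_kv_cache_manager"] = True
    return engine_kwargs
```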

@velaraptor-runpod
Contributor Author

Also, Docker build times are much faster with uv.

@TimPietruskyRunPod (Contributor) left a comment


Review: PR #277

Thanks for the work here, Chris! The uv migration and LMCache support look great. I have a few concerns to address before merging.


Bug: ErrorResponse attribute access in _handle_messages_request

In src/engine.py, the messages error handler does:

```python
if isinstance(response, ErrorResponse):
    yield AnthropicErrorResponse(
        error=AnthropicError(type=response.error.type, message=response.error.message)
    ).model_dump()
```

ErrorResponse (from vllm.entrypoints.openai.protocol) has top-level .type and .message attributes; there is no nested .error object, so this raises an AttributeError at runtime. It should be:

```python
error=AnthropicError(type=response.type, message=response.message)
```
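For illustration, here is a minimal, self-contained sketch of the corrected mapping. The dataclasses below are stand-ins for vLLM's ErrorResponse and the project's Anthropic models (the real definitions live in vLLM and in this repo), and to_anthropic_error is a hypothetical helper, not a function from the PR:

```python
from dataclasses import dataclass

# Stand-ins only: real code imports ErrorResponse from vLLM and the
# Anthropic error models from the project's own modules.
@dataclass
class ErrorResponse:
    type: str
    message: str

@dataclass
class AnthropicError:
    type: str
    message: str

def to_anthropic_error(response: ErrorResponse) -> dict:
    # Read the top-level attributes; there is no nested response.error.
    err = AnthropicError(type=response.type, message=response.message)
    return {"type": "error", "error": {"type": err.type, "message": err.message}}
```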

Major version bump: transformers>=5.2.0

This jumps from >=4.57.0 to >=5.2.0 — a major version change. Is this actually required by vLLM 0.16.0 or the new endpoints? If not strictly necessary, I'd prefer keeping the lower bound at 4.x to avoid breaking existing builds. If it is required, let's call it out explicitly in the PR description so we know the reasoning.


Missing newline at end of engine.py

The diff shows \ No newline at end of file. Please add a trailing newline.


PR title is misleading: vLLM is already at 0.16.0 on main

The title says "vllm=0.16.0" but main already has vllm[flashinfer]==0.16.0 (merged in #272). The actual changes here are the uv migration, LMCache support, and the new endpoints. Consider updating the title to reflect what's actually new, e.g.:
feat: uv installer, LMCache support, add /v1/responses and /v1/messages endpoints


LMCache: no version pin

uv pip install --system lmcache has no version constraint. For reproducible builds, pin it (e.g., lmcache==x.y.z or at least lmcache>=x.y).


New endpoints not documented

The /v1/responses and /v1/messages routes are added but not mentioned in any docs or README. The PR description says "note these will not be exposed with normal queue delay" — can you elaborate? If they're user-facing, they should be documented. If they're experimental/internal, a code comment would help future readers.
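If these routes do get documented, hedged request examples like the following could go in the README. Field names follow the public Anthropic Messages and OpenAI Responses API shapes, which vLLM's servers mirror; "my-model" is a placeholder, and the worker may wrap these bodies in its own input envelope:

```python
# Illustrative request bodies only; the worker's exact input wrapping
# (e.g. an outer {"input": {...}} envelope) may differ.
messages_request = {
    "model": "my-model",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}

responses_request = {
    "model": "my-model",
    "input": "Hello",
}
```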


"RunPod" → "Runpod" branding change

Is this an official branding decision? The codebase (handler.py comments, engine.py comments, CLAUDE.md, etc.) still uses "RunPod" extensively. If this is intentional, it should probably be a separate follow-up PR that does a complete sweep, not mixed in with feature work.


Minor: inconsistent engine initialization params

responses_engine gets enable_log_outputs but messages_engine does not. Is that intentional, or should messages_engine also support it (if the AnthropicServingMessages constructor accepts it)?


Summary

The core changes (uv, LMCache, new endpoints) are solid. Main blockers:

  1. Bug: Fix response.error.type → response.type in messages handler
  2. Clarify: Is transformers>=5.2.0 required?
  3. Pin: lmcache version

The rest are smaller items. Happy to re-review once the above are addressed!

@TimPietruskyRunPod
Contributor

Correction to my review: Disregard the point about the PR title being misleading. The vLLM package was bumped in #272, but this PR is about wiring up the new 0.16.0 features (Anthropic /v1/messages, OpenAI /v1/responses, LMCache support) — so the title is accurate. The transformers>=5.2.0 bump is also likely required by these new vLLM 0.16.0 APIs, though it'd still be good to confirm.

The remaining items from my review still stand:

  1. Bug: response.error.type → response.type in _handle_messages_request
  2. Pin: lmcache version for reproducible builds
  3. Minor: missing newline at EOF in engine.py, docs for new endpoints, branding consistency, enable_log_outputs parity
