[Bugfix] Decode Tokenized IDs to Strings for `hf_processor` in `llm.chat()` with `model_impl=transformers` #21353

ariG23498 · 2025-07-22T06:21:51Z

This PR adds a check in the llm.chat() processing logic to decode tokenized IDs back to strings when using model_impl="transformers". This ensures compatibility with the Hugging Face processor (hf_processor), which expects string inputs and does not support raw token IDs.
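The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ensure_text_prompt` and `ToyTokenizer` are hypothetical names, and the toy vocabulary stands in for a real tokenizer. Strings pass through unchanged, and only token ID sequences are decoded.

```python
from typing import Protocol, Sequence, Union


class SupportsDecode(Protocol):
    """Anything with a `decode` method that maps token IDs to text."""

    def decode(self, token_ids: Sequence[int]) -> str: ...


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer (illustration only)."""

    def __init__(self) -> None:
        self._vocab = {0: "What", 1: "is", 2: "this", 3: "?"}

    def decode(self, token_ids: Sequence[int]) -> str:
        return " ".join(self._vocab[i] for i in token_ids)


def ensure_text_prompt(
    prompt: Union[str, Sequence[int]],
    tokenizer: SupportsDecode,
) -> str:
    """Return a string prompt, decoding token IDs only when needed."""
    if isinstance(prompt, str):
        # Already text: a string-only processor can consume it directly.
        return prompt
    # Token IDs: decode them back to text before handing them on.
    return tokenizer.decode(list(prompt))
```

A caller would apply `ensure_text_prompt` just before invoking a processor that accepts only strings.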

To verify the issue without this fix (or confirm the fix works), use the following code:

from vllm import LLM

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
vlm = LLM(
    model=model_id,
    model_impl="transformers",
    disable_mm_preprocessor_cache=True,
    enable_prefix_caching=False,
    enable_chunked_prefill=False
)

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": "What is the content of this image?"}
        ],
    },
]

# Perform inference and log output.
outputs = vlm.chat(conversation)

for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)

Without fix: Expect errors due to tokenized IDs being passed to hf_processor.
With fix: Inference should succeed, printing a description of the image (e.g., cats on a couch).

CC: @hmellor @zucchini-nlp

gemini-code-assist

Code Review

This PR addresses an issue where tokenized IDs were passed to a Hugging Face processor that expects string inputs. The fix correctly decodes the IDs to strings. My review focuses on improving the robustness of the implementation by avoiding in-place modification of input data, which could lead to unexpected side effects or errors.
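To illustrate the concern about in-place modification, here is a small sketch under assumed names: the `"text"` key and the `with_decoded_text` helper are hypothetical and not taken from `vllm/inputs/registry.py`. The helper returns a new dict, so the caller's input is never changed.

```python
from typing import Any, Callable, Dict, Mapping, Sequence


def with_decoded_text(
    data: Mapping[str, Any],
    decode: Callable[[Sequence[int]], str],
) -> Dict[str, Any]:
    """Return a new dict whose "text" entry is a string; never mutate `data`."""
    text = data.get("text")
    if isinstance(text, list) and all(isinstance(t, int) for t in text):
        # Build a new dict instead of assigning into the caller's mapping.
        return {**data, "text": decode(text)}
    # Nothing to decode: still return a shallow copy, not the original object.
    return dict(data)
```

Returning a copy means a caller that reuses the same input (for example, on a retry) still sees the original token IDs.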

vllm/inputs/registry.py

github-actions · 2025-07-22T06:28:47Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

ariG23498 · 2025-07-22T07:17:07Z

@DarkLight1337 I have modified the file as suggested.

DarkLight1337

Thanks for the fix! Can you also add a test case in tests/models/multimodal/processing to avoid regression? cc @Isotr0py @hmellor

Isotr0py

LGTM too, as long as a regression test is added!

ariG23498 · 2025-07-22T09:37:47Z

tests/models/multimodal/processing/test_chat.py

I am pretty sure this does not fit the bill for the processing only dir. Any pointers would be great.

You can create a new file called test_transformers.py

Shouldn't the test case for this PR test that the MultiModalProcessor in vllm/model_executor/models/transformers.py still works when token IDs are provided to apply?

We are testing the multi-modal processor here rather than the model implementation, so the test should be under the processing directory. The other tests under the processing directory also only check the multi-modal processor.

And like @hmellor said, we should be unit testing the multi-modal processor without instantiating the vLLM engine.

hmellor

The fix LGTM, but I think we should change the test to be more specific to the problem we're fixing. Instantiating a full LLM is quite an expensive way to check that the processor can consume token IDs
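The shape of such a narrower test might look like the sketch below. Everything in it is a hypothetical stand-in: `StringOnlyProcessor`, `decode`, and `apply` are toy substitutes, not vLLM's `MultiModalProcessor` API. The point is only that the check can run without building an engine.

```python
from typing import Dict, Sequence, Union


class StringOnlyProcessor:
    """Hypothetical stand-in for a processor that rejects token IDs."""

    def __call__(self, text: str) -> Dict[str, str]:
        if not isinstance(text, str):
            raise TypeError("processor expects a string prompt")
        return {"prompt": text}


def decode(token_ids: Sequence[int]) -> str:
    """Toy decoder: maps each ID to a lowercase letter (illustration only)."""
    return "".join(chr(ord("a") + i) for i in token_ids)


def apply(
    processor: StringOnlyProcessor,
    prompt: Union[str, Sequence[int]],
) -> Dict[str, str]:
    """Code under test: decode token IDs before calling the processor."""
    if not isinstance(prompt, str):
        prompt = decode(prompt)
    return processor(prompt)


def test_apply_accepts_token_ids() -> None:
    processor = StringOnlyProcessor()
    # Token IDs and the equivalent string should produce the same result.
    assert apply(processor, [0, 1, 2]) == apply(processor, "abc")
```

A test like this exercises only the decode-then-process path, so it runs in milliseconds and needs no model weights.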

hmellor · 2025-07-22T09:47:04Z

tests/models/multimodal/processing/test_chat.py

Shouldn't the test case for this PR test that the MultiModalProcessor in vllm/model_executor/models/transformers.py still works when token IDs are provided to apply?

tests/models/multimodal/processing/test_transformers.py

DarkLight1337 · 2025-07-22T12:28:27Z

Please fix pre-commit

Signed-off-by: ariG23498 <[email protected]>

gemini-code-assist bot reviewed Jul 22, 2025

vllm/inputs/registry.py (outdated)

DarkLight1337 reviewed Jul 22, 2025

vllm/inputs/registry.py (outdated)

DarkLight1337 reviewed Jul 22, 2025

Isotr0py approved these changes Jul 22, 2025

ariG23498 requested a review from ywang96 as a code owner July 22, 2025 09:36

mergify bot added the `multi-modality` label (Related to multi-modality (#4194)) Jul 22, 2025

ariG23498 commented Jul 22, 2025

hmellor approved these changes Jul 22, 2025

ariG23498 requested a review from DarkLight1337 July 22, 2025 11:03

DarkLight1337 reviewed Jul 22, 2025

tests/models/multimodal/processing/test_transformers.py (outdated)

DarkLight1337 approved these changes Jul 22, 2025

DarkLight1337 enabled auto-merge (squash) July 22, 2025 11:20

ariG23498 mentioned this pull request Jul 22, 2025

[Update] transformers backend with VLM support vllm-project/vllm-project.github.io#61

Merged

github-actions bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Jul 22, 2025

auto-merge was automatically disabled July 22, 2025 12:44
Head branch was pushed to by a user without write access

ariG23498 added 6 commits July 22, 2025 18:14

Update registry.py (ac7fead)

review suggestions (8167226)

adding regression test (3a6ea3b)

review suggestions on test (1dbb72f)

using imageasset in test (f96f6b7)

fix style (9a42b8b)

Each commit carries: Signed-off-by: ariG23498 <[email protected]>

ariG23498 force-pushed the hf-processor-fix branch from 82fa186 to 9a42b8b Compare July 22, 2025 12:45

DarkLight1337 enabled auto-merge (squash) July 22, 2025 12:46

DarkLight1337 changed the title ~~Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers~~ [Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers Jul 22, 2025

DarkLight1337 disabled auto-merge July 22, 2025 15:19

vllm-bot merged commit 2226d5b into vllm-project:main Jul 22, 2025
67 of 71 checks passed

yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jul 23, 2025 (4b6209b)

LyrisZhong pushed a commit to LyrisZhong/vllm that referenced this pull request Jul 23, 2025 (5f219ee)

Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025 (886a8b0)

npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025 (2064446)

taneem-ibrahim pushed a commit to taneem-ibrahim/vllm that referenced this pull request Aug 14, 2025 (279cd7e)

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025 (98ce385)

googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025 (d5b3f79)
