Support token_type_ids in V1 with less code changes #21985
Conversation
```python
    post_process_tokens(model_config, engine_prompt)

    if mm_data is not None:
        engine_prompt["multi_modal_data"] = mm_data
    return full_prompt, engine_prompt


def compress_token_type_ids(token_type_ids: list[int]) -> int:
```
This is to minimize the amount of data that is transferred between processes.
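For illustration, here is a minimal sketch of what such a compression helper could look like; the PR's actual implementation may differ. It assumes the `token_type_ids` of a cross-encoder prompt form a run of 0s followed by a run of 1s, so the whole list can be represented by the index where the second segment starts.

```python
def compress_token_type_ids(token_type_ids: list[int]) -> int:
    """Represent a 000...111 token_type_ids list by the index of the first 1.

    Sketch only: assumes two contiguous segments, the typical
    cross-encoder (query, document) layout.
    """
    try:
        first_one = token_type_ids.index(1)
    except ValueError:
        first_one = len(token_type_ids)  # all zeros
    # Everything before `first_one` must be 0 and everything from it on must be 1.
    assert all(t == 0 for t in token_type_ids[:first_one])
    assert all(t == 1 for t in token_type_ids[first_one:])
    return first_one


def decompress_token_type_ids(split_point: int, num_tokens: int) -> list[int]:
    """Inverse of the sketch above: rebuild the 0/1 list from the split point."""
    return [0] * split_point + [1] * (num_tokens - split_point)
```

With this representation, only a single integer per request crosses the process boundary instead of one integer per token.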
Code Review

This pull request introduces a clever optimization for passing `token_type_ids` to models by bit-packing them into the `input_ids` tensor. This avoids changing many function signatures across the codebase. The overall approach is sound and the implementation appears correct.

My main feedback focuses on improving the maintainability of this new bit-packing mechanism in vllm/model_executor/models/bert.py. The functions for encoding and decoding `token_type_ids` have in-place side effects that are not obvious from their names, and the code could benefit from comments explaining the bit-packing logic. Addressing these points will make the code easier to understand and safer for future modifications.
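To make the bit-packing idea concrete, here is a hedged sketch of how a 0/1 token type could be folded into and recovered from the token IDs. The helper names and the choice of bit are illustrative, not the PR's actual code, and the sketch assumes real vocabulary indices never use the chosen high bit. Unlike the in-place functions the review mentions, this version is out of place for clarity.

```python
import torch

# Illustrative only: pick a bit well above any realistic vocabulary index.
TOKEN_TYPE_SHIFT = 30


def pack_token_types(input_ids: torch.Tensor, token_type_ids: torch.Tensor) -> torch.Tensor:
    """Fold the 0/1 token type into a high bit of each token id."""
    return input_ids | (token_type_ids << TOKEN_TYPE_SHIFT)


def unpack_token_types(packed_ids: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Split the packed tensor back into clean token ids and token type ids."""
    token_type_ids = packed_ids >> TOKEN_TYPE_SHIFT
    input_ids = packed_ids & ((1 << TOKEN_TYPE_SHIFT) - 1)
    return input_ids, token_type_ids


if __name__ == "__main__":
    ids = torch.tensor([101, 2023, 102, 3231, 102])
    types = torch.tensor([0, 0, 0, 1, 1])
    packed = pack_token_types(ids, types)
    restored_ids, restored_types = unpack_token_types(packed)
    assert torch.equal(restored_ids, ids) and torch.equal(restored_types, types)
```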
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Can you merge from main again? Thanks
Pooling models tests are failing, please check
In the `BertWithRope` model the `position_ids` argument was renamed to `positions`.
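As a purely illustrative sketch (the real `BertWithRope` forward signature carries more arguments), the rename amounts to something like this:

```python
import torch
from torch import nn


class BertWithRopeSketch(nn.Module):
    """Toy stand-in for BertWithRope, only to show the renamed argument."""

    def forward(self, input_ids: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # This second argument was previously called `position_ids`; renaming it
        # to `positions` matches what the V1 model runner passes to other models.
        return input_ids + positions  # placeholder computation
```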
@WoosukKwon is quite busy lately, so I'll just merge this, since the changes to the model runner are really minimal and the pooling models tests have passed.
Can you open a follow-up PR to update the docs accordingly?
This PR is yet another follow-up to #16188 and #21270. It adds support for models such as cross-encoder/ms-marco-MiniLM-L-6-v2 that require `token_type_ids` to be passed from the tokenizer to the model.
Since passing the `token_type_ids` up the chain from the entrypoints to the model runner is quite invasive, I'm also exploring other implementation alternatives such as #19988 and #20026.
PR #19988 tries the same approach as V0, but the problem is that it has to touch too many places in the code. #20026 tries to minimize the code impact by passing the token types as multimodal args, which is admittedly a bit weird. This one adds the token type ids to the pooling params, thereby removing the need to touch too many places in the code. It also avoids allocating persistent tensors by encoding the token types together with the token ids.
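As a rough sketch of the data flow described here (the field and helper names below are hypothetical, not necessarily what this PR uses): the entrypoint compresses the `token_type_ids` and attaches the result to the pooling params, and the model-runner side expands it again next to the token IDs.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PoolingParamsSketch:
    """Toy stand-in for PoolingParams with a hypothetical extra slot."""
    extra_args: dict[str, Any] = field(default_factory=dict)


def prepare_pooling_request(
    token_ids: list[int], token_type_ids: list[int]
) -> tuple[list[int], PoolingParamsSketch]:
    """Entrypoint side: ship a single split index instead of one int per token."""
    split = token_type_ids.index(1) if 1 in token_type_ids else len(token_type_ids)
    params = PoolingParamsSketch()
    params.extra_args["compressed_token_type_ids"] = split
    return token_ids, params


def rebuild_token_type_ids(params: PoolingParamsSketch, num_tokens: int) -> list[int]:
    """Model-runner side: expand the split index back into a 0/1 list."""
    split = params.extra_args["compressed_token_type_ids"]
    return [0] * split + [1] * (num_tokens - split)
```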
cc: @DarkLight1337