[Frontend] Add chunked processing to handle long inputs in embedding models #22280

x22x22 · 2025-08-05T22:01:28Z

Original PR: #20837

The previous submission went unmerged for too long and accumulated too many conflicts, so I'm resubmitting it.

…scripts, incorporating chunk processing capabilities to handle exceptionally long inputs. The README documentation has been revised to provide comprehensive instructions on usage and configuration options. Signed-off-by: x22x22 <[email protected]>

github-actions · 2025-08-05T22:01:37Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request introduces chunked processing for long text embeddings, allowing vLLM to handle inputs that exceed the model's maximum context length. The changes include new configuration options in PoolerConfig, core logic for chunking and aggregation in serving_embedding.py, and several robustness improvements in serving_engine.py. Additionally, new examples are provided to demonstrate and test the new feature.
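The chunk-then-aggregate idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in serving_embedding.py: `embed_chunk` is a hypothetical callable standing in for one forward pass of the pooling model, chunks are consecutive and non-overlapping with at most `max_model_len` tokens each, and chunk embeddings are combined with a token-count-weighted mean followed by optional L2 normalization.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def chunk_token_ids(token_ids: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Split token IDs into consecutive, non-overlapping chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(token_ids[i:i + chunk_size])
            for i in range(0, len(token_ids), chunk_size)]


def embed_long_input(
    token_ids: Sequence[int],
    max_model_len: int,
    embed_chunk: Callable[[List[int]], Vector],  # hypothetical model call
    normalize: bool = True,
) -> Vector:
    """Embed an input longer than max_model_len by chunking and aggregating."""
    chunks = chunk_token_ids(token_ids, max_model_len)
    if not chunks:
        raise ValueError("input must contain at least one token")
    total_tokens = sum(len(c) for c in chunks)

    pooled: Optional[Vector] = None
    for chunk in chunks:
        emb = embed_chunk(chunk)
        weight = len(chunk) / total_tokens  # token-count weight for this chunk
        if pooled is None:
            pooled = [weight * x for x in emb]
        elif len(emb) != len(pooled):
            raise ValueError("all chunk embeddings must have the same dimension")
        else:
            pooled = [p + weight * x for p, x in zip(pooled, emb)]
    assert pooled is not None

    if normalize:
        # L2-normalize the aggregated vector, as is common for embeddings.
        norm = math.sqrt(sum(x * x for x in pooled))
        if norm > 0:
            pooled = [x / norm for x in pooled]
    return pooled
```

For example, with a toy `embed_chunk` that returns the mean token ID of its chunk, ten tokens and `max_model_len=4` give chunks of 4, 4, and 2 tokens, and the weighted mean recovers the overall mean token ID (4.5, up to floating-point rounding). Weighting by token count keeps a short trailing chunk from pulling the result as hard as a full chunk; note that each token still only attends within its own chunk.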

My review found one potential high-severity issue in vllm/config.py where logic for selecting the correct transformers backend for pooling models seems to have been accidentally removed. This could lead to errors when using pooling models with the transformers backend. The rest of the implementation for chunked processing appears solid and well-designed.


x22x22 · 2025-08-13T07:53:22Z

@maxdebayser @DarkLight1337 @hmellor
All done, thanks!

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 · 2025-08-13T08:16:08Z

Let's merge this

x22x22 · 2025-08-13T08:54:13Z

python -m mypy vllm/entrypoints/openai/serving_embedding.py --python-version 3.11
vllm/entrypoints/openai/serving_embedding.py:160: error: Incompatible return value type (got "PoolerConfig | bool | None", expected "bool")  [return-value]
Found 1 error in 1 file (checked 1 source file)

After merging main, a new type error appeared. Let me fix it.
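For anyone hitting the same error: this does not reproduce the exact code at serving_embedding.py:160, but a common way to get a `PoolerConfig | bool | None` return type is an expression like `cfg and cfg.flag`, which evaluates to `cfg` itself when `cfg` is None or falsy. The sketch below uses a hypothetical `PoolerConfigSketch` dataclass with an assumed `enable_chunked_processing` field to show the explicit-check fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolerConfigSketch:
    """Hypothetical stand-in for PoolerConfig with one assumed field."""
    enable_chunked_processing: bool = False


def chunked_processing_enabled(cfg: Optional[PoolerConfigSketch]) -> bool:
    # Writing `return cfg and cfg.enable_chunked_processing` can return `cfg`
    # itself (e.g. None) instead of a bool, so mypy widens the inferred type
    # to a union like "PoolerConfigSketch | bool | None" and flags it as
    # [return-value]. An explicit None check keeps every path a bool.
    return cfg is not None and cfg.enable_chunked_processing
```

With the explicit check, `chunked_processing_enabled(None)` returns `False` rather than `None`, so both the runtime value and the inferred type match the `bool` annotation.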

Signed-off-by: x22x22 <[email protected]>

…models (vllm-project#22280) Signed-off-by: x22x22 <[email protected]> Signed-off-by: Kdump <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Maximilien de Bayser <[email protected]> Co-authored-by: DarkLight1337 <[email protected]>

…models (vllm-project#22280) Signed-off-by: x22x22 <[email protected]> Signed-off-by: Kdump <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Maximilien de Bayser <[email protected]> Co-authored-by: DarkLight1337 <[email protected]> Signed-off-by: Boyuan Feng <[email protected]>

…models (vllm-project#22280) Signed-off-by: x22x22 <[email protected]> Signed-off-by: Kdump <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Maximilien de Bayser <[email protected]> Co-authored-by: DarkLight1337 <[email protected]> Signed-off-by: Diego-Castan <[email protected]>


x22x22 requested review from simon-mo, WoosukKwon, youkaichao, robertgshaw2-redhat, mgoin, tlrmchlsmth, houseroad, hmellor and aarnphm as code owners August 5, 2025 22:01

mergify bot added the documentation and frontend labels Aug 5, 2025

gemini-code-assist bot reviewed Aug 5, 2025

vllm/config.py (outdated, resolved)

x22x22 added 6 commits August 6, 2025 06:08

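Fix the logic for merging multimodal processor arguments so that the passed-in arguments are merged correctly. Updated the related files to use the new merging approach.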

cab8200

Signed-off-by: x22x22 <[email protected]>

restore

57987aa

Signed-off-by: x22x22 <[email protected]>

Feature: Implement chunk processing and maximum embedding length configuration options to facilitate long-text input support.

8e3ba72

Signed-off-by: x22x22 <[email protected]>
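To illustrate how a maximum embedding length option could interact with chunking, here is a hedged sketch of the request-gating decision. `LongInputPolicy`, `plan_request`, and the field names `max_embed_len` and `enable_chunked_processing` are assumptions for illustration, not necessarily the option names or checks added in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LongInputPolicy:
    """Hypothetical settings; the field names are assumptions for this sketch."""
    max_model_len: int                     # tokens the model accepts in one pass
    enable_chunked_processing: bool = False
    max_embed_len: Optional[int] = None    # upper bound on total input tokens


def plan_request(num_tokens: int, policy: LongInputPolicy) -> str:
    """Return "single" or "chunked", or raise ValueError if the input is rejected."""
    if num_tokens <= policy.max_model_len:
        return "single"  # fits in one forward pass
    if not policy.enable_chunked_processing:
        raise ValueError(
            f"input has {num_tokens} tokens, exceeding max_model_len="
            f"{policy.max_model_len}, and chunked processing is disabled")
    limit = policy.max_embed_len
    if limit is not None and num_tokens > limit:
        raise ValueError(
            f"input has {num_tokens} tokens, exceeding max_embed_len={limit}")
    return "chunked"  # split into chunks of at most max_model_len tokens
```

Keeping a hard upper bound separate from the per-pass `max_model_len` lets an operator allow chunking while still refusing inputs that would require an unbounded number of forward passes.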

restore

f24b546

Signed-off-by: x22x22 <[email protected]>

restore

b46791b

Signed-off-by: x22x22 <[email protected]>

Feature: Implementation of Chunk Processing for Embedding Requests of Extensive Texts

54c7930

Signed-off-by: x22x22 <[email protected]>

DarkLight1337 reviewed Aug 6, 2025

vllm/inputs/registry.py (outdated, resolved)

DarkLight1337 reviewed Aug 6, 2025

vllm/config.py (outdated, resolved)

DarkLight1337 reviewed Aug 6, 2025

vllm/config.py (outdated, resolved)

DarkLight1337 reviewed Aug 6, 2025

vllm/entrypoints/openai/serving_embedding.py (outdated, resolved)

DarkLight1337 reviewed Aug 6, 2025

vllm/entrypoints/openai/serving_embedding.py (outdated, resolved)

DarkLight1337 reviewed Aug 6, 2025

vllm/entrypoints/openai/serving_embedding.py (resolved)

x22x22 added 4 commits August 6, 2025 14:31

revert: restore processor.py and registry.py to main branch state

1ad1ae3

- Restore vllm/transformers_utils/processor.py to main branch
- Restore vllm/inputs/registry.py to main branch
- Ensure all file metadata matches main branch exactly

Signed-off-by: x22x22 <[email protected]>

Refactor: Enhance the code structure and error handling logic for embedding generation.

35e0aee

Signed-off-by: x22x22 <[email protected]>

Refactor: Enhance the code structure and error handling logic for embedding generation.

483be3e

Signed-off-by: x22x22 <[email protected]>

Refactor: Enhance the code structure and error handling logic for embedding generation.

d410c34

Signed-off-by: x22x22 <[email protected]>

x22x22 added 2 commits August 13, 2025 09:50

Refactoring inelegant code

fa3b69f

Signed-off-by: x22x22 <[email protected]>

Refactoring inelegant code

6584107

Signed-off-by: x22x22 <[email protected]>

maxdebayser reviewed Aug 13, 2025
