@yangulei
Contributor

@yangulei yangulei commented Oct 28, 2025

  • Generate prefix-prefill buckets with a padding ratio limit on the context length.
  • Implement the find_bucket logic for prefix-prefill.
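
For reviewers who have not opened the diff, the two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `generate_buckets`, its parameters, and this `find_bucket` signature are hypothetical, and the sketch assumes that the padding ratio means padding divided by the padded (bucket) length and that buckets are multiples of a fixed step.

```python
import bisect
import math


def generate_buckets(min_len, max_len, step, ratio_limit):
    """Hypothetical helper: ascending bucket sizes (multiples of `step`).

    Consecutive buckets are spaced so that padding a length up to the next
    bucket wastes at most `ratio_limit` of that bucket, except where the
    step granularity forces a single-step gap.
    """
    if not 0 <= ratio_limit < 1:
        raise ValueError("ratio_limit must be in [0, 1)")
    top = math.ceil(max_len / step) * step
    bucket = max(step, math.ceil(min_len / step) * step)
    buckets = [bucket]
    while bucket < top:
        # The worst case for the next bucket is a length of bucket + 1.
        # Requiring (next - (bucket + 1)) / next <= ratio_limit gives
        # next <= (bucket + 1) / (1 - ratio_limit).
        widest = math.floor((bucket + 1) / (1 - ratio_limit) / step) * step
        # Always advance by at least one step so the loop terminates.
        bucket = min(top, max(bucket + step, widest))
        buckets.append(bucket)
    return buckets


def find_bucket(buckets, length):
    """Return the smallest bucket that can hold `length`."""
    idx = bisect.bisect_left(buckets, length)
    if idx == len(buckets):
        raise ValueError(f"length {length} exceeds the largest bucket")
    return buckets[idx]


if __name__ == "__main__":
    ctx_buckets = generate_buckets(min_len=128, max_len=4096, step=128,
                                   ratio_limit=0.25)
    print(ctx_buckets)
    print(find_bucket(ctx_buckets, 1000))
```

A smaller `ratio_limit` yields more, finer-grained buckets (less padding, more graphs to warm up); a larger one yields fewer, coarser buckets.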

@yangulei yangulei changed the title add padding ratio limit to the context bucketing [Draft] add padding ratio limit to the context bucketing Oct 28, 2025
@yangulei yangulei changed the title [Draft] add padding ratio limit to the context bucketing add padding ratio limit to the context bucketing Oct 29, 2025
@yangulei
Contributor Author

@czhu15 @taotod @ranzhejiang
This PR is ready for review now; please take a look, thanks!
I have only tested it with sample code so far, so complete testing and evaluation are still needed.
The padding ratio limit for context buckets reuses VLLM_PROMPT_SEQ_BUCKET_LIMIT. I recommend setting it to 0.1, because the sequence length and the context length may both be padded up to the limit at the same time.
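
To make the reasoning behind the 0.1 recommendation concrete, here is a rough back-of-the-envelope sketch. It rests on two assumptions that are not stated in this PR: that the limit is the padding divided by the padded length, and that the wasted work scales with the product of the padded sequence length and the padded context length. `worst_case_waste` is a hypothetical helper for illustration only.

```python
def worst_case_waste(ratio_limit):
    """Worst-case wasted fraction when both dimensions hit the limit.

    Assumption: each dimension keeps at least (1 - ratio_limit) of useful
    tokens, and cost is proportional to seq_len * ctx_len.
    """
    useful = (1 - ratio_limit) ** 2
    return 1 - useful


if __name__ == "__main__":
    for limit in (0.1, 0.2, 0.3):
        print(f"limit={limit}: worst-case waste ~ {worst_case_waste(limit):.2f}")
```

Under these assumptions a limit of 0.1 bounds the combined waste at roughly 19%, while larger limits compound quickly.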
