@yeonsily yeonsily commented Oct 3, 2025

This PR has a dependency on #1997

Chris-Sigopt and others added 4 commits September 5, 2025 15:42
Fixes HPU graph issues for gemma3 vision inputs

- Text warmup now includes attn_mask info, so vision+text inputs can reuse the language-model graph that has already been warmed up.
- Replace slicing with index_select for multimodal bucketing on HPU. Slicing doesn't produce the same HPU graph hash for inputs of the same shape.
- Use buckets for the vision tower as well, to reduce GC recompiles.
- Fix an accuracy bug by cloning the output data of the multimodal projector.
- Validated with the Muirbench datasets.
- Add the missing modelscope package; the `VLLM_USE_MODELSCOPE` env variable doesn't work without it.
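
To illustrate the bucketing idea mentioned above, here is a minimal sketch in plain Python. It is not the code from this PR: the bucket sizes and the helper names `pick_bucket` and `pad_to_bucket` are hypothetical, and the real implementation operates on tensors rather than lists. The point is only that rounding the input count up to a small fixed set of sizes keeps the number of distinct shapes (and therefore compiled graphs) bounded.

```python
from bisect import bisect_left

# Hypothetical bucket sizes, chosen for illustration only;
# the PR does not state the actual values.
VISION_BUCKETS = [1, 2, 4, 8]


def pick_bucket(num_images, buckets=VISION_BUCKETS):
    """Return the smallest bucket size that can hold num_images."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    i = bisect_left(buckets, num_images)
    if i == len(buckets):
        raise ValueError(f"{num_images} exceeds the largest bucket {buckets[-1]}")
    return buckets[i]


def pad_to_bucket(items, pad_value, buckets=VISION_BUCKETS):
    """Pad a list up to its bucket size so only bucket-sized shapes occur."""
    target = pick_bucket(len(items), buckets)
    return list(items) + [pad_value] * (target - len(items))


if __name__ == "__main__":
    # Eight possible input counts map onto only four distinct sizes.
    sizes = sorted({pick_bucket(n) for n in range(1, 9)})
    print(sizes)
```

With these assumed buckets, any input count from 1 to 8 is padded to one of four sizes, so at most four shapes would need to be warmed up instead of eight.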
@michalkuligowski michalkuligowski left a comment

Is this still planned for 1.23?

@yeonsily
Author

@michalkuligowski Yes, this has a dependency on #1997. I see that #1997 is ready to merge but has not been merged. Is there a reason?

5 participants