@slokesha slokesha commented Sep 9, 2025

Upstream vLLM implements an encoder-only attention layer that initializes its own AttentionMetadataBuilder.

On GPU, the AttentionMetadataBuilder is the standard way to create attention metadata.

On Gaudi, however, attention metadata is created through the make_prefill_metadata function. This causes encoder-only models to bypass the builder logic and fail in scenarios where no KV cache is used.

This PR introduces encoder-only attention metadata support for Gaudi by aligning with the upstream behavior while handling Gaudi-specific paths. It also handles edge cases where no KV cache is present.
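The dispatch described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class and function names (`AttentionMetadata`, `make_encoder_only_metadata`, `build_attn_metadata`) and their fields are simplified assumptions standing in for the real vLLM/Gaudi structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttentionMetadata:
    # Simplified stand-in for vLLM's attention metadata object.
    is_prompt: bool
    # KV-cache block mapping; None when the model runs without a KV cache.
    block_list: Optional[List[int]] = None


def make_encoder_only_metadata(seq_lens: List[int]) -> AttentionMetadata:
    # Encoder-only attention needs no KV cache, so no block mapping is built.
    return AttentionMetadata(is_prompt=True, block_list=None)


def make_prefill_metadata(seq_lens: List[int],
                          block_list: List[int]) -> AttentionMetadata:
    # Decoder prefill path: metadata carries the KV-cache block mapping.
    return AttentionMetadata(is_prompt=True, block_list=block_list)


def build_attn_metadata(seq_lens: List[int],
                        block_list: Optional[List[int]] = None,
                        encoder_only: bool = False) -> AttentionMetadata:
    # Encoder-only models (or runs with no KV cache) skip the
    # make_prefill_metadata path that previously caused them to fail.
    if encoder_only or block_list is None:
        return make_encoder_only_metadata(seq_lens)
    return make_prefill_metadata(seq_lens, block_list)
```

The key point is that the encoder-only branch never touches the KV-cache block mapping, so a run with no KV cache cannot fail on a missing block list.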

Dependency on Upstream PR - vllm-project/vllm#24612

@slokesha slokesha marked this pull request as ready for review September 11, 2025 20:30