Implements Whisper model support in the V1 engine. Key changes include:
- Add encoder-decoder architecture support with cross-attention KV cache management
- Add CrossAttentionManager and CrossAttentionSpec for encoder-decoder KV cache
- Update scheduler to handle cross-attention block allocation and disable prefix caching
- Modify GPU model runner for encoder input processing and attention metadata
- Disable BART / other enc-dec tests/examples (Whisper-only support for now)
- Optimize test performance and fix various integration issues
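The cross-attention KV cache differs from the usual self-attention cache in one key way: the encoder KV is computed once from the audio features and then reused at every decode step, so its block allocation is fixed by the encoder output length rather than growing with the decoder sequence. A minimal sketch of that sizing logic, using hypothetical field and method names (the real `CrossAttentionSpec` in this PR may differ):

```python
from dataclasses import dataclass
from math import ceil

@dataclass
class CrossAttentionSpec:
    """Hypothetical sketch of an encoder-decoder KV cache spec.

    Cross-attention KV blocks are sized by the fixed encoder output
    length, not the growing decoder sequence, because the encoder KV
    is computed once and reused at every decode step.
    """
    block_size: int        # tokens per KV cache block
    max_encoder_len: int   # e.g. 1500 for Whisper's 30s audio window

    def num_blocks_needed(self) -> int:
        # Allocation stays constant for the request's whole lifetime.
        return ceil(self.max_encoder_len / self.block_size)

spec = CrossAttentionSpec(block_size=16, max_encoder_len=1500)
print(spec.num_blocks_needed())  # 94
```

This is also why the scheduler must treat these blocks specially (and why prefix caching is disabled here): the cross-attention blocks belong to a request's encoder output, not to a shareable token prefix.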
This closes a major feature gap between V0 and V1, enabling Whisper transcription
in the new engine architecture while maintaining backward compatibility.
Related to V0 deprecation (#18571) and 2025 Q3 roadmap (#20336).
Signed-off-by: Russell Bryant <[email protected]>
Co-authored-by: NickLucche <[email protected]>