Closed
Labels
feature request: New feature or request
Description
🚀 The feature, motivation and pitch
Prompt embeds combined with prefix caching is broken in both v0 and v1 (see #24278). In fact, #24278 explicitly disables prefix caching whenever prompt embeds are enabled.
Fixing this in the v0 engine is not worth the effort, since v0 is actively being removed. After #24278 lands, it should be straightforward to enable prefix caching with prompt embeds in v1 by adding a canonical representation of the prompt embeds tensors to the input of the hash function for each block.
I plan on doing this follow-up work, but I didn't want to complicate #24278, so I'm creating this issue to track it.
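As a rough illustration of the idea (not the actual vLLM hashing code), the sketch below folds a canonical byte encoding of a block's prompt embeds into a chained per-block hash. The names `canonical_embed_bytes` and `hash_block` are hypothetical, the embeddings are plain nested sequences of floats rather than tensors, and float32 little-endian with a shape header is just one assumed choice of canonical form.

```python
import hashlib
import struct
from typing import Optional, Sequence


def canonical_embed_bytes(embeds: Sequence[Sequence[float]]) -> bytes:
    """Serialize a [num_tokens, hidden_size] slice of prompt embeds.

    Assumed canonical form: a (num_tokens, hidden_size) header followed by
    row-major little-endian float32 values. Hypothetical helper.
    """
    num_tokens = len(embeds)
    hidden = len(embeds[0]) if num_tokens else 0
    parts = [struct.pack("<II", num_tokens, hidden)]
    for row in embeds:
        if len(row) != hidden:
            raise ValueError("all embedding rows must have the same length")
        parts.append(struct.pack(f"<{hidden}f", *row))
    return b"".join(parts)


def hash_block(
    parent_hash: Optional[bytes],
    token_ids: Sequence[int],
    block_embeds: Optional[Sequence[Sequence[float]]] = None,
) -> bytes:
    """Hash one block, chained on the previous block's hash.

    Hypothetical stand-in for a prefix-cache block hash: the embeds bytes
    are mixed in only when the block actually carries prompt embeds.
    """
    h = hashlib.sha256()
    h.update(parent_hash if parent_hash is not None else b"\x00" * 32)
    h.update(struct.pack("<I", len(token_ids)))
    h.update(struct.pack(f"<{len(token_ids)}q", *token_ids))
    if block_embeds is not None:
        h.update(b"prompt_embeds")  # domain separator for the extra input
        h.update(canonical_embed_bytes(block_embeds))
    return h.digest()
```

With this scheme, two blocks share a cache entry only if their embeds are bitwise identical after conversion to float32, so the dtype and layout chosen for the canonical form would need to be fixed consistently across requests.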
Alternatives
No response
Additional context
No response