-
Notifications
You must be signed in to change notification settings - Fork 515
[WIP][docs] Add kv pool developer guide #3752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
add guide Signed-off-by: Pz1116 <[email protected]>
Signed-off-by: Pz1116 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new developer guide for the KV Cache Pool feature. The guide provides a good overview, but there are a few critical issues that need to be addressed for clarity and correctness. Specifically, there's a contradiction regarding supported storage types (HBM, DRAM, SSD vs. only DRAM), an incomplete sentence that leaves instructions on how to enable a feature unfinished, and a potentially confusing link to an external guide that uses a contradictory flag. Addressing these points will significantly improve the quality and usability of the documentation.
|
|
||
| However, the performance gain from prefix caching is highly dependent on cache hit rate, while cache hit rate can be limited if one only uses HBM for kv cache storage. | ||
|
|
||
| Hence, KV Cache Pool is proposed to utilize various types of storages including HBM,DRAM and SSD, making a pool for KV Cache storage, while making the prefix of requests visible across all nodes, increasing the cache hit rate for all requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section and line 24 state that the KV Cache Pool supports HBM, DRAM, and SSD. However, the limitation section on line 57 says that it currently only supports DRAM. This is a significant contradiction and can be misleading to users. Please clarify the current support and the future roadmap for storage tiers to avoid confusion.
|
|
||
| vLLM Ascend Currently supports Mooncake Store for KV Cache Pool. To enable Mooncake Store, one needs to config kv-transfer-config and choose MooncakeStoreConnector as KV Connector. | ||
|
|
||
| For step-by-step deployment and configuration, please refer to the guide: https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/mooncake_connector_store_deployment_guide.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link points to an external repository (vllm-ascend), which is not ideal for maintainability. Please consider using a relative path if the file is intended to be part of this repository.
Additionally, the linked guide uses the --no_enable_prefix_caching flag, which seems to contradict the feature's description of combining the KV Cache Pool with HBM prefix caching. This is very confusing. The documentation should clarify under which circumstances prefix caching should be disabled and explain why the example uses this flag.
|
|
||
| ### Combining KV Cache Pool with HBM Prefix Caching | ||
| Prefix Caching with HBM is already supported by the vLLM V1 Engine. | ||
| By introducing KV Connector V1, users can seamlessly combine HBM-based Prefix Caching with Mooncake-backed KV Pool. The user can enable both features simply by enabling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: pz1116 <[email protected]>
What this PR does / why we need it?
Add kv pool developer guide
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0