Skip to content

Conversation

@Pz1116
Copy link
Contributor

@Pz1116 Pz1116 commented Oct 25, 2025

What this PR does / why we need it?

Add kv pool developer guide

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 25, 2025
Pz1116 and others added 2 commits October 25, 2025 14:17
add guide

Signed-off-by: Pz1116 <[email protected]>
Signed-off-by: Pz1116 <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new developer guide for the KV Cache Pool feature. The guide provides a good overview, but there are a few critical issues that need to be addressed for clarity and correctness. Specifically, there's a contradiction regarding supported storage types (HBM, DRAM, SSD vs. only DRAM), an incomplete sentence that leaves instructions on how to enable a feature unfinished, and a potentially confusing link to an external guide that uses a contradictory flag. Addressing these points will significantly improve the quality and usability of the documentation.


However, the performance gain from prefix caching is highly dependent on cache hit rate, while cache hit rate can be limited if one only uses HBM for kv cache storage.

Hence, KV Cache Pool is proposed to utilize various types of storages including HBM,DRAM and SSD, making a pool for KV Cache storage, while making the prefix of requests visible across all nodes, increasing the cache hit rate for all requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This section and line 24 state that the KV Cache Pool supports HBM, DRAM, and SSD. However, the limitation section on line 57 says that it currently only supports DRAM. This is a significant contradiction and can be misleading to users. Please clarify the current support and the future roadmap for storage tiers to avoid confusion.


vLLM Ascend Currently supports Mooncake Store for KV Cache Pool. To enable Mooncake Store, one needs to config kv-transfer-config and choose MooncakeStoreConnector as KV Connector.

For step-by-step deployment and configuration, please refer to the guide: https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/mooncake_connector_store_deployment_guide.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This link points to an external repository (vllm-ascend), which is not ideal for maintainability. Please consider using a relative path if the file is intended to be part of this repository.

Additionally, the linked guide uses the --no_enable_prefix_caching flag, which seems to contradict the feature's description of combining the KV Cache Pool with HBM prefix caching. This is very confusing. The documentation should clarify under which circumstances prefix caching should be disabled and explain why the example uses this flag.


### Combining KV Cache Pool with HBM Prefix Caching
Prefix Caching with HBM is already supported by the vLLM V1 Engine.
By introducing KV Connector V1, users can seamlessly combine HBM-based Prefix Caching with Mooncake-backed KV Pool. The user can enable both features simply by enabling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This sentence is incomplete, which can be very confusing for users trying to enable this feature. Please complete the sentence to explain what users need to enable. For example, you could mention the necessary command-line flags and provide a configuration snippet.

Signed-off-by: pz1116 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant