[WIP][docs] Add kv pool developer guide #3752

Pz1116 · 2025-10-25T06:16:41Z

What this PR does / why we need it?

Add kv pool developer guide

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

vLLM version: v0.11.0rc3
vLLM main: vllm-project/vllm@c9461e0

github-actions · 2025-10-25T06:16:52Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewers and future developers understand it.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request introduces a new developer guide for the KV Cache Pool feature. The guide provides a good overview, but there are a few critical issues that need to be addressed for clarity and correctness. Specifically, there's a contradiction regarding supported storage types (HBM, DRAM, SSD vs. only DRAM), an incomplete sentence that leaves instructions on how to enable a feature unfinished, and a potentially confusing link to an external guide that uses a contradictory flag. Addressing these points will significantly improve the quality and usability of the documentation.

gemini-code-assist · 2025-10-25T06:18:04Z

docs/source/developer_guide/feature_guide/KV_Cache_Pool_Guide.md

+
+However, the performance gain from prefix caching is highly dependent on cache hit rate, while cache hit rate can be limited if one only uses HBM for kv cache storage. 
+
+Hence, KV Cache Pool is proposed to utilize various types of storages including HBM,DRAM and SSD, making a pool for KV Cache storage, while making the prefix of requests visible across all nodes, increasing the cache hit rate for all requests.


This section and line 24 state that the KV Cache Pool supports HBM, DRAM, and SSD. However, the limitation section on line 57 says that it currently only supports DRAM. This is a significant contradiction and can be misleading to users. Please clarify the current support and the future roadmap for storage tiers to avoid confusion.

gemini-code-assist · 2025-10-25T06:18:04Z

docs/source/developer_guide/feature_guide/KV_Cache_Pool_Guide.md

+
+vLLM Ascend Currently supports Mooncake Store for KV Cache Pool. To enable Mooncake Store, one needs to config kv-transfer-config and choose MooncakeStoreConnector as KV Connector. 
+
+For step-by-step deployment and configuration, please refer to the guide: https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/mooncake_connector_store_deployment_guide.md


This link points to an external repository (vllm-ascend), which is not ideal for maintainability. Please consider using a relative path if the file is intended to be part of this repository.

Additionally, the linked guide uses the --no_enable_prefix_caching flag, which seems to contradict the feature's description of combining the KV Cache Pool with HBM prefix caching. This is very confusing. The documentation should clarify under which circumstances prefix caching should be disabled and explain why the example uses this flag.

gemini-code-assist · 2025-10-25T06:18:04Z

docs/source/developer_guide/feature_guide/KV_Cache_Pool_Guide.md

+
+### Combining KV Cache Pool with HBM Prefix Caching 
+Prefix Caching with HBM is already supported by the vLLM V1 Engine.
+By introducing KV Connector V1, users can seamlessly combine HBM-based Prefix Caching with Mooncake-backed KV Pool. The user can enable both features simply by enabling 


This sentence is incomplete, which can be very confusing for users trying to enable this feature. Please complete the sentence to explain what users need to enable. For example, you could mention the necessary command-line flags and provide a configuration snippet.
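As a starting point for such a snippet, here is a minimal sketch of how the launch arguments might be assembled. It assumes the connector is selected through vLLM's `--kv-transfer-config` JSON argument with a `kv_connector` field naming `MooncakeStoreConnector`, as the guide text suggests. The `kv_role` value and the model name are placeholders chosen for illustration, and a real deployment will likely need further connector-specific settings from the Mooncake deployment guide.

```python
import json
import shlex


def build_serve_command(model: str, kv_role: str = "kv_both") -> list[str]:
    """Build a `vllm serve` argument list that selects MooncakeStoreConnector."""
    # `kv_connector` is the connector named in the guide; `kv_role` is an
    # assumed value and should be checked against the deployment guide.
    kv_transfer_config = {
        "kv_connector": "MooncakeStoreConnector",
        "kv_role": kv_role,
    }
    # No flag that disables prefix caching is passed, so HBM prefix caching
    # stays at whatever the engine's default is.
    return [
        "vllm", "serve", model,
        "--kv-transfer-config", json.dumps(kv_transfer_config),
    ]


# Placeholder model name, used only for illustration.
command = build_serve_command("my-org/my-model")
print(shlex.join(command))
```

Printing the command through `shlex.join` keeps the JSON argument correctly quoted for a shell, which is a common source of errors when the configuration is typed by hand.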

github-actions bot added the documentation Improvements or additions to documentation label Oct 25, 2025

Pz1116 and others added 2 commits October 25, 2025 14:17

Create KV_Cache_Pool_Guide.md

079dee0

add guide Signed-off-by: Pz1116 <[email protected]>

update

cbd7fe9

Signed-off-by: Pz1116 <[email protected]>

Pz1116 force-pushed the KV_POOL_MD branch from d6df942 to cbd7fe9 Compare October 25, 2025 06:17

gemini-code-assist bot reviewed Oct 25, 2025

update

6b820e2

Signed-off-by: pz1116 <[email protected]>
