feat: optimize kv cache load/offload. #306

Kang-Meng · 2025-10-31T04:39:59Z

No description provided.

xllm/core/framework/kv_cache/kv_cache_store.cpp

RobbieLeung · 2025-11-05T13:27:40Z

xllm/models/llm/glm4_moe_mtp.h

+          return torch::Tensor();
+        }
+      }
+


Add a TODO tag, MTP need more support.

RobbieLeung

LGTM

liutongxuan · 2025-11-06T13:43:13Z

xllm/core/distributed_runtime/worker_service.cpp

+ public:
+  ~ServerStreamHandler() {
+    if (!promise_set_.exchange(true)) {
+      try {


why use try catch here?

liutongxuan · 2025-11-06T13:45:30Z

xllm/core/distributed_runtime/worker_service.h

  std::unique_ptr<std::thread> polling_thread_;

  std::unique_ptr<ThreadPool> threadpool_;
+  ThreadPool copy_threadpool_{5};


why 5 threads? ？？

…-wise.

yq33victor · 2025-11-07T06:39:23Z

xllm/core/common/global_flags.cpp

              "",
              "The address of the kv cache store metadata service.");

+DEFINE_string(store_local_hostname,


what's the different between store_metadata_server and store_local_hostname.

yq33victor · 2025-11-09T02:55:41Z

xllm/core/runtime/params_utils.cpp

+    pb_cache.set_dst_block_id(info.dst_block_id);
+    pb_cache.set_hash_key(info.hash_key, MURMUR_HASH3_VALUE_LEN);
+
+    *pb_block_transfer_info->mutable_transfer_infos()->Add() = pb_cache;


nit:

*pb_block_transfer_info->mutable_transfer_infos()->Add() = std::move(pb_cache);

yq33victor · 2025-11-09T02:59:06Z

xllm/core/framework/model/model_input_params.h

-  uint8_t* hash_key = nullptr;

-  CacheBlockInfo() {}
+enum class TransferType : uint8_t { G2H = 0, H2D = 1, D2G = 2 };


nit: could we add some comments for these types :)

yq33victor · 2025-11-09T03:11:42Z

xllm/core/distributed_runtime/worker_service.cpp

+
+          if (success_cnt != current_slice.size() ||
+              i + stream_copy_batch_size_ >= transfer_slice.size()) {
+            is_completed = true;


emmm... Does the code here indicate a prefetch failure?

yq33victor · 2025-11-09T03:15:46Z

xllm/core/distributed_runtime/worker_service.cpp

+          }
+        }
+        if (is_completed) {
+          close_future.wait();


nit: If is_completed was set to false above, does that mean we no longer need to wait() on close_future here?
how brpc to handle stream_handler in this case

And by the way, how can we ensure that multiple batches are delivered in order or received in order?

yq33victor · 2025-11-09T05:38:50Z

xllm/core/framework/prefix_cache/prefix_cache.cpp


+size_t PrefixCache::insert(const std::vector<Block>& blocks) {
+  std::vector<Murmur3Key> insert_keys;
+  return insert(blocks, &insert_keys);


What is the purpose of insert_keys, it seems not be used later

yq33victor · 2025-11-09T05:45:00Z

xllm/core/platform/stream.cpp

 int Stream::synchronize() const {
 #if defined(USE_NPU)
-  return aclrtSynchronizeStream(stream_.stream());
+  return aclrtSynchronizeStreamWithTimeout(stream_.stream(), timeout_);


in which case we need timeout? and what happen if timeout.

yq33victor · 2025-11-09T05:51:52Z

xllm/core/runtime/worker_impl.cpp

  threadpool_.schedule([this]() mutable { device_.set_device(); });
-  general_threadpool_.schedule([this]() mutable { device_.set_device(); });
+  for (int i = 0; i < h2d_threadpool_.size(); i++) {
+    h2d_threadpool_.schedule_with_tid(


threadpool's construction function can pass a init_func now, so we can build h2d_threadpool_ like this

h2d_threadpool_ = std::make_unique<ThreadPool>( 2, [this]() mutable { device_.set_device(); });

yq33victor · 2025-11-09T05:58:40Z

xllm/core/runtime/worker_impl.cpp

 }

+uint32_t WorkerImpl::offload_kv_blocks(
+    const std::vector<BlockTransferInfo>& block_transfer_info) {


nit: Perhaps it would be best to abstract this code(and the code below) into a new class here.

yq33victor · 2025-11-09T06:03:10Z

xllm/core/runtime/worker_impl.cpp

-          std::move(copy_out_blocks_async(input.input_params)));
+      {
+        std::lock_guard<std::mutex> lock(mutex_);
+        if (layer_wise_load_synchronizer_.count(input.input_params.batch_id) !=


nit: can we dont use lock here ? just a suggestion

Kang-Meng requested review from RobbieLeung, liutongxuan, walsonyang and yq33victor October 31, 2025 04:39

Kang-Meng force-pushed the feat_async_copy branch 3 times, most recently from 17797ce to 034f86e Compare November 5, 2025 10:13

RobbieLeung reviewed Nov 5, 2025

View reviewed changes

Kang-Meng force-pushed the feat_async_copy branch 2 times, most recently from 7cd5bd4 to d4446aa Compare November 6, 2025 06:46

RobbieLeung previously approved these changes Nov 6, 2025

View reviewed changes

liutongxuan reviewed Nov 6, 2025

View reviewed changes

Kang-Meng dismissed RobbieLeung’s stale review via 844f089 November 6, 2025 15:42

Kang-Meng force-pushed the feat_async_copy branch 3 times, most recently from 64f071c to eee6a80 Compare November 7, 2025 09:31

Kang-Meng added 6 commits November 8, 2025 16:00

refactor: async device/host block copying, remove sync waits.

834a226

feat: upgrade Mooncake to v0.3.6.

746ea08

refactor: change host KV cache memory layout from layer-wise to block…

eae52e8

…-wise.

feat: add layer-wise KV cache H2D copy optimization.

daf3132

feat: implement batch prefetch from store.

a369020

feat: add dependency installation to setup script.

7489584

Kang-Meng force-pushed the feat_async_copy branch from eee6a80 to 90457da Compare November 8, 2025 09:38

feat: add page-aligned tensor creator for host KV cache.

b996af0

Kang-Meng force-pushed the feat_async_copy branch from 90457da to b996af0 Compare November 8, 2025 12:40

yq33victor reviewed Nov 9, 2025

View reviewed changes

feat: optimize kv cache load/offload. #306

Are you sure you want to change the base?

feat: optimize kv cache load/offload. #306

Uh oh!

Conversation

Kang-Meng commented Oct 31, 2025

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RobbieLeung left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants