feat(mem_cache): Add CLOCK second-chance eviction policy for radix KV cache#20125
Conversation
…ation, hit marking, and second-chance evict loop
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Radix KV cache by adding a CLOCK second-chance eviction policy. This new policy aims to significantly reduce the computational overhead associated with cache eviction, particularly in high-concurrency scenarios. By optimizing the eviction process, the change is expected to improve overall scheduler latency and system responsiveness without compromising the quality of cache hit rates. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new 'clock' eviction policy for the radix KV cache, implementing a second-chance approximate-LRU strategy. The changes include adding a CLOCKStrategy, updating the RadixCache to use it, and adding corresponding server arguments and unit tests.
My review has identified a few areas for improvement:
- There is a significant discrepancy between the claimed O(1) amortized performance of the CLOCK policy and the actual implementation, which retains the O(N log N) heap reconstruction on each eviction. The PR description and help text should be updated to accurately reflect the algorithm's complexity.
- The unit tests for the new policy are good but lack coverage for the core eviction logic in
RadixCache.evict. I've suggested adding a test to verify that referenced nodes are correctly given a second chance.
Overall, the feature is a valuable addition for providing LRU-like eviction behavior. Addressing the points above will improve the clarity and robustness of the implementation.
python/sglang/srt/server_args.py
Outdated
| "'clock' = Second-Chance (CLOCK) approximate-LRU with O(1) " | ||
| "amortized eviction cost." |
There was a problem hiding this comment.
The help text for the clock policy claims it has "O(1) amortized eviction cost". As noted in another comment, the current implementation in radix_cache.py does not achieve this, as it still rebuilds a heap on every eviction call.
To avoid misleading users, please update this help text to accurately reflect the algorithm's performance. For example, you could describe it as a "Second-Chance (CLOCK) approximate-LRU" policy without mentioning the O(1) complexity.
| "'clock' = Second-Chance (CLOCK) approximate-LRU with O(1) " | |
| "amortized eviction cost." | |
| "'clock' = Second-Chance (CLOCK) approximate-LRU." |
| class TestRadixCacheCLOCKIntegration: | ||
| def test_clock_policy_registered(self): | ||
| from sglang.srt.mem_cache.cache_init_params import CacheInitParams | ||
| from unittest.mock import MagicMock | ||
| mock_alloc = MagicMock() | ||
| mock_alloc.device = "cpu" | ||
| params = CacheInitParams( | ||
| disable=False, | ||
| req_to_token_pool=None, | ||
| token_to_kv_pool_allocator=mock_alloc, | ||
| page_size=1, | ||
| enable_kv_cache_events=False, | ||
| eviction_policy="clock", | ||
| ) | ||
| cache = RadixCache(params) | ||
| assert isinstance(cache.eviction_strategy, CLOCKStrategy) | ||
|
|
||
| def test_referenced_bit_set_on_match(self): | ||
| import torch | ||
| from sglang.srt.mem_cache.base_prefix_cache import InsertParams, MatchPrefixParams | ||
| cache = RadixCache.create_simulated(disable=False, page_size=1) | ||
| key_ids = list(range(4)) | ||
| value = torch.zeros(4, dtype=torch.int32) | ||
| cache.insert(InsertParams(key=key_ids, value=value)) | ||
| cache.match_prefix(MatchPrefixParams(key=key_ids)) | ||
|
|
||
| all_nodes = [] | ||
| def _collect(node): | ||
| for child in node.children.values(): | ||
| all_nodes.append(child) | ||
| _collect(child) | ||
| _collect(cache.root_node) | ||
| assert any(n.referenced for n in all_nodes) |
There was a problem hiding this comment.
The tests for the CLOCK strategy are a good start, but they don't cover the core eviction logic within RadixCache.evict. Specifically, there's no test to verify that:
- An unreferenced node is evicted before a referenced one.
- A referenced node that is considered for eviction has its
referencedbit cleared and is given a "second chance" (i.e., not evicted immediately).
Please consider adding a test case to TestRadixCacheCLOCKIntegration that simulates an eviction scenario to validate this behavior. This would make the tests for the new policy more robust.
For example, a test could:
- Set up a
RadixCachewith theclockpolicy. - Insert two evictable nodes, A and B.
- Make node A referenced (
A.referenced = True) and older (A.last_access_timeis smaller), and node B unreferenced and newer. - Call
cache.evict()to evict one node's worth of tokens. - Assert that node B was evicted and node A remains (and its
referencedbit is nowFalse).
…aim from clock help text
…n order test for CLOCK policy
…lank lines to pass lint
Motivation
The existing RadixCache.evict() rebuilds a full heap from all evictable leaves on every call O(N log N) even when freeing a single token. Under high-throughput serving with thousands of cached prefixes this is a measurable hot-path cost. This PR adds a CLOCK second-chance eviction policy that amortizes eviction to O(1) per node while retaining LRU-quality cache hit rates.
Modifications
Accuracy Tests
No changes to model forward code or kernel logic. Eviction policy is additive and opt-in via --radix-eviction-policy clock. Default policy remains lru.
Benchmarking and Profiling
CLOCK reduces eviction overhead at scale by eliminating per-call heap rebuilds. Benchmarking under high-concurrency prefix-sharing workloads expected to show reduced scheduler latency spikes during KV cache pressure.
Checklist