Skip to content

feat(mem_cache): Add CLOCK second-chance eviction policy for radix KV cache#20125

Closed
ghost wants to merge 11 commits intomainfrom
unknown repository
Closed

feat(mem_cache): Add CLOCK second-chance eviction policy for radix KV cache#20125
ghost wants to merge 11 commits intomainfrom
unknown repository

Conversation

@ghost
Copy link

@ghost ghost commented Mar 8, 2026

Motivation

The existing RadixCache.evict() rebuilds a full heap from all evictable leaves on every call O(N log N) even when freeing a single token. Under high-throughput serving with thousands of cached prefixes this is a measurable hot-path cost. This PR adds a CLOCK second-chance eviction policy that amortizes eviction to O(1) per node while retaining LRU-quality cache hit rates.

Modifications

  • Added CLOCKStrategy class with second-chance reference-bit logic in evict_policy.py
  • Added referenced boolean flag to TreeNode, set it on every cache hit in _match_prefix_helper, registered clock in the strategy selector, and added second-chance re-queue logic in evict() in radix_cache.py
  • Added clock to RADIX_EVICTION_POLICY_CHOICES and updated help text in server_args.py
  • Added unit tests for priority ordering and RadixCache integration in test_clock_eviction.py

Accuracy Tests

No changes to model forward code or kernel logic. Eviction policy is additive and opt-in via --radix-eviction-policy clock. Default policy remains lru.

Benchmarking and Profiling

CLOCK reduces eviction overhead at scale by eliminating per-call heap rebuilds. Benchmarking under high-concurrency prefix-sharing workloads expected to show reduced scheduler latency spikes during KV cache pressure.

Checklist

  • Format your code according to the Format code with pre-commit.
  • Add unit tests according to the Run and add unit tests.
  • Update documentation according to Write documentations.
  • Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
  • Follow the SGLang code style guidance.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Radix KV cache by adding a CLOCK second-chance eviction policy. This new policy aims to significantly reduce the computational overhead associated with cache eviction, particularly in high-concurrency scenarios. By optimizing the eviction process, the change is expected to improve overall scheduler latency and system responsiveness without compromising the quality of cache hit rates.

Highlights

  • New Eviction Policy: Introduced a new CLOCK (second-chance) eviction policy for the Radix KV cache, designed to improve performance under high-throughput workloads.
  • Performance Optimization: The CLOCK policy amortizes eviction costs to O(1) per node, eliminating the O(N log N) heap rebuilds of the existing LRU policy, while maintaining LRU-like cache hit rates.
  • Integration and Configuration: Integrated the CLOCK strategy into the RadixCache and made it selectable via the --radix-eviction-policy clock command-line argument.
  • Testing: Added comprehensive unit tests to verify the correctness and integration of the new CLOCK eviction strategy.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • python/sglang/srt/mem_cache/evict_policy.py
    • Introduced the CLOCKStrategy class, implementing the second-chance eviction logic.
    • Defined the get_priority method for CLOCKStrategy to use a referenced bit and last_access_time for eviction ordering.
  • python/sglang/srt/mem_cache/radix_cache.py
    • Imported the new CLOCKStrategy.
    • Added a referenced: bool attribute to TreeNode to support the CLOCK algorithm.
    • Updated the RadixCache constructor to initialize CLOCKStrategy when the 'clock' eviction policy is selected.
    • Modified the evict method to incorporate the CLOCK second-chance logic, checking and clearing the referenced bit before actual eviction.
    • Ensured the referenced flag is set to True for a TreeNode on every cache hit within _match_prefix_helper.
  • python/sglang/srt/server_args.py
    • Added 'clock' as a valid choice to RADIX_EVICTION_POLICY_CHOICES.
    • Updated the help text for the --radix-eviction-policy argument to describe the new 'clock' option and its benefits.
  • test/unit/mem_cache/test_clock_eviction.py
    • Created a new unit test file for the CLOCK eviction policy.
    • Added TestCLOCKStrategyPriority to verify the priority ordering logic of the CLOCKStrategy.
    • Included TestRadixCacheCLOCKIntegration to confirm that the 'clock' policy is correctly registered and that the referenced bit is set on cache hits.
Activity
  • Unit tests have been added for the new functionality.
  • The code adheres to the SGLang code style guidelines.
  • Other checklist items, such as documentation updates and benchmark results, are pending.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new 'clock' eviction policy for the radix KV cache, implementing a second-chance approximate-LRU strategy. The changes include adding a CLOCKStrategy, updating the RadixCache to use it, and adding corresponding server arguments and unit tests.

My review has identified a few areas for improvement:

  • There is a significant discrepancy between the claimed O(1) amortized performance of the CLOCK policy and the actual implementation, which retains the O(N log N) heap reconstruction on each eviction. The PR description and help text should be updated to accurately reflect the algorithm's complexity.
  • The unit tests for the new policy are good but lack coverage for the core eviction logic in RadixCache.evict. I've suggested adding a test to verify that referenced nodes are correctly given a second chance.

Overall, the feature is a valuable addition for providing LRU-like eviction behavior. Addressing the points above will improve the clarity and robustness of the implementation.

Comment on lines +3640 to +3641
"'clock' = Second-Chance (CLOCK) approximate-LRU with O(1) "
"amortized eviction cost."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The help text for the clock policy claims it has "O(1) amortized eviction cost". As noted in another comment, the current implementation in radix_cache.py does not achieve this, as it still rebuilds a heap on every eviction call.

To avoid misleading users, please update this help text to accurately reflect the algorithm's performance. For example, you could describe it as a "Second-Chance (CLOCK) approximate-LRU" policy without mentioning the O(1) complexity.

Suggested change
"'clock' = Second-Chance (CLOCK) approximate-LRU with O(1) "
"amortized eviction cost."
"'clock' = Second-Chance (CLOCK) approximate-LRU."

Comment on lines +37 to +69
class TestRadixCacheCLOCKIntegration:
def test_clock_policy_registered(self):
from sglang.srt.mem_cache.cache_init_params import CacheInitParams
from unittest.mock import MagicMock
mock_alloc = MagicMock()
mock_alloc.device = "cpu"
params = CacheInitParams(
disable=False,
req_to_token_pool=None,
token_to_kv_pool_allocator=mock_alloc,
page_size=1,
enable_kv_cache_events=False,
eviction_policy="clock",
)
cache = RadixCache(params)
assert isinstance(cache.eviction_strategy, CLOCKStrategy)

def test_referenced_bit_set_on_match(self):
import torch
from sglang.srt.mem_cache.base_prefix_cache import InsertParams, MatchPrefixParams
cache = RadixCache.create_simulated(disable=False, page_size=1)
key_ids = list(range(4))
value = torch.zeros(4, dtype=torch.int32)
cache.insert(InsertParams(key=key_ids, value=value))
cache.match_prefix(MatchPrefixParams(key=key_ids))

all_nodes = []
def _collect(node):
for child in node.children.values():
all_nodes.append(child)
_collect(child)
_collect(cache.root_node)
assert any(n.referenced for n in all_nodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The tests for the CLOCK strategy are a good start, but they don't cover the core eviction logic within RadixCache.evict. Specifically, there's no test to verify that:

  1. An unreferenced node is evicted before a referenced one.
  2. A referenced node that is considered for eviction has its referenced bit cleared and is given a "second chance" (i.e., not evicted immediately).

Please consider adding a test case to TestRadixCacheCLOCKIntegration that simulates an eviction scenario to validate this behavior. This would make the tests for the new policy more robust.

For example, a test could:

  1. Set up a RadixCache with the clock policy.
  2. Insert two evictable nodes, A and B.
  3. Make node A referenced (A.referenced = True) and older (A.last_access_time is smaller), and node B unreferenced and newer.
  4. Call cache.evict() to evict one node's worth of tokens.
  5. Assert that node B was evicted and node A remains (and its referenced bit is now False).

@ghost ghost closed this Mar 8, 2026
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants