Help please?! #2067

NierWinter · 2025-09-16T10:45:28Z

I need some clarification. When I start the llama-cpp-python server API with the options --cache true --cache_type disk, every query to the model triggers LlamaDiskCache.__getitem__. Does this mean that with LlamaDiskCache, KV cache reads and writes bypass GPU memory and system RAM entirely and go straight to the KV cache on disk? Wouldn't that be much slower?

NierWinter · 2025-09-16T10:47:54Z

I originally thought the disk was used only for persistence, and that on startup the KV cache would be loaded back into system RAM or GPU memory. Isn't that how it works?
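
For illustration, the two behaviours being contrasted can be sketched with a toy cache. This is not llama-cpp-python's code: `ToyDiskCache`, `demo`, the prefix-matching rule, and the file layout are all hypothetical, and the sketch makes no claim about what the real LlamaDiskCache does. It only shows that a `__getitem__` call on every query is compatible with a design where the disk is touched once per lookup and the returned state then lives in memory.

```python
import os
import pickle
import tempfile
from typing import List, Sequence, Tuple


class ToyDiskCache:
    """Hypothetical disk-backed prompt cache (NOT the real LlamaDiskCache)."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self.reads = 0  # number of times a saved state was read from disk
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: Tuple[int, ...]) -> str:
        # One file per key; the token ids are encoded in the file name.
        return os.path.join(self.directory, "_".join(map(str, key)) + ".pkl")

    def _keys(self) -> List[Tuple[int, ...]]:
        keys = []
        for name in os.listdir(self.directory):
            if name.endswith(".pkl"):
                keys.append(tuple(int(t) for t in name[:-4].split("_")))
        return keys

    def __setitem__(self, tokens: Sequence[int], state: bytes) -> None:
        if not tokens:
            raise ValueError("empty key")
        with open(self._path(tuple(tokens)), "wb") as f:
            pickle.dump(state, f)

    def __getitem__(self, tokens: Sequence[int]) -> bytes:
        # Called once per lookup: find the longest stored key that is a
        # prefix of the requested tokens, then read that state from disk.
        tokens = tuple(tokens)
        best = None
        for key in self._keys():
            if tokens[: len(key)] == key and (best is None or len(key) > len(best)):
                best = key
        if best is None:
            raise KeyError(tokens)
        self.reads += 1
        with open(self._path(best), "rb") as f:
            return pickle.load(f)


def demo() -> Tuple[bytes, int]:
    with tempfile.TemporaryDirectory() as tmp:
        cache = ToyDiskCache(tmp)
        cache[[1, 2, 3]] = b"saved-state"
        # A single disk read happens here; afterwards the state is an
        # ordinary in-memory object that the caller can keep using.
        in_memory_state = cache[[1, 2, 3, 4, 5]]
        return in_memory_state, cache.reads
```

In this toy model, seeing `__getitem__` fire on each query only means one lookup per request; whether token-by-token generation touches the disk depends on what the caller does with the returned state.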

0 replies
