help please?!!! #2067
Unanswered
NierWinter
asked this question in
Q&A
Replies: 1 comment
-
I originally thought that the disk was only used for persistence, and that upon startup, the KV cache would be paged back into system RAM or GPU memory. Isn't that how it works?
-
I need some clarification. When using llama-cpp-python to start the server API with the options `--cache true --cache_type disk`, I noticed that every query to the model triggers `LlamaDiskCache.__getitem__`. Does this mean that, when LlamaDiskCache is used this way, KV cache reads and writes completely bypass GPU memory and system RAM and go directly to the KV cache on disk? Wouldn't that be much slower?
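To make the question concrete, here is a rough sketch of the pattern I suspect is in play. It is not llama-cpp-python's actual code: `ToyDiskCache`, its longest-prefix lookup, and the one-pickle-file-per-key layout are all hypothetical and exist only for illustration. The point it shows is that a per-request `__getitem__` call can mean "read one saved state from disk, then work on it in memory", which is different from serving every KV access from disk.

```python
import os
import pickle
import tempfile
from typing import Iterator, Optional, Sequence, Tuple


class ToyDiskCache:
    """Hypothetical disk-backed prompt cache (NOT the real LlamaDiskCache)."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir
        self.reads = 0  # how many times a saved state was read from disk

    def _path(self, key: Tuple[int, ...]) -> str:
        # Assumed layout: one pickle file per cached token sequence.
        return os.path.join(self.cache_dir, "_".join(map(str, key)) + ".pkl")

    def _stored_keys(self) -> Iterator[Tuple[int, ...]]:
        for name in os.listdir(self.cache_dir):
            if name.endswith(".pkl"):
                yield tuple(int(t) for t in name[:-4].split("_"))

    def _longest_prefix_key(self, tokens: Sequence[int]) -> Optional[Tuple[int, ...]]:
        # Pick the longest stored key that is a prefix of the requested tokens.
        best = None
        for key in self._stored_keys():
            is_prefix = tuple(tokens[: len(key)]) == key
            if is_prefix and (best is None or len(key) > len(best)):
                best = key
        return best

    def __contains__(self, tokens: Sequence[int]) -> bool:
        return self._longest_prefix_key(tokens) is not None

    def __getitem__(self, tokens: Sequence[int]) -> dict:
        key = self._longest_prefix_key(tokens)
        if key is None:
            raise KeyError("no cached prefix for these tokens")
        self.reads += 1
        with open(self._path(key), "rb") as f:
            # After this call the state is an ordinary in-memory object.
            return pickle.load(f)

    def __setitem__(self, tokens: Sequence[int], state: dict) -> None:
        if len(tokens) == 0:
            raise ValueError("refusing to cache an empty token sequence")
        with open(self._path(tuple(tokens)), "wb") as f:
            pickle.dump(state, f)


def demo() -> Tuple[dict, int]:
    """Store one state, then serve a longer prompt that shares its prefix."""
    with tempfile.TemporaryDirectory() as cache_dir:
        cache = ToyDiskCache(cache_dir)
        cache[[1, 2, 3]] = {"n_tokens": 3}   # state saved after an earlier request
        state = cache[[1, 2, 3, 4]]          # new request: exactly one disk read
        # Generation would continue from `state` in memory here; the disk is
        # not touched again until the next lookup or save.
        return state, cache.reads
```

If the real cache behaves like this toy version, the cost per query would be one lookup and one deserialization of a saved state, with the actual attention computation still running against RAM or GPU memory. Whether that matches what `LlamaDiskCache` does is exactly what I am asking.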