Replies: 1 comment
llama-swap has functionality that will do this for you.
I would like a way to pre-load the model before running llama-server. Today I start the server and poll the /health endpoint in a loop until I get a 200 OK response, but that is not very effective for the way I use it within a farm of servers. I also want to be able to snapshot the memory once everything is ready but before the first inference is processed. Having a way to do it in two manual steps would be great for my use case.
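For reference, the polling workaround described above can be sketched like this. It assumes llama-server is reachable at a known URL and that its /health endpoint returns 200 once the model is loaded; the function name, URL, and timeout values are illustrative, not anything the server itself provides.

```python
# Sketch of the /health polling loop: block until the server reports ready
# (HTTP 200) or a deadline passes. Stdlib only; no third-party dependencies.
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Returns True once a 200 is seen, False if the deadline passes first.
    Connection errors are treated as "not ready yet" and retried.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or model still loading
        time.sleep(interval)
    return False
```

A launcher script would start llama-server, call something like `wait_until_healthy("http://localhost:8080/health")`, and only then register the instance with the farm's load balancer. This is exactly the two-step flow the workaround emulates, which is why a built-in pre-load step would simplify it.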