Replies: 1 comment
llama-swap has functionality that will do this for you.
I would like a way to pre-load the model before running llama-server. Today I start the server and poll the /health endpoint in a loop until I get a 200 OK response, but that is not very effective for the way I use it within a farm of servers. I also want to be able to snapshot the memory once everything is ready but before the first inference is processed. Having a way to do it in two manual steps would be great for my use case.
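For reference, the polling workaround described above can be sketched like this. It assumes llama-server is reachable at a known URL and that its /health endpoint returns 200 once the model is loaded; the function name, URL, and timeout values are illustrative, not anything the server itself provides.

```python
# Sketch of the /health polling loop: block until the server reports ready
# (HTTP 200) or a deadline passes. Stdlib only; no third-party dependencies.
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Returns True once a 200 is seen, False if the deadline passes first.
    Connection errors are treated as "not ready yet" and retried.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or model still loading
        time.sleep(interval)
    return False
```

A launcher script would start llama-server, call something like `wait_until_healthy("http://localhost:8080/health")`, and only then register the instance with the farm's load balancer. This is exactly the two-step flow the workaround emulates, which is why a built-in pre-load step would simplify it.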