llama-batched-bench help #15371
Master-Pr0grammer asked this question in Q&A (unanswered)
I just switched from ollama and I am new to llama.cpp, so I'm sorry if this is a dumb question. I want to set up an embedding model and, after my slow experience with ollama, I'd like to configure it to be as fast as possible for RAG over long documents.
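For concreteness, this is roughly the server setup I have in mind; the model path and all numeric values are placeholders I intend to tune based on the benchmark results:

```bash
# Hypothetical serving setup; all values here are placeholders to tune:
#   --embeddings  enable the embeddings endpoint
#   -np           number of parallel slots
#   -c            total context size (split across the -np slots, I believe)
#   -b / -ub      logical / physical batch sizes
./llama-server -m ./models/embedding-model.gguf --embeddings \
    -c 8192 -np 8 -b 2048 -ub 512
```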
Anyway, I wanted to use llama-batched-bench to find the batch sizes and numbers of parallel requests that produce the most throughput, but I can't figure out how it works. I found an example for llama-bench with different prompt input lengths, but llama-bench and llama-batched-bench are very different tools.
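From examples/batched-bench/README.md, I think the invocation looks something like this, though the value lists are my own guesses for an embedding workload:

```bash
# -npp: prompt lengths, -ntg: generated tokens per sequence,
# -npl: parallel sequence counts; each list is comma-separated and
# every combination is benchmarked in turn.
# -c needs to be large enough for the largest combination, which I
# believe is roughly npl * (npp + ntg) tokens.
./llama-batched-bench -m ./models/embedding-model.gguf \
    -c 16384 -b 2048 -ub 512 \
    -npp 128,256,512 -ntg 128 -npl 1,2,4,8,16
```

If that is the right way to run it, my main confusion is how to read the output table and map the best -npl and batch values back onto the server flags.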
Also, while I am here: how does one set varying context lengths in llama-bench?
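The only approach I've found so far is sweeping the prompt length with -p, which I assume is not quite the same thing as the context length:

```bash
# -p takes a comma-separated list of prompt lengths; -n sets how many
# tokens to generate for the text-generation measurement.
./llama-bench -m ./models/model.gguf -p 512,1024,2048,4096 -n 128
```

I have also seen a -d/--n-depth flag mentioned for benchmarking at a given KV-cache depth, but I'm not sure whether that is what I want here.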