llama-batched-bench help #15371
Master-Pr0grammer asked this question in Q&A (unanswered)
I just switched from ollama and I am new to llama.cpp, so I'm sorry if this is a dumb question. I want to set up an embedding model and, after my slow experience with ollama, I'd like to configure it to be as fast as possible for RAG over long documents.
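For concreteness, this is roughly the server setup I have in mind; the model path and all numeric values are placeholders I intend to tune based on the benchmark results:

```bash
# Hypothetical serving setup; all values here are placeholders to tune:
#   --embeddings  enable the embeddings endpoint
#   -np           number of parallel slots
#   -c            total context size (split across the -np slots, I believe)
#   -b / -ub      logical / physical batch sizes
./llama-server -m ./models/embedding-model.gguf --embeddings \
    -c 8192 -np 8 -b 2048 -ub 512
```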
Anyway, I wanted to use llama-batched-bench to find the batch sizes and numbers of parallel requests that produce the most throughput, but I can't figure out how it works. I found an example for llama-bench with different prompt input lengths, but llama-bench and llama-batched-bench are very different tools.
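From examples/batched-bench/README.md, I think the invocation looks something like this, though the value lists are my own guesses for an embedding workload:

```bash
# -npp: prompt lengths, -ntg: generated tokens per sequence,
# -npl: parallel sequence counts; each list is comma-separated and
# every combination is benchmarked in turn.
# -c needs to be large enough for the largest combination, which I
# believe is roughly npl * (npp + ntg) tokens.
./llama-batched-bench -m ./models/embedding-model.gguf \
    -c 16384 -b 2048 -ub 512 \
    -npp 128,256,512 -ntg 128 -npl 1,2,4,8,16
```

If that is the right way to run it, my main confusion is how to read the output table and map the best -npl and batch values back onto the server flags.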
Also, while I am here: how does one set varying context lengths in llama-bench?
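The only approach I've found so far is sweeping the prompt length with -p, which I assume is not quite the same thing as the context length:

```bash
# -p takes a comma-separated list of prompt lengths; -n sets how many
# tokens to generate for the text-generation measurement.
./llama-bench -m ./models/model.gguf -p 512,1024,2048,4096 -n 128
```

I have also seen a -d/--n-depth flag mentioned for benchmarking at a given KV-cache depth, but I'm not sure whether that is what I want here.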