-
Notifications
You must be signed in to change notification settings - Fork 12.5k
feat: Add optional prompt processing progress streaming #14731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: Add optional prompt processing progress streaming #14731
Conversation
- Add include_prompt_progress parameter to slot_params (default: false) - Extend server_task_result_cmpl_partial with progress fields - Implement send_progress_response() function with 1% progress intervals - Add progress response in prompt processing loop - Update JSON response to include prompt_processing field when requested - Add comprehensive documentation to README.md - Ensure full backward compatibility with existing clients Closes ggml-org#14685
Is there a chance this could get approved? If it's not a welcome addition, I'll put it in the mmojo-server fork. Being able to display evaluating progress is a must for servers running on slow CPU, e.g. Raspberry Pi 5. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a more compact send_progress
, instead of include_prompt_progress
. Otherwise seems a good change.
tools/server/server.cpp
Outdated
params.stream = json_value(data, "stream", false); | ||
params.cache_prompt = json_value(data, "cache_prompt", true); | ||
params.return_tokens = json_value(data, "return_tokens", false); | ||
params.include_prompt_progress = json_value(data, "include_prompt_progress", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
params.include_prompt_progress = json_value(data, "include_prompt_progress", false); | |
params.send_progress = json_value(data, "send_progress", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Georgi! And for all you do with llama.cpp. Nobody says thank you enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
earlier in the code, we already had a param called return_tokens
so maybe return_progress
is a better naming
There are multiple PRs already open for this particular feature, it's very hard for maintainers to keep track. It's better to look at the list of opening PR before working on a feature. |
At least one test case is also required for this feature, maybe with a long prompt and a small batch size so we can clear see the effect. |
Hold up on this a moment. I've been testing this implementation today.
This logic should be eliminated. We should send progress only when a batch is complete. Otherwise, a bunch of progress messages blast the client as the batch completes. Most of the time spent processing a batch is not spent in the steps that emit this progress. It's spent, for example, here, as show with
If I bypass the If some other PR has done this better, I'm happy to go try one. This PR almost gets the behavior perfect. |
This probably doesn't qualify as a test case, but the video shows this code, with the change to logic of when to send progress that I described above, is working as it should. 19K tokens, batch size 64, Raspberry Pi 5, generic CPU code, Gemma 1B. Mmojo.Progress.mp4-Brad |
am just fixing it now |
- Add return_progress parameter to slot_params (default: false) - Extend server_task_result_cmpl_partial with progress fields - Implement send_progress_response() function with batch completion logic - Add progress response in prompt processing loop - Update JSON response to include prompt_processing field when requested - Add comprehensive documentation to README.md - Add C++ test suite for progress feature validation - Ensure full backward compatibility with existing clients - Fix chat completions endpoint progress support Closes ggml-org#14685
tests/test-progress-feature.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
server test is tools/server/tests/unit/test_chat_completion.py
not ctest
…incrementally - Remove incremental progress sending logic to avoid 'blasting the client' - Send progress only when prompt processing is complete (100%) - Add comprehensive test case with long prompt and small batch size - Test shows clear progress from 2.3% to 99.9% with 45 progress responses - Verify progress disabled functionality works correctly - Fixes GitHub issue ggml-org#14685
Honestly by this point I'm spending more time reviewing this PR than just fixing it myself.. You clearly haven't even read the exiting code. |
|
57d841f
to
3969a8d
Compare
Closes #14685
Make sure to read the contributing guidelines before submitting a PR