examples : predicted output for text generation #14739
base: master
Conversation
Pull Request Overview
This PR adds a predicted output generation feature that allows users to provide expected/draft text to accelerate text generation. The implementation includes an indexing mechanism to find matches between generated tokens and the draft text, with recovery capabilities when predictions fail.
- Implements a predicted output generation algorithm with draft text matching and recovery
- Adds shell scripts for testing and comparing with speculative and lookup decoding approaches
- Integrates the new predicted example into the existing build system and command-line argument parsing
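For readers unfamiliar with the approach, here is a minimal, illustrative sketch of the indexing idea described above (plain `int32_t` token IDs and a toy hash are assumptions for this sketch; the PR itself keys a `std::unordered_map` on `common_ngram`):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using token = int32_t;

// Map every n-gram of the draft text to the positions where it starts, so the
// generator can look up its most recent n tokens and, on a hit, propose the
// draft tokens that follow that position.
static std::unordered_map<uint64_t, std::vector<int64_t>> build_draft_index(
        const std::vector<token> & draft, int n) {
    std::unordered_map<uint64_t, std::vector<int64_t>> index;
    for (int64_t i = 0; i + n <= (int64_t) draft.size(); ++i) {
        uint64_t key = 0;
        for (int j = 0; j < n; ++j) {
            key = key * 1000003ULL + (uint64_t)(uint32_t) draft[i + j]; // toy hash stand-in
        }
        index[key].push_back(i); // starting positions of this n-gram in the draft
    }
    return index;
}
```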
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/predicted/predicted.cpp | Core implementation of predicted output generation with draft text indexing and matching logic |
| examples/predicted/CMakeLists.txt | Build configuration for the predicted example executable |
| examples/predicted/*.sh | Shell scripts for testing predicted output against speculative and lookup decoding |
| examples/predicted/README.md | Documentation explaining the predicted output algorithm |
| examples/predicted/data/patch_code.txt | Sample test data for the predicted output feature |
| examples/CMakeLists.txt | Integration of the predicted example into the main build system |
| common/common.h | Addition of the LLAMA_EXAMPLE_PREDICTED enum and draft text parameter |
| common/arg.cpp | Command-line argument parsing support for draft text and example integration |
Comments suppressed due to low confidence (1)
examples/predicted/predicted.cpp:15
- [nitpick] The typedef name 'draft_index_t' uses a '_t' suffix which is typically reserved for POSIX types. Consider using 'DraftIndex' or 'draft_index' instead.
```cpp
typedef std::unordered_map<common_ngram, std::vector<int64_t>, common_ngram_hash_function> draft_index_t;
```
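For illustration, the rename the reviewer suggests could look like this (purely a naming change, not part of the PR):

```cpp
// Same type as above, without the POSIX-reserved '_t' suffix
using draft_index = std::unordered_map<common_ngram, std::vector<int64_t>, common_ngram_hash_function>;
```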
```cpp
// eval the prompt
const int n_prompt = inp.size();
llama_decode(ctx, llama_batch_get_one(inp.data(), n_prompt - 1));
```
[nitpick] The expression 'n_prompt - 1' appears multiple times in the code. Consider storing this value in a named variable like 'n_prompt_tokens' for better readability and maintainability.
Suggested change:
```diff
-llama_decode(ctx, llama_batch_get_one(inp.data(), n_prompt - 1));
+const int n_prompt_tokens = n_prompt - 1;
+llama_decode(ctx, llama_batch_get_one(inp.data(), n_prompt_tokens));
```
```cpp
// process the accepted tokens and update contexts
const std::string token_str = common_token_to_piece(ctx, id_last);
if (params.use_color && use_draft) {
    LOG("\u001b[%dm%s\u001b[37m", (36 - 0 % 6), token_str.c_str());
```
The expression '(36 - 0 % 6)' evaluates to a constant value of 36. This appears to be a magic number that should be defined as a named constant or explained with a comment.
Suggested change:
```diff
-LOG("\u001b[%dm%s\u001b[37m", (36 - 0 % 6), token_str.c_str());
+LOG("\u001b[%dm%s\u001b[37m", CYAN_COLOR_CODE, token_str.c_str());
```
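`CYAN_COLOR_CODE` is not defined in the PR; the suggestion presumably assumes a constant along these lines (36 is the standard ANSI SGR code for a cyan foreground):

```cpp
// ANSI SGR parameter for cyan foreground text
constexpr int CYAN_COLOR_CODE = 36;
```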
```cpp
const int draft_max = params.speculative.n_max;

// draft text to use for prediction
std::string draft_text = params.speculative.text;
```
The boolean parameters 'false, true' passed to common_tokenize are unclear. Consider adding comments to explain what these parameters control (add_special and parse_special).
Suggested change:
```diff
-std::string draft_text = params.speculative.text;
+std::string draft_text = params.speculative.text;
+// Tokenize the draft text:
+// - `false` indicates that special tokens should not be added.
+// - `true` indicates that special tokens should be parsed if present.
```
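The call the comment refers to would then read roughly as follows (a sketch based on the `common_tokenize` helper declared in `common/common.h`; the exact call site in the PR may differ):

```cpp
// add_special = false: do not add BOS/EOS tokens around the draft text
// parse_special = true: interpret special tokens embedded in the text
std::vector<llama_token> draft_tokens = common_tokenize(ctx, draft_text, /*add_special=*/false, /*parse_special=*/true);
```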
This adds an example that allows the user to specify a predicted (expected) output, which is then used as a draft to speed up generation. The most prominent use case would be something like making changes to an existing code block. OpenAI currently offers a similar feature called Predicted Outputs.
Obviously this has a lot of overlap with speculative decoding and lookup decoding. I think the main difference is that this gives the user more direct control over the expected output. This will also try to pick the draft up again if there is a difference followed by a few consecutive token matches. So in that sense, it brings in some of the benefits of lookup decoding.
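A rough sketch of that recovery idea, assuming plain integer token IDs and a fixed match length (illustrative only, not the PR's actual code):

```cpp
#include <cstdint>
#include <vector>

using token = int32_t;

// After a mismatch, try to re-attach to the draft: find a position where the
// last n_match generated tokens occur consecutively in the draft, and resume
// proposing draft tokens from just after that point.
static int64_t find_resume_pos(const std::vector<token> & draft,
                               const std::vector<token> & generated,
                               int n_match) {
    if ((int64_t) generated.size() < n_match) {
        return -1;
    }
    for (int64_t i = 0; i + n_match <= (int64_t) draft.size(); ++i) {
        bool match = true;
        for (int j = 0; j < n_match; ++j) {
            if (draft[i + j] != generated[generated.size() - n_match + j]) {
                match = false;
                break;
            }
        }
        if (match) {
            return i + n_match; // resume drafting from here
        }
    }
    return -1; // no match found; fall back to regular decoding
}
```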
I added some example scripts for testing code modification, which compare the predicted approach against speculative and lookup decoding. I also included a script that uses the Osmosis-Apply-1.7B model, which is targeted directly at code patching.