
examples : predicted output for text generation #14739

Open
wants to merge 1 commit into
base: master

Conversation

iamlemec
Collaborator

This adds an example that allows the user to specify predicted (expected) outputs that are then used as a draft to speed up generation. The most prominent use case would be making changes to an existing code block. OpenAI currently offers a similar feature called Predicted Outputs.

Obviously this has a lot of overlap with speculative decoding and lookup decoding. I think the main difference is that this gives the user more direct control over the expected output. It will also try to pick the draft back up after a divergence, once a few consecutive tokens match again, so in that sense it brings in some of the benefits of lookup decoding.

I added some example scripts for testing code modification. These can compare predicted output against speculative and lookup decoding. I also included a script that uses the Osmosis-Apply-1.7B model, which is directly targeted at code patching.
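
As a rough standalone illustration of the re-sync behaviour described above (a minimal sketch, not the code in this PR; the function name, the int32_t token type, and the 3-token match threshold are all illustrative):

#include <algorithm>
#include <cstdint>
#include <vector>

// Find a position in the draft from which drafting can resume: the draft is
// only trusted again once the last n_match generated tokens appear
// consecutively somewhere in it. Returns -1 if no such position exists.
static int64_t resync_draft_pos(const std::vector<int32_t> & generated,
                                const std::vector<int32_t> & draft,
                                size_t n_match = 3) {
    if (generated.size() < n_match || draft.size() < n_match) {
        return -1;
    }
    const auto key_begin = generated.end() - n_match;
    for (size_t i = 0; i + n_match <= draft.size(); ++i) {
        if (std::equal(key_begin, generated.end(), draft.begin() + i)) {
            return (int64_t)(i + n_match); // next draft position to propose from
        }
    }
    return -1;
}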

@CISC requested a review from Copilot on July 24, 2025 at 10:50

Copilot AI left a comment


Pull Request Overview

This PR adds a predicted output generation feature that allows users to provide expected/draft text to accelerate text generation. The implementation includes an indexing mechanism to find matches between generated tokens and the draft text, with recovery capabilities when predictions fail.

  • Implements a predicted output generation algorithm with draft text matching and recovery
  • Adds shell scripts for testing and comparing with speculative and lookup decoding approaches
  • Integrates the new predicted example into the existing build system and command-line argument parsing
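
The indexing mechanism in the first bullet can be pictured roughly as follows (a simplified, self-contained sketch; the actual implementation keys its draft_index_t on common_ngram with common_ngram_hash_function, whereas the hash, names, and n-gram length here are illustrative):

#include <cstdint>
#include <unordered_map>
#include <vector>

// map: hashed n-gram of draft tokens -> draft positions that follow that n-gram
using draft_index = std::unordered_map<uint64_t, std::vector<int64_t>>;

// FNV-1a over a window of token ids, used as a cheap n-gram key
static uint64_t hash_ngram(const int32_t * toks, size_t n) {
    uint64_t h = 1469598103934665603ull;
    for (size_t i = 0; i < n; ++i) {
        h ^= (uint64_t)(uint32_t) toks[i];
        h *= 1099511628211ull;
    }
    return h;
}

// index every n-gram in the draft so that, given the last n generated tokens,
// the generator can look up where in the draft to continue proposing from
static draft_index build_draft_index(const std::vector<int32_t> & draft, size_t n = 3) {
    draft_index idx;
    for (size_t i = 0; i + n <= draft.size(); ++i) {
        idx[hash_ngram(draft.data() + i, n)].push_back((int64_t)(i + n));
    }
    return idx;
}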

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Summary per file:
  • examples/predicted/predicted.cpp: Core implementation of predicted output generation with draft text indexing and matching logic
  • examples/predicted/CMakeLists.txt: Build configuration for the predicted example executable
  • examples/predicted/*.sh: Shell scripts for testing predicted output against speculative and lookup decoding
  • examples/predicted/README.md: Documentation explaining the predicted output algorithm
  • examples/predicted/data/patch_code.txt: Sample test data for the predicted output feature
  • examples/CMakeLists.txt: Integration of the predicted example into the main build system
  • common/common.h: Addition of the LLAMA_EXAMPLE_PREDICTED enum and the draft text parameter
  • common/arg.cpp: Command-line argument parsing support for draft text and example integration
Comments suppressed due to low confidence (1)

examples/predicted/predicted.cpp:15

  • [nitpick] The typedef name 'draft_index_t' uses a '_t' suffix which is typically reserved for POSIX types. Consider using 'DraftIndex' or 'draft_index' instead.
typedef std::unordered_map<common_ngram, std::vector<int64_t>, common_ngram_hash_function> draft_index_t;
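
If the rename were adopted, it might look something like this (just a sketch of the suggestion, reusing the same common_ngram and common_ngram_hash_function types the current typedef depends on):

// same structure, without the POSIX-style '_t' suffix
using draft_index = std::unordered_map<common_ngram, std::vector<int64_t>, common_ngram_hash_function>;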


// eval the prompt
const int n_prompt = inp.size();
llama_decode(ctx, llama_batch_get_one(inp.data(), n_prompt - 1));

Copilot AI Jul 24, 2025


[nitpick] The expression 'n_prompt - 1' appears multiple times in the code. Consider storing this value in a named variable like 'n_prompt_tokens' for better readability and maintainability.

Suggested change
llama_decode(ctx, llama_batch_get_one(inp.data(), n_prompt - 1));
const int n_prompt_tokens = n_prompt - 1;
llama_decode(ctx, llama_batch_get_one(inp.data(), n_prompt_tokens));


// process the accepted tokens and update contexts
const std::string token_str = common_token_to_piece(ctx, id_last);
if (params.use_color && use_draft) {
LOG("\u001b[%dm%s\u001b[37m", (36 - 0 % 6), token_str.c_str());

Copilot AI Jul 24, 2025


The expression '(36 - 0 % 6)' evaluates to a constant value of 36. This appears to be a magic number that should be defined as a named constant or explained with a comment.

Suggested change
LOG("\u001b[%dm%s\u001b[37m", (36 - 0 % 6), token_str.c_str());
LOG("\u001b[%dm%s\u001b[37m", CYAN_COLOR_CODE, token_str.c_str());
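
If that suggestion were taken, the constant would also need a definition somewhere; a minimal sketch (the name CYAN_COLOR_CODE comes from the suggestion above and is not defined in the PR):

// ANSI SGR foreground color code for cyan, used here when printing tokens while the draft is in use
constexpr int CYAN_COLOR_CODE = 36;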


const int draft_max = params.speculative.n_max;

// draft text to use for prediction
std::string draft_text = params.speculative.text;

Copilot AI Jul 24, 2025


The boolean parameters 'false, true' passed to common_tokenize are unclear. Consider adding comments to explain what these parameters control (add_special and parse_special).

Suggested change
std::string draft_text = params.speculative.text;
std::string draft_text = params.speculative.text;
// Tokenize the draft text:
// - `false` indicates that special tokens should not be added.
// - `true` indicates that special tokens should be parsed if present.
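
For context, the call those comments describe would look roughly like this (a sketch based on common_tokenize's add_special and parse_special parameters; the variable name draft_inp is illustrative):

std::vector<llama_token> draft_inp = common_tokenize(ctx, draft_text,
        /*add_special  =*/ false,  // do not prepend BOS/other special tokens to the draft
        /*parse_special=*/ true);  // do parse special tokens that appear literally in the text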

