eagle3 cb impl with top-1 proposal #3055

Merged
Wovchena merged 29 commits into openvinotoolkit:master from songbell:bell/eagle_cb_top1_impl on Dec 22, 2025
@songbell
Contributor

EAGLE3 continuous batching (CB) implementation.
Tickets: CVS-173358
Reference code: https://github.com/SafeAILab/EAGLE

Copilot AI review requested due to automatic review settings November 21, 2025 08:53
@github-actions bot added labels on Nov 21, 2025:
  • category: continuous batching (Continuous batching)
  • category: LLM (LLM pipeline (stateful, static))
  • category: sampling (Sampling / Decoding algorithms)
  • category: speculative decoding (Speculative decoding)
  • category: GHA (CI based on Github actions)
  • category: LLM samples (GenAI LLM samples)
  • category: CPP API (Changes in GenAI C++ public headers)
  • no-match-files
  • category: GGUF (GGUF file reader)
@songbell changed the title from "Bell/eagle cb top1 impl" to "eagle3 cb impl with top-1 proposal" on Nov 21, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements EAGLE3 speculative decoding with continuous batching support for improved inference performance. The changes add a new speculative decoding variant that uses hidden state passing between main and draft models for more efficient token generation.

Key Changes

  • Introduced Eagle3DecodingImpl for EAGLE3-specific speculative decoding logic
  • Extended model runner to support hidden state import/export for EAGLE3
  • Added test coverage for EAGLE3 speculative decoding scenarios
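
The "top-1 proposal" in the PR title can be pictured as a greedy draft-then-verify loop. The sketch below is a toy illustration only: `speculative_step`, `draft_next`, and `main_next` are hypothetical names, not OpenVINO GenAI APIs, and it leaves out the hidden-state passing and continuous batching that the PR actually adds.

```python
from typing import Callable, List

# Hypothetical type: a greedy next-token function over a token-id sequence.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     main_next: NextToken, num_draft: int) -> List[int]:
    """Return the tokens accepted in one draft-then-verify step."""
    # 1. The draft model proposes a chain of top-1 tokens.
    proposal: List[int] = []
    for _ in range(num_draft):
        proposal.append(draft_next(prefix + proposal))

    # 2. The main model checks each proposed position. A real pipeline would
    #    score all positions in one forward pass; this toy loops instead.
    accepted: List[int] = []
    for token in proposal:
        expected = main_next(prefix + accepted)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)

    # 3. Every draft token matched, so the main model adds one extra token.
    accepted.append(main_next(prefix + accepted))
    return accepted
```

Each step therefore yields between one and `num_draft + 1` tokens, and the output is the same as greedy decoding with the main model alone.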

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/python_tests/utils/hugging_face.py | Adds eagle3 model detection and handles tokenizer conditionally |
| tests/python_tests/test_continuous_batching.py | Adds EAGLE3 test cases and refactors test helper functions |
| tests/python_tests/samples/test_speculative_decoding_lm.py | Extracts common test logic and adds EAGLE3 sample tests |
| tests/python_tests/samples/conftest.py | Adds model configurations for EAGLE3 models |
| src/cpp/src/speculative_decoding/update_request_structs.hpp | Extends GeneratedSequence to store hidden states |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.hpp | Refactors generate logic into template helper and exposes internal state |
| src/cpp/src/speculative_decoding/speculative_decoding_impl.cpp | Extracts scheduler initialization and refactors generate using strategy pattern |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.hpp | Defines EAGLE3 implementation with model transformations |
| src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp | Implements EAGLE3 decoding with hidden state management |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.hpp | Adds ContinuousBatchingForEagle3DecodingImpl class |
| src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp | Implements hidden state handling in update_requests |
| src/cpp/src/sequence_group.hpp | Adds hidden state storage and accessor methods to Sequence |
| src/cpp/src/sampling/sampler.hpp | Adds draft-to-target mapping for EAGLE decoding |
| src/cpp/src/sampling/sampler.cpp | Implements token index adjustment using draft2target mapping |
| src/cpp/src/llm/pipeline.cpp | Adds apply_eagle_rt_info helper and draft model configuration |
| src/cpp/src/continuous_batching/pipeline.cpp | Integrates EAGLE3 mode detection and instantiation |
| src/cpp/src/continuous_batching/model_runner.hpp | Adds hidden state flag system and sequence mapping structures |
| src/cpp/include/openvino/genai/continuous_batching_pipeline.hpp | Declares EAGLE3 implementation classes as friends |
| .github/workflows/windows.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
| .github/workflows/manylinux_2_28.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
| .github/workflows/linux.yml | Excludes eagle3 tests from main suite and adds dedicated test job |
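
The sampler entries in the file summary mention a draft-to-target mapping used to adjust token indices. A minimal sketch of one way such a mapping could work, assuming the draft model samples from a reduced vocabulary and the table stores a per-token offset to add. The function name, the parameter names, and the offset convention are all assumptions here; the PR's actual data layout is not shown on this page.

```python
from typing import List, Sequence


def draft_to_target(draft_ids: Sequence[int],
                    d2t_offsets: Sequence[int]) -> List[int]:
    """Map draft-vocabulary token ids to target-vocabulary ids."""
    # Assumption: d2t_offsets[i] is the amount to add to draft id i
    # to obtain the matching id in the main model's vocabulary.
    return [i + d2t_offsets[i] for i in draft_ids]
```

With a hypothetical table `[0, 4, 9]`, draft id 1 maps to target id 5 and draft id 2 maps to target id 11.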

Copilot AI review requested due to automatic review settings November 27, 2025 02:24
@songbell force-pushed the bell/eagle_cb_top1_impl branch from 1b08202 to 9d82589 on December 16, 2025 01:31
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

  • Corrected spelling of 'implementation' in URL comment.
// Copyright (C) 2023-2025 Intel Corporation

Signed-off-by: fishbell <bell.song@intel.com>
Signed-off-by: fishbell <bell.song@intel.com>
Copilot AI review requested due to automatic review settings December 16, 2025 15:21
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.


Copilot AI review requested due to automatic review settings December 16, 2025 15:32
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

  • Corrected spelling of 'useage' to 'usage'.
// Copyright (C) 2023-2025 Intel Corporation

Contributor

@sbalandi left a comment

LGTM

Signed-off-by: fishbell <bell.song@intel.com>
Signed-off-by: fishbell <bell.song@intel.com>
Signed-off-by: fishbell <bell.song@intel.com>
@Wovchena requested a review from popovaan on December 17, 2025 12:39
Contributor

@popovaan left a comment

@Wovchena I double-checked the PR and confirm that the concatenation of hidden layers, along with the matmul after it, is now part of the main model, as was suggested during discussion.
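
For readers unfamiliar with the operation discussed in the comment above, here is a rough sketch of "concatenate hidden layers, then matmul" in plain Python. The function name, the use of three layers, and the shapes are illustrative assumptions; in the PR this step is applied as a model transformation inside the main model, not as user code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def fuse_hidden_states(layers: List[Vector], weight: Matrix) -> Vector:
    """Concatenate per-layer hidden vectors, then project them with a matmul."""
    # Concatenate along the feature axis: k layers of size h give k * h values.
    concat = [x for layer in layers for x in layer]
    if any(len(row) != len(concat) for row in weight):
        raise ValueError("each weight row must match the concatenated width")
    # Matmul: out[j] = sum_i weight[j][i] * concat[i]
    return [sum(w * x for w, x in zip(row, concat)) for row in weight]


# Three hypothetical layers with hidden size 2, projected down to size 2.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weight = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
fused = fuse_hidden_states(layers, weight)
```

Doing this inside the main model means the draft model receives one already-projected tensor rather than several raw hidden states.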

Co-authored-by: Vladimir Zlobin <vladimir.zlobin@intel.com>
Copilot AI review requested due to automatic review settings December 18, 2025 00:38
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/cpp/src/speculative_decoding/speculative_decoding_eagle3_impl.cpp:1

  • Corrected spelling of 'useage' to 'usage'.
// Copyright (C) 2023-2025 Intel Corporation

@Wovchena enabled auto-merge on December 19, 2025 07:21
@Wovchena added this pull request to the merge queue on Dec 22, 2025
Merged via the queue into openvinotoolkit:master with commit aaa5612 Dec 22, 2025
369 of 391 checks passed
7 participants