
Conversation


@sunxiaoxia2022 sunxiaoxia2022 commented Oct 10, 2025

Description

Add eagle3 pipeline

Ticket: CVS-170888

Checklist:

  • Tests have been updated or added to cover the new code
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation

@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds eagle3 pipeline support for speculative decoding in the who_what_benchmark tool. The changes enable users to configure and use draft models for speculative decoding with various configuration options.

  • Added command-line arguments for speculative decoding configuration including draft model path, device, and eagle3 mode
  • Modified text generation functions to use a unified generation config object instead of individual parameters
  • Updated model loader to support draft model configuration and speculative decoding setup

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| wwb.py | Added CLI arguments for speculative decoding and eagle3 mode; updated generation config handling |
| text_evaluator.py | Modified generation function signatures to use a generation config object |
| model_loaders.py | Added draft model loading and configuration support for speculative decoding |


```python
tokenizer is not None and tokenizer.chat_template is not None and not args.omit_chat_template
)

gen_config = openvino_genai.GenerationConfig()
```
Contributor

please, import openvino_genai and create GenerationConfig only if --genai option is set
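A minimal sketch of the pattern the reviewer is suggesting (the helper name `get_generation_config` and the `args.genai` flag are assumptions for illustration, not the actual wwb code): defer the `openvino_genai` import so runs without `--genai` never require the package.

```python
def get_generation_config(args):
    """Build a GenerationConfig only for GenAI runs (hypothetical helper).

    Returns None when --genai is not set, so openvino_genai is never
    imported on the Optimum/Transformers path.
    """
    if not getattr(args, "genai", False):
        return None
    # Deferred import: only needed when the GenAI pipeline is used.
    import openvino_genai
    gen_config = openvino_genai.GenerationConfig()
    gen_config.max_new_tokens = args.max_new_tokens
    return gen_config
```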

Contributor

you can create and set generation config once when you create the GenAI pipeline in model_loaders.py

Author

OK, updated.

@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.



@MaximProshin
Collaborator

@sunxiaoxia2022 , please share wwb Similarity numbers for eagle3 models from #2740

@sunxiaoxia2022
Author

sunxiaoxia2022 commented Oct 21, 2025

> @sunxiaoxia2022 , please share wwb Similarity numbers for eagle3 models from #2740

Hi @MaximProshin
Test platform: LNL (Intel Core Ultra 7 258V), Windows
Models:

  1. llama-3.1-8b-instruct:
    target model: meta-llama/Llama-3.1-8B-Instruct
    Eagle3 draft model: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B
  2. qwen3-8b:
    target model: Qwen/Qwen3-8B
    Eagle3 draft model: Tengyunw/qwen3_8b_eagle3

The similarity numbers are as follows:

| model | precision | prompt | base similarity | eagle3 pipeline similarity |
| --- | --- | --- | --- | --- |
| llama-3.1-8b-instruct | INT4 | short | 0.935972 | 0.935842 |
| llama-3.1-8b-instruct | INT4 | long | 0.923789 | 0.918256 |
| qwen3-8b | INT4 | short | 0.935486 | 0.935193 |
| qwen3-8b | INT4 | long | 0.913537 | 0.914053 |

@MaximProshin MaximProshin self-requested a review October 21, 2025 06:20
@moslex moslex added this to the 2025.4 milestone Oct 21, 2025
@moslex moslex added the priority: high (High priority) label Oct 21, 2025
@apaniukov
Contributor

> [quotes @sunxiaoxia2022's full reply and similarity table from the comment above]

What are num-assistant-tokens and assistant-confidence-threshold?
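For readers following the thread: in speculative (assisted) generation, `num_assistant_tokens` is a fixed number of candidate tokens the draft model proposes per step, while `assistant_confidence_threshold` instead lets the draft stop proposing early once its own probability for the next token falls below the threshold. A toy, library-free sketch of that drafting loop, under those assumptions (`draft_next` is a hypothetical stand-in for one draft-model step; this is not the openvino_genai implementation):

```python
def draft_candidates(draft_next, prefix, num_assistant_tokens=None,
                     assistant_confidence_threshold=None):
    """Propose candidate tokens like a speculative-decoding draft step.

    draft_next(tokens) -> (token, prob): stand-in for one draft-model step.
    Exactly one of the two knobs is normally set, mirroring how the
    static and confidence-based modes are alternatives.
    """
    candidates = []
    limit = num_assistant_tokens or 16  # safety cap in threshold mode
    while len(candidates) < limit:
        token, prob = draft_next(prefix + candidates)
        if (assistant_confidence_threshold is not None
                and prob < assistant_confidence_threshold):
            break  # draft is no longer confident; hand back to the target model
        candidates.append(token)
    return candidates
```

The target model then verifies the candidates in a single forward pass and keeps the longest accepted prefix, which is where the speed-up comes from.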

@Wovchena
Collaborator

Wovchena commented Oct 21, 2025

Why doesn't the similarity match exactly between base and eagle pipelines? I thought the expert model should be exactly the same

@sbalandi
Contributor

sbalandi commented Oct 21, 2025

LGTM for the speculative decoding part in wwb, but the Similarity numbers still need clarification.

9 participants