Conversation

@peterchen-intel
Collaborator

@peterchen-intel peterchen-intel commented Oct 18, 2025

Description

Documentation for Visual Token Pruning
Ticket: CVS-173220, CVS-170139

Implementation is in #2714
Doc build: https://github.com/openvinotoolkit/openvino.genai/actions/runs/18670224384?pr=2861

Signed-off-by: Chen, Peter <[email protected]>
@Copilot Copilot AI review requested due to automatic review settings October 18, 2025 12:16
@github-actions github-actions bot added the "category: GH Pages Docs" (Github Pages documentation) label Oct 18, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

Adds a new documentation page describing the Visual Token Pruning (CDPruner) feature for VLMs, including conceptual overview, configuration parameters, and a sample usage snippet.

  • Introduces pruning concepts and workflow.
  • Documents new GenerationConfig fields (pruning_ratio, relevance_weight) and their effects.
  • Provides a benchmark script usage example for measuring performance impact.

Signed-off-by: Chen, Peter <[email protected]>
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


@peterchen-intel
Collaborator Author

@liangali

Co-authored-by: Copilot <[email protected]>
@Wovchena
Collaborator

Build your docs at https://github.com/peterchen-intel/openvino.genai/actions/workflows/deploy_gh_pages.yml to see everything is fine. Add the link of the resulting docs to the PR description

@rkazants
Collaborator

Should be merged only after #2714

Comment on lines 13 to 18
The visual token sequence extracted from the image encoder can be partitioned into:

* Retained Tokens: Subset judged most relevant by dominance scoring.
* Pruned Tokens: Dropped from future decoding (no longer participate in cross-attention or self-attention depending on architecture).

Pruning is controlled by a ratio (percentage of tokens to remove) and a relevance weight scaling that influences importance estimation.
Contributor

CDPruner operates on the sequence of visual token embeddings produced by the vision encoder before they are passed to the language model. Instead of forwarding all tokens, it selects a subset based on conditional diversity, combining token similarity and instruction relevance.

Token Partitioning

The visual tokens are conceptually divided into:

  • Retained Tokens: A selected subset that provides diverse and instruction-relevant visual information.
  • Pruned Tokens: Tokens excluded from further processing because they contribute redundant or low-relevance information.

Collaborator Author

Updated
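
For reference, the effect of the pruning ratio discussed in this thread on token counts can be sketched as follows. This is a minimal illustration and not the implementation in #2714: `split_counts` is a hypothetical helper, and both the 0-100 percentage scale and the round-down behavior are assumptions.

```python
from typing import Tuple


def split_counts(num_tokens: int, pruning_ratio: int) -> Tuple[int, int]:
    """Return (retained, pruned) token counts for a ratio given in percent.

    Hypothetical helper for illustration only. Assumes `pruning_ratio` is an
    integer percentage in [0, 100] and that the pruned count is rounded down.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if not 0 <= pruning_ratio <= 100:
        raise ValueError("pruning_ratio must be between 0 and 100")
    pruned = num_tokens * pruning_ratio // 100
    return num_tokens - pruned, pruned
```

Under these assumptions, an image encoded into 576 visual tokens with a ratio of 30 would keep 404 tokens and drop 172.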

Co-authored-by: Liubov Talamanova <[email protected]>
Signed-off-by: Chen, Peter <[email protected]>
1. Encode image producing N visual tokens (embeddings).
2. Compute pairwise token similarity and per-token relevance scores.
3. Combine relevance and similarity into a conditional kernel. A greedy DPP-based MAP algorithm then identifies the least important tokens to discard according to `pruning_ratio`, with `relevance_weight` controlling the trade-off between diversity and relevance.
4. Optionally adjust scores using `relevance_weight` before selecting final kept set.
Contributor

This step is already incorporated in the previous step.

Collaborator Author

Removed.
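
The workflow steps quoted in this thread can be sketched in plain Python. This is a rough illustration under stated assumptions, not the code from #2714: `select_tokens` and `cosine` are hypothetical names, relevance scores are taken as given inputs, the kernel is assumed to be `q_i * S_ij * q_j` with `q_i = exp(relevance_weight * r_i)`, and `pruning_ratio` is assumed to be a 0-100 percentage. The selection loop is the standard greedy MAP procedure for DPPs with incremental Cholesky updates.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_tokens(embeddings: Sequence[Sequence[float]], relevance: Sequence[float],
                  pruning_ratio: int, relevance_weight: float) -> List[int]:
    """Return sorted indices of retained tokens (hypothetical sketch)."""
    n = len(embeddings)
    keep = n - n * pruning_ratio // 100  # assumption: ratio is a percentage
    # Assumed quality term: a larger relevance_weight favors relevance over diversity.
    quality = [math.exp(relevance_weight * r) for r in relevance]
    # Conditional kernel: pairwise similarity scaled by per-token quality.
    kernel = [[quality[i] * cosine(embeddings[i], embeddings[j]) * quality[j]
               for j in range(n)] for i in range(n)]
    gains = [kernel[i][i] for i in range(n)]  # marginal gain of adding each token
    chol: List[List[float]] = [[] for _ in range(n)]  # incremental Cholesky rows
    selected: List[int] = []
    while len(selected) < keep:
        candidates = [i for i in range(n) if i not in selected]
        best = max(candidates, key=lambda i: gains[i])
        if gains[best] <= 1e-10:
            break  # remaining tokens add no new information
        selected.append(best)
        scale = math.sqrt(gains[best])
        for i in candidates:
            if i == best:
                continue
            # Project out the part of token i already explained by the selection.
            e = (kernel[best][i] - sum(x * y for x, y in zip(chol[best], chol[i]))) / scale
            chol[i].append(e)
            gains[i] -= e * e
    return sorted(selected)
```

A small usage example: tokens 0 and 1 are identical, so the sketch keeps token 0 and the orthogonal token 2, even though token 1 has a higher relevance score than token 2.

```python
kept = select_tokens([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.9, 0.8, 0.1],
                     pruning_ratio=34, relevance_weight=1.0)
print(kept)  # [0, 2]
```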

@peterchen-intel
Collaborator Author

peterchen-intel commented Oct 22, 2025

> Build your docs at https://github.com/peterchen-intel/openvino.genai/actions/workflows/deploy_gh_pages.yml to see everything is fine. Add the link of the resulting docs to the PR description

@Wovchena Is the following what you expected? If it is OK, I will remove the change in deploy_gh_pages.yml.
https://github.com/openvinotoolkit/openvino.genai/actions/runs/18670224384?pr=2861 Download "github-pages" and view it in a local SimpleHTTPsServer.
[image]

Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Labels

category: GH Pages Docs (Github Pages documentation), category: GHA (CI based on Github actions), do_not_merge

5 participants