[Doc] Visual Token Pruning #2861
Signed-off-by: Chen, Peter <[email protected]>
Pull Request Overview
Adds a new documentation page describing the Visual Token Pruning (CDPruner) feature for VLMs, including conceptual overview, configuration parameters, and a sample usage snippet.
- Introduces pruning concepts and workflow.
- Documents new GenerationConfig fields (pruning_ratio, relevance_weight) and their effects.
- Provides a benchmark script usage example for measuring performance impact.
site/docs/concepts/optimization-techniques/visual-token-pruning.md
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Chen, Peter <[email protected]>
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <[email protected]>
Build your docs at https://github.com/peterchen-intel/openvino.genai/actions/workflows/deploy_gh_pages.yml to check that everything is fine. Add the link to the resulting docs to the PR description.
Should be merged only after #2714.
Signed-off-by: Chen, Peter <[email protected]>
Signed-off-by: Chen, Peter <[email protected]>
The visual token sequence extracted from the image encoder can be partitioned into:

* Retained Tokens: Subset judged most relevant by dominance scoring.
* Pruned Tokens: Dropped from future decoding (no longer participate in cross-attention or self-attention depending on architecture).

Pruning is controlled by a ratio (percentage of tokens to remove) and a relevance weight scaling that influences importance estimation.
CDPruner operates on the sequence of visual token embeddings produced by the vision encoder before they are passed to the language model. Instead of forwarding all tokens, it selects a subset based on conditional diversity, combining token similarity and instruction relevance.
Token Partitioning
The visual tokens are conceptually divided into:
- Retained Tokens: A selected subset that provides diverse and instruction-relevant visual information.
- Pruned Tokens: Tokens excluded from further processing because they contribute redundant or low-relevance information.
Updated
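For readers of this thread, the retained/pruned split discussed above can be pictured with a small NumPy sketch. It is only an illustration under assumptions: `partition_tokens` and its arguments are hypothetical names, not part of OpenVINO GenAI, and the retained indices are taken as given rather than computed by CDPruner.

```python
import numpy as np


def partition_tokens(visual_tokens, retained_indices):
    """Split a (num_tokens, hidden_dim) array into retained and pruned rows.

    Hypothetical helper for illustration only; the index selection itself
    (what CDPruner does) is assumed to have happened already.
    """
    mask = np.zeros(visual_tokens.shape[0], dtype=bool)
    mask[list(retained_indices)] = True
    # Boolean masking keeps the original token order in both groups.
    return visual_tokens[mask], visual_tokens[~mask]


# Four toy "tokens" with three features each; keep tokens 0 and 2.
tokens = np.arange(12).reshape(4, 3)
kept, dropped = partition_tokens(tokens, [2, 0])
print(kept.shape, dropped.shape)
```

Only the `kept` rows would be forwarded to the language model in this picture; the `dropped` rows take no further part in processing.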
Co-authored-by: Liubov Talamanova <[email protected]>
Signed-off-by: Chen, Peter <[email protected]>
1. Encode image producing N visual tokens (embeddings).
2. Compute pairwise token similarity and per-token relevance scores.
3. Relevance and similarity are combined into a conditional kernel. A greedy DPP-based MAP algorithm identifies the least important tokens to discard according to `pruning_ratio`, adjusting scores using `relevance_weight` to control the trade-off between diversity and relevance.
4. Optionally adjust scores using `relevance_weight` before selecting final kept set.
This step is already incorporated in the previous step.
Removed.
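The workflow quoted in this thread (similarity, relevance, conditional kernel, greedy DPP MAP selection) can be sketched roughly as follows. This is a minimal illustration, not the implementation from #2714: all function names are hypothetical, `pruning_ratio` is treated as a percentage as the quoted text describes, the rounding of the kept count is an assumption, and the way `relevance_weight` blends relevance into the kernel is an assumed linear interpolation chosen only for demonstration. The selection loop is a naive greedy log-determinant maximization, which is simpler but slower than the incremental variants usually used in practice.

```python
import numpy as np


def build_conditional_kernel(visual_tokens, relevance, relevance_weight):
    """Combine cosine similarity and relevance into a DPP-style kernel."""
    unit = visual_tokens / np.linalg.norm(visual_tokens, axis=1, keepdims=True)
    similarity = unit @ unit.T
    # ASSUMPTION: linear blend. weight 0 ignores relevance (pure diversity),
    # weight 1 uses the relevance scores as-is. The real formula may differ.
    r = (1.0 - relevance_weight) + relevance_weight * relevance
    return r[:, None] * similarity * r[None, :]


def greedy_map_select(kernel, num_keep):
    """Naive greedy MAP: repeatedly add the token that maximizes log det."""
    selected = []
    remaining = list(range(kernel.shape[0]))
    while len(selected) < num_keep and remaining:
        best_idx, best_logdet = None, -np.inf
        for candidate in remaining:
            idx = selected + [candidate]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = candidate, logdet
        if best_idx is None:
            # Every remaining candidate makes the submatrix singular.
            break
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


def prune_visual_tokens(visual_tokens, relevance, pruning_ratio, relevance_weight):
    """Return (retained, pruned) index lists; pruning_ratio is a percentage."""
    n = visual_tokens.shape[0]
    # ASSUMPTION: the number of pruned tokens is rounded down.
    num_keep = n - int(n * pruning_ratio / 100)
    kernel = build_conditional_kernel(visual_tokens, relevance, relevance_weight)
    retained = sorted(greedy_map_select(kernel, num_keep))
    pruned = [i for i in range(n) if i not in retained]
    return retained, pruned


# Token 1 is a near-duplicate of token 0; token 2 points in a new direction.
tokens = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.7, 0.7]])
relevance = np.array([1.0, 0.95, 0.6, 0.5])
retained, pruned = prune_visual_tokens(tokens, relevance, 50, 1.0)
print("retained:", retained, "pruned:", pruned)
```

In this toy input the near-duplicate token is discarded even though its relevance score is high, because adding it barely increases the determinant. That is the diversity/relevance trade-off the quoted step 3 describes.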
@Wovchena Is the following as expected? I will remove the change in deploy_gh_pages.yml if it is OK.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <[email protected]>

Description
Documentation for Visual Token Pruning.
Ticket: CVS-173220, CVS-170139
Implementation is in #2714
Doc build: https://github.com/openvinotoolkit/openvino.genai/actions/runs/18670224384?pr=2861