Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
Xuyang Liu1,2*, Yiyu Wang1*, Junpeng Ma3, Linfeng Zhang1✉
1 EPIC Lab, Shanghai Jiao Tong University, 2Sichuan University, 3Fudan University
The first token compression framework for VideoLLMs featuring dynamic frame budget allocation.
- **2026.02.21** Our STC has been accepted by CVPR 2026! Code is available!
- **2026.01.22** We integrated three representative baselines (FastV, VisionZip, and HoliTom) into our codebase (see the `qwen` branch), with support for Qwen3-VL.
- **2026.01.08** Added support for Qwen2.5-Omni and Qwen3-Omni in the `omni` branch, with evaluation results. To date, VidCom2 has been fully adapted to the Qwen-VL, Qwen-Omni, and LLaVA model series.
- **2025.12.30** Added support for Qwen2.5-VL and Qwen3-VL in the `qwen` branch, with evaluation results.
- **2025.12.02** We release our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
- **2025.08.21** Our VidCom2 has been accepted to the EMNLP 2025 main conference!
- **2025.05.30** We are excited to release the VidCom2 implementation for Qwen2-VL!
- **2025.05.21** We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!
- Model Adaptability: Compatible with most VideoLLMs (e.g., LLaVA, Qwen-VL, Qwen-Omni series).
- Operator Compatibility: Works seamlessly with efficient operators like Flash Attention 2.
- Strong Performance: Uses only 25% of tokens while retaining 99.6% of LLaVA-OV's performance.
- High Efficiency: Cuts LLaVA-OV generation time by 70.8% and overall latency by 43.0%.
TLDR: We present VidCom2, a plug-and-play framework that dynamically compresses video tokens based on frame uniqueness, achieving state-of-the-art efficiency and performance across various VideoLLMs and benchmarks.
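To give a flavor of the idea, here is a toy sketch of frame-adaptive budget allocation: score each frame by how far it deviates from the average frame feature, then split a global token budget across frames in proportion to that score. The function name and the scoring rule are illustrative assumptions for exposition, not VidCom2's actual algorithm:

```python
import numpy as np

def allocate_frame_budgets(frame_feats: np.ndarray, r_ratio: float = 0.25) -> np.ndarray:
    """Toy sketch: give more tokens to frames that differ most from the video average.

    frame_feats: (num_frames, tokens_per_frame, dim) visual features.
    Returns an integer token budget per frame summing to r_ratio of all tokens.
    """
    num_frames, tokens_per_frame, _ = frame_feats.shape
    total_budget = int(r_ratio * num_frames * tokens_per_frame)

    # Frame "uniqueness": distance of each frame's mean feature from the video mean.
    frame_means = frame_feats.mean(axis=1)                 # (num_frames, dim)
    video_mean = frame_means.mean(axis=0, keepdims=True)   # (1, dim)
    uniqueness = np.linalg.norm(frame_means - video_mean, axis=1)

    # Distribute the budget proportionally to uniqueness (uniform fallback if all equal).
    if uniqueness.sum() == 0:
        weights = np.full(num_frames, 1.0 / num_frames)
    else:
        weights = uniqueness / uniqueness.sum()
    budgets = np.minimum(np.floor(weights * total_budget).astype(int), tokens_per_frame)

    # Hand any leftover tokens (from flooring/capping) to the most unique frames first.
    leftover = total_budget - budgets.sum()
    order = np.argsort(-uniqueness)
    i = 0
    while leftover > 0:
        f = order[i % num_frames]
        if budgets[f] < tokens_per_frame:
            budgets[f] += 1
            leftover -= 1
        i += 1
    return budgets
```

Under this sketch, a static clip (all frames near the mean) degenerates toward a uniform split, while clips with a few distinctive frames concentrate the budget there.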
The core implementation of our code is in `token_compressor/vidcom2/vidcom2.py`.
| Model | Path |
|---|---|
| LLaVA-OneVision | token_compressor/vidcom2/models/llava.py |
| LLaVA-Video | token_compressor/vidcom2/models/llava.py |
| Qwen2-VL | token_compressor/vidcom2/models/qwen2_vl.py |
| Qwen2.5-VL | token_compressor/vidcom2/models/qwen2_5_vl.py |
| Qwen3-VL | token_compressor/vidcom2/models/qwen3_vl.py |
| Qwen2.5-Omni | token_compressor/vidcom2/models/qwen2_5_omni.py |
| Qwen3-Omni | token_compressor/vidcom2/models/qwen3_omni.py |
Minimal Integration Snippets
These changes are implemented in the lmms-eval model wrappers:
LLaVA-OneVision (`lmms-eval/lmms_eval/models/llava_onevision.py`)

```python
import os
import types

from token_compressor.vidcom2.models.llava import cus_prepare_inputs_labels_for_multimodal

if os.getenv("COMPRESSOR") == "vidcom2":
    self.model.prepare_inputs_labels_for_multimodal = types.MethodType(
        cus_prepare_inputs_labels_for_multimodal, self.model
    )
    eval_logger.info("[VidCom2] Successfully integrated VidCom2 with LLaVA-OneVision-7B.")
```

Qwen3-VL (`lmms-eval/lmms_eval/models/qwen3_vl.py`)
```python
import os
import types

from token_compressor.vidcom2.models.qwen3_vl import Qwen3VLModel_forward

if os.getenv("COMPRESSOR") == "vidcom2":
    self._model.model.forward = types.MethodType(Qwen3VLModel_forward, self._model.model)
    eval_logger.success("[VidCom2] Successfully integrated VidCom2 with Qwen3-VL.")
```

Env knobs
- Enable compression: `COMPRESSOR=vidcom2`
- Set retention ratio: `R_RATIO=0.25` (default: 0.25)
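A model wrapper only needs to read these two environment variables. A minimal sketch of that pattern (the helper name and parsing are illustrative, not the repository's actual code):

```python
import os

def read_vidcom2_knobs() -> tuple[bool, float]:
    """Hypothetical helper: read the two env knobs described above."""
    enabled = os.getenv("COMPRESSOR") == "vidcom2"   # compression on/off switch
    r_ratio = float(os.getenv("R_RATIO", "0.25"))    # fraction of video tokens to keep
    return enabled, r_ratio
```

Because both knobs come from the environment, they can be prepended to any launch command without touching the wrapper's argument parsing.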
- Clone this repository:

```shell
git clone https://github.com/xuyang-liu16/VidCom2.git
cd VidCom2
```

- Environment setup and preparation:
```shell
conda create -n VidCom2 python=3.10 -y
conda activate VidCom2
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
```

- Install lmms-eval:
If you want to measure latency and GPU memory, please use the custom installation:

```shell
cd lmms-eval
pip install -e .
```

Alternatively, you can use the official installation:

```shell
pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
```

We use the lmms-eval toolkit for model evaluation.
Branch Note: The main branch only supports LLaVA-series inference. To run Qwen models, please switch to the `qwen` branch.
Configuration Notes:
- VidCom2 Compression: Enable it by prepending `COMPRESSOR=vidcom2` to the command.
- Retention Ratio: Set it by prepending `R_RATIO` to the command; the default retention ratio is 0.25.
- Flash Attention: While optional, we strongly recommend enabling Flash Attention 2 to replicate the efficiency results reported in our paper.
Below are the evaluation scripts for supported models:
To evaluate LLaVA-OneVision-7B with VidCom2, you can use:
```shell
COMPRESSOR=vidcom2 R_RATIO=0.25 accelerate launch --num_processes=8 \
  -m lmms_eval \
  --model llava_onevision \
  --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,attn_implementation=flash_attention_2 \
  --tasks videomme,mlvu_dev,longvideobench_val_v,mvbench \
  --batch_size 1 \
  --log_samples \
  --log_samples_suffix llava_onevision \
  --output_path ./logs/
```
To evaluate LLaVA-Video-7B with VidCom2, you can use:
```shell
COMPRESSOR=vidcom2 R_RATIO=0.25 accelerate launch --num_processes=8 \
  -m lmms_eval \
  --model llava_vid \
  --model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,max_frames_num=64,mm_spatial_pool_mode=average,attn_implementation=flash_attention_2 \
  --tasks videomme,mlvu_dev,longvideobench_val_v,mvbench \
  --batch_size 1 \
  --log_samples \
  --log_samples_suffix llava_vid \
  --output_path ./logs/
```
Example format for LLaVA-OV-7B with VidCom2 (`R_RATIO=0.25`) on 8×H100 GPUs:
| Metric | Value |
|---|---|
| LLM time (s) | 96.264 |
| Total time (s) | 560.816 |
| Peak memory (MB) | 19057.5 |
If our findings help your research, please consider citing our paper:
```bibtex
@article{liu2025vidcom2,
  title={Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models},
  author={Liu, Xuyang and Wang, Yiyu and Ma, Junpeng and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2505.14454},
  year={2025}
}
```

We extend our gratitude to the open-source efforts of LLaVA-OneVision and Qwen2-VL.
For any questions about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.

