πŸ“Ή Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

Xuyang Liu<sup>1,2</sup>*, Yiyu Wang<sup>1</sup>*, Junpeng Ma<sup>3</sup>, Linfeng Zhang<sup>1</sup>✉

<sup>1</sup>EPIC Lab, Shanghai Jiao Tong University, <sup>2</sup>Sichuan University, <sup>3</sup>Fudan University

⚑ The first token compression framework for VideoLLMs featuring dynamic frame budget allocation.


πŸ”₯ News

  • 2026.02.21 🎊🎊 Our STC has been accepted by CVPR 2026! Code is available!
  • 2026.01.22 ✅✅ We integrated three representative baselines (FastV, VisionZip, and HoliTom) into our codebase (see the qwen branch), with support for Qwen3-VL.
  • 2026.01.08 ✅✅ Added support for Qwen2.5-Omni and Qwen3-Omni in the omni branch, with evaluation results. To date, VidCom2 has been fully adapted to the Qwen-VL, Qwen-Omni, and LLaVA model series.
  • 2025.12.30 ✅✅ Added support for Qwen2.5-VL and Qwen3-VL in the qwen branch, with evaluation results.
  • 2025.12.02 🤗🤗 We released our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
  • 2025.08.21 🎉🎉 Our VidCom2 has been accepted to the EMNLP 2025 main conference!
  • 2025.05.30 ⚡⚡ We are excited to release the VidCom2 implementation for Qwen2-VL!
  • 2025.05.21 🤗🤗 We released VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!

🎯 Highlights

  • Model Adaptability: Compatible with most VideoLLMs (e.g., LLaVA, Qwen-VL, Qwen-Omni series).
  • Operator Compatibility: Works seamlessly with efficient operators like Flash Attention 2.
  • Strong Performance: Uses only 25% of visual tokens while retaining 99.6% of LLaVA-OV's performance.
  • High Efficiency: Cuts LLaVA-OV generation time by 70.8% and overall latency by 43.0%.

✨ Overview

TLDR: We present VidCom2, a plug-and-play framework that dynamically compresses video tokens based on frame uniqueness, achieving state-of-the-art efficiency and performance across various VideoLLMs and benchmarks.
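To make "dynamically compresses video tokens based on frame uniqueness" concrete, here is a minimal sketch of proportional frame budget allocation: frames that differ more from the rest of the video receive a larger share of a global token budget. The scoring and allocation rule, and the `allocate_frame_budgets` helper, are illustrative assumptions for this sketch, not the actual VidCom2 algorithm.

```python
import math

def allocate_frame_budgets(uniqueness, total_budget, min_tokens=1):
    """Split a global token budget across frames in proportion to
    per-frame uniqueness scores. Hypothetical helper for illustration;
    over-budget edge cases (min_tokens forcing the sum past the budget)
    are not handled here."""
    total = sum(uniqueness)
    if total == 0:
        # Degenerate case: all frames identical, split evenly.
        return [total_budget // len(uniqueness)] * len(uniqueness)
    budgets = [max(min_tokens, math.floor(total_budget * u / total))
               for u in uniqueness]
    # Hand any leftover tokens (from flooring) to the most unique frames.
    leftover = total_budget - sum(budgets)
    order = sorted(range(len(uniqueness)), key=lambda i: -uniqueness[i])
    i = 0
    while leftover > 0:
        budgets[order[i % len(order)]] += 1
        leftover -= 1
        i += 1
    return budgets

# "Scene cut" frames (high uniqueness) get more of the budget than
# near-duplicate frames.
print(allocate_frame_budgets([0.9, 0.1, 0.1, 0.9], 100))  # [45, 5, 5, 45]
```

The point of the proportional rule is that a static per-frame quota would waste budget on near-duplicate frames; allocating by uniqueness concentrates retained tokens where the video actually changes.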

πŸ’₯ Core Codes and Supported Models

The core implementation of our code is in `token_compressor/vidcom2/vidcom2.py`.

| Model | Path |
| --- | --- |
| LLaVA-OneVision | `token_compressor/vidcom2/models/llava.py` |
| LLaVA-Video | `token_compressor/vidcom2/models/llava.py` |
| Qwen2-VL | `token_compressor/vidcom2/models/qwen2_vl.py` |
| Qwen2.5-VL | `token_compressor/vidcom2/models/qwen2_5_vl.py` |
| Qwen3-VL | `token_compressor/vidcom2/models/qwen3_vl.py` |
| Qwen2.5-Omni | `token_compressor/vidcom2/models/qwen2_5_omni.py` |
| Qwen3-Omni | `token_compressor/vidcom2/models/qwen3_omni.py` |
Minimal Integration Snippets

These changes are implemented in the lmms-eval model wrappers:

**LLaVA-OneVision** (`lmms-eval/lmms_eval/models/llava_onevision.py`)

```python
import os
import types
from token_compressor.vidcom2.models.llava import cus_prepare_inputs_labels_for_multimodal

if os.getenv("COMPRESSOR") == "vidcom2":
    self.model.prepare_inputs_labels_for_multimodal = types.MethodType(
        cus_prepare_inputs_labels_for_multimodal, self.model
    )
    eval_logger.info("[VidCom2] Successfully integrated VidCom2 with LLaVA-OneVision-7B.")
```

**Qwen3-VL** (`lmms-eval/lmms_eval/models/qwen3_vl.py`)

```python
import os
import types
from token_compressor.vidcom2.models.qwen3_vl import Qwen3VLModel_forward

if os.getenv("COMPRESSOR") == "vidcom2":
    self._model.model.forward = types.MethodType(Qwen3VLModel_forward, self._model.model)
    eval_logger.success("[VidCom2] Successfully integrated VidCom2 with Qwen3-VL.")
```
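Both snippets rely on the same mechanism: `types.MethodType` rebinds a replacement function as a bound method on one model instance, leaving the class itself untouched. A minimal runnable illustration of that pattern (the `Model` class and `compressed_forward` below are toy stand-ins, not VidCom2 code):

```python
import types

class Model:
    def forward(self, tokens):
        # Original behavior: pass tokens through unchanged.
        return tokens

def compressed_forward(self, tokens):
    # Illustrative stand-in for a token-compressing forward pass:
    # keep only the first half of the tokens.
    return tokens[: len(tokens) // 2]

m = Model()
# Same pattern as the snippets above: rebind the method on one instance.
m.forward = types.MethodType(compressed_forward, m)
print(m.forward([1, 2, 3, 4]))  # [1, 2]
```

Patching a single instance rather than the class keeps the integration plug-and-play: other instances of the same model class, and the original library code, are unaffected.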

Environment knobs

  • Enable compression: `COMPRESSOR=vidcom2`
  • Set the retention ratio: `R_RATIO=0.25` (default: 0.25)
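For reference, the two knobs can be parsed the way a model wrapper might read them; the value format, the range check, and the `read_vidcom2_knobs` helper below are illustrative assumptions, not code from this repository:

```python
import os

def read_vidcom2_knobs(env=None):
    """Read the COMPRESSOR / R_RATIO environment knobs (hypothetical
    helper; mirrors how the wrappers appear to consume them)."""
    if env is None:
        env = os.environ
    enabled = env.get("COMPRESSOR") == "vidcom2"
    r_ratio = float(env.get("R_RATIO", "0.25"))  # fraction of tokens retained
    if not 0.0 < r_ratio <= 1.0:
        raise ValueError("R_RATIO must be in (0, 1]")
    return enabled, r_ratio

print(read_vidcom2_knobs({"COMPRESSOR": "vidcom2"}))  # (True, 0.25)
```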

πŸ›  Preparation

  1. Clone this repository:

```shell
git clone https://github.com/xuyang-liu16/VidCom2.git
cd VidCom2
```

  2. Set up the environment:

```shell
conda create -n VidCom2 python=3.10 -y
conda activate VidCom2
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
```

  3. Install lmms-eval:

If you want to measure latency and GPU memory, use the custom installation:

```shell
cd lmms-eval
pip install -e .
```

Otherwise, you can use the official installation:

```shell
pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
```

πŸš€ Performance Evaluation

We utilize the lmms-eval toolkit for model evaluation.

Branch Note: The main branch only supports LLaVA series inference. To run Qwen models, please switch to the qwen branch.

πŸ’‘ Configuration Notes:

  • VidCom2 Compression: Enable by prepending COMPRESSOR=vidcom2 to the command.
  • Retention Ratio: Set by prepending R_RATIO=<value> to the command; the default retention ratio is 0.25.
  • Flash Attention: While optional, we strongly recommend enabling Flash Attention 2 to replicate the efficiency results reported in our paper.

Below are the evaluation scripts for supported models:

To evaluate LLaVA-OneVision-7B with VidCom2, you can use:

```shell
COMPRESSOR=vidcom2 R_RATIO=0.25 accelerate launch --num_processes=8 \
  -m lmms_eval \
  --model llava_onevision \
  --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,attn_implementation=flash_attention_2 \
  --tasks videomme,mlvu_dev,longvideobench_val_v,mvbench \
  --batch_size 1 \
  --log_samples \
  --log_samples_suffix llava_onevision \
  --output_path ./logs/
```

To evaluate LLaVA-Video-7B with VidCom2, you can use:

```shell
COMPRESSOR=vidcom2 R_RATIO=0.25 accelerate launch --num_processes=8 \
  -m lmms_eval \
  --model llava_vid \
  --model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,max_frames_num=64,mm_spatial_pool_mode=average,attn_implementation=flash_attention_2 \
  --tasks videomme,mlvu_dev,longvideobench_val_v,mvbench \
  --batch_size 1 \
  --log_samples \
  --log_samples_suffix llava_vid \
  --output_path ./logs/
```

⚑ Efficiency Analysis

Example results for LLaVA-OV-7B with VidCom2 (`R_RATIO=0.25`) on 8×H100 GPUs:

| Metric | Value |
| --- | --- |
| LLM_time_s | 96.264 |
| Total_time_s | 560.816 |
| Peak_mem_MB | 19057.5 |
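A percentage reduction such as the 70.8% generation-time cut quoted in the highlights compares a baseline run against a compressed run. A trivial sketch of that arithmetic (the baseline timing below is a hypothetical placeholder, not a measured number):

```python
def percent_reduction(baseline_s, ours_s):
    """Percentage reduction of a timing relative to a baseline."""
    return 100.0 * (baseline_s - ours_s) / baseline_s

# Hypothetical values chosen only to illustrate the formula:
# a 100 s baseline reduced to 29.2 s is a 70.8% reduction.
print(round(percent_reduction(100.0, 29.2), 1))  # 70.8
```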

πŸ“Œ Citation

If our findings help your research, please consider citing our paper:

```bibtex
@article{liu2025vidcom2,
  title={Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models},
  author={Liu, Xuyang and Wang, Yiyu and Ma, Junpeng and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2505.14454},
  year={2025}
}
```

πŸ‘ Acknowledgment

We extend our gratitude to the open-source efforts of LLaVA-OneVision and Qwen2-VL.

πŸ“© Contact

For any question about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.
