integrated vlm code for benchmark for Eagle2 #3698
base: main
Conversation
Qwen model: the command I used produces this error:
File "/work/TensorRT/tools/llm/run_vlm.py", line 448, in <module>
inputs = load_inputs(args, processor, DEVICE)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/TensorRT/tools/llm/run_vlm.py", line 188, in load_inputs
from qwen_vl_utils import process_vision_info
ModuleNotFoundError: No module named 'qwen_vl_utils'
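For reference, a minimal sketch of a guarded import that points users at the install step (the hint text is illustrative, not the script's actual message, and it assumes the import stays inside `load_inputs` as the traceback shows):

```python
# Sketch only: guard the optional dependency and point users at the install step.
# Assumes the import lives inside load_inputs(), as in the traceback above.
try:
    from qwen_vl_utils import process_vision_info
except ImportError as exc:
    raise ImportError(
        "Qwen VLM benchmarking requires the 'qwen_vl_utils' package. "
        "Install it with: pip install qwen-vl-utils"
    ) from exc
```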
When I tried the Eagle2 model, it shows:
Traceback (most recent call last):
File "/work/TensorRT/tools/llm/run_vlm.py", line 443, in <module>
model, processor, emb_layer = load_model(args.model, DEVICE, dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/TensorRT/tools/llm/run_vlm.py", line 141, in load_model
return _load_eagle2(device, torch_dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/TensorRT/tools/llm/run_vlm.py", line 101, in _load_eagle2
AutoModel.from_pretrained(
File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4336, in from_pretrained
config = cls._autoset_attn_implementation(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2109, in _autoset_attn_implementation
cls._check_and_enable_flash_attn_2(
File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2252, in _check_and_enable_flash_attn_2
raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
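For reference, one way the script could turn this into an actionable failure is to check for flash_attn before loading the checkpoint. This is a sketch only; the message text and placement are illustrative, not the code this PR adds:

```python
# Sketch only: fail early with a clear hint instead of the deep transformers traceback.
# The message text is illustrative; adjust it to match the README instructions.
import importlib.util

if importlib.util.find_spec("flash_attn") is None:
    raise ImportError(
        "The Eagle2 checkpoint enables FlashAttention2 by default, but 'flash_attn' is "
        "not installed. Install it (e.g. pip install flash-attn --no-build-isolation) "
        "or load the model with attn_implementation='sdpa'."
    )
```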
Please update docs and add these models to the list of supported models.
Thank you for your useful comments! I have addressed every comment.
I have added installation instructions (for both FlashAttention2 and qwen_vl_utils) to the README and tutorial, and also included a helpful message that guides users through installation if the package is not found when running the script.
#### Vision Language Models: `run_vlm.py`

```bash
python run_vlm.py --model Qwen/Qwen2.5-VL-3B-Instruct --precision FP16 --num_tokens 128 --cache static_v1 --enable_pytorch_run --benchmark
```
Let's use the Eagle model command here since that is fully optimized.
Installing flash-attn 2.7.1.post4 works. Let's mention this in the README under limitations, and convey that we install this version but don't actually use flash-attn; instead the loader is modified to use SDPA.
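For reference, a sketch of what "install flash-attn but run with SDPA" could look like at load time, assuming the Eagle2 remote code honors the standard `attn_implementation` argument; the model id below is a placeholder, not the one the script uses:

```python
# Sketch only: flash-attn 2.7.1.post4 may be installed (some checkpoints import it at
# load time), but we explicitly request PyTorch SDPA attention from transformers.
import torch
from transformers import AutoModel

MODEL_ID = "<eagle2-checkpoint>"  # placeholder; substitute the actual Eagle2 model id

model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # overrides the config's flash_attention_2 default
    trust_remote_code=True,
)
```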
Description
Closing the previous pull request (#3652) due to rebase difficulties with the main branch. This new PR resubmits the same changes for the VLM benchmark framework—now cleanly rebased on the latest main branch—and incorporates all feedback from the original review.
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: