Request for clarification on the cause of qualitative performance degradation #5

@Dev-ori

Thank you for your excellent work. Your research has greatly inspired me.

As per your guidelines, I downloaded the model from Hugging Face and performed inference using inference.py with the Sampler located at /ATS/anomaly_scorer.pth. I specifically verified results for Arrest001, Burglary079, RoadAccidents131, and Stealing079 videos from the UCF_Crime dataset, as depicted in Supplementary Figure F of your HolmesVAU paper.

However, the following issues were identified:

  1. The anomaly descriptions are condensed into a single sentence and lack sufficient context and accurate reasoning. Please see the examples below.

  2. The sampled frames are not as densely concentrated within the anomaly GT regions as shown in your paper. For instance, for the Arrest001 video the paper shows at least seven sampled frames within the GT region, whereas the output of inference.py contains only three.

I look forward to your response addressing these two points.

Code used for inference (inference.py):

import os
import torch
from decord import VideoReader, cpu
import matplotlib.pyplot as plt
from holmesvau.holmesvau_utils import load_model, generate, show_smapled_video

mllm_path = '/home/annonymous/NewProjects/HolmesVAU/models'
sampler_path = '/home/annonymous/NewProjects/HolmesVAU/holmesvau/ATS/anomaly_scorer.pth'
device = torch.device('cuda:0')
model, tokenizer, generation_config, sampler = load_model(mllm_path, sampler_path, device)

video_path = "/media/annonymous/SSD02/UCF_Crime/Stealing/Stealing079_x264.mp4"
prompt = "Could you specify the anomaly events present in the video?"
pred, history, frame_indices, anomaly_score = generate(video_path, prompt, model, tokenizer, generation_config, sampler, select_frames=12, use_ATS=True)
print('\nUser:', prompt, '\nHolmesVAU:', pred)

Results for the example videos:
Arrest001_x264.mp4 Arrest 1185 1485 -1 -1 hit 3
Frame Number: 2374
Anomaly-fouced Temporal Sampling...
Sampled frames: [80, 480, 688, 912, 1152, 1264, 1360, 1488, 1680, 1872, 2096, 2368]
HolmesVAU: The anomaly event involves a man in black entering the office and engaging in a physical altercation with another man in gray, which deviates from the normal and expected behavior of office workers, particularly in a professional setting.

Burglary079_x264.mp4 Burglary 7750 10710 -1 -1 hit 5
Frame Number: 14853
Anomaly-fouced Temporal Sampling...
Sampled frames: [80, 2688, 3664, 4688, 5920, 7888, 8448, 9168, 9760, 10496, 12848, 14816]
HolmesVAU: The anomaly event involves a group of people breaking into the car, which is a suspicious and illegal activity that deviates from normal and expected behavior in a typical parking lot or garage setting.

RoadAccidents131_x264.mp4 RoadAccidents 180 240 -1 -1 hit 0
Frame Number: 2729
Anomaly-fouced Temporal Sampling...
Sampled frames: [48, 320, 528, 752, 1024, 1264, 1472, 1664, 1872, 2096, 2384, 2704]
HolmesVAU: The anomaly event involves a car suddenly stopping and then driving away from the scene, which is an unusual and potentially hazardous behavior that deviates from normal traffic patterns

Stealing079_x264.mp4 Stealing 2550 3210 3510 4500 hit 2
Frame Number: 5846
Anomaly-fouced Temporal Sampling...
Sampled frames: [80, 1168, 2208, 2896, 3424, 3792, 4032, 4304, 4560, 4864, 5360, 5840]

User: Could you specify the anomaly events present in the video?
HolmesVAU: The anomaly event involves a man in white entering the gate and then leaving the gate, which is an unusual and suspicious behavior that may indicate a potential theft or unauthorized access
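
For reference, here is a minimal sketch of how the sampled frames can be checked against the GT regions. It assumes the annotation lines follow the layout shown above (video name, class, then up to two start/end frame pairs, with -1 marking an unused pair) and that the region bounds are inclusive. The helper names `parse_annotation` and `count_frames_in_regions` are my own and are not part of the HolmesVAU code. How the `hit` value in the logs is computed is not stated here, so this count need not match it.

```python
def parse_annotation(line):
    """Parse 'name class s1 e1 s2 e2 ...' into (name, [(start, end), ...]).

    Pairs marked with -1 are treated as unused and skipped.
    Any trailing fields (such as 'hit 3' in the logs) are ignored.
    """
    parts = line.split()
    name = parts[0]
    bounds = [int(x) for x in parts[2:6]]
    regions = [
        (start, end)
        for start, end in zip(bounds[0::2], bounds[1::2])
        if start >= 0 and end >= 0
    ]
    return name, regions


def count_frames_in_regions(frames, regions):
    """Count frames that fall inside any region (bounds assumed inclusive)."""
    return sum(
        any(start <= frame <= end for start, end in regions)
        for frame in frames
    )


if __name__ == "__main__":
    # Annotation line and sampled frames copied from the Burglary079 log above.
    annotation = "Burglary079_x264.mp4 Burglary 7750 10710 -1 -1"
    sampled = [80, 2688, 3664, 4688, 5920, 7888, 8448, 9168, 9760, 10496, 12848, 14816]

    name, regions = parse_annotation(annotation)
    inside = count_frames_in_regions(sampled, regions)
    print(f"{name}: {inside} of {len(sampled)} sampled frames inside GT regions {regions}")
```

The same two calls can be applied to the other three videos by swapping in their annotation line and sampled frame list.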
