Questions about Pipeline for Frame Captioning and Visual Tokenization #10

@qzhb

Description


When I run run_frame_captioning_and_visual_tokenization.sh to extract visual tokens and frame captions for my own dataset, I hit the following error in run_video_CapFilt.py:

File "/extract_frame_concepts/models/med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (12) must match the size of tensor b (36) at non-singleton dimension 0

Is this because I did something wrong?
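For what it's worth, 36 = 12 × 3, which suggests the visual (key/value) batch in the cross-attention is three times larger than the text (query) batch, e.g. because frames are flattened into the batch dimension while captions are not. The sketch below is a hypothetical reproduction of that mismatch, not the repo's actual code: the names, the 3-frames-per-video figure, and the fix of repeating queries per frame are all assumptions.

```python
# Hypothetical reproduction of the batch mismatch. Assumption: each video
# contributes several frames to the visual batch, while the text batch has
# one entry per video, so the leading dimensions disagree in cross-attention.
batch_size = 12        # text queries: one per video (matches tensor a)
frames_per_video = 3   # assumed; 12 * 3 = 36 matches tensor b in the error
visual_batch = batch_size * frames_per_video

def check_cross_attention_batches(q_batch: int, kv_batch: int) -> None:
    """Mimics the leading-dimension check torch.matmul effectively performs."""
    if q_batch != kv_batch:
        raise RuntimeError(
            f"The size of tensor a ({q_batch}) must match the size of "
            f"tensor b ({kv_batch}) at non-singleton dimension 0"
        )

# Reproduces the reported failure:
try:
    check_cross_attention_batches(batch_size, visual_batch)
except RuntimeError as e:
    print(e)

# One common remedy (an assumption, not the maintainers' fix): repeat each
# text query once per frame so both batches line up at 36.
check_cross_attention_batches(batch_size * frames_per_video, visual_batch)
```

If this reading is right, the fix would be either to tile the text inputs per frame or to reshape the frame features back to (videos, frames, ...) before the cross-attention, but that is speculation pending the maintainers' reply.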
