When I use run_frame_captioning_and_visual_tokenization.sh to extract visual tokenization and frame captioning for my own dataset, I hit the following error in run_video_CapFilt.py:
File "/extract_frame_concepts/models/med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (12) must match the size of tensor b (36) at non-singleton dimension 0
Did I do something wrong?
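For what it's worth, here is a minimal sketch of the class of failure this looks like: the attention matmul fails when query_layer and key_layer disagree on the batch dimension, e.g. 12 text states against 36 = 12 x 3 visual states. This is an illustration only (using NumPy in place of torch, with made-up head/sequence sizes), not the actual BLIP/med.py code:

```python
import numpy as np

# Hypothetical shapes: (batch, num_heads, seq_len, head_dim).
# query has batch 12, key has batch 36 -- mimicking the error message.
batch, heads, seq, head_dim = 12, 12, 20, 64
query_layer = np.zeros((batch, heads, seq, head_dim))    # batch dim 12
key_layer = np.zeros((batch * 3, heads, seq, head_dim))  # batch dim 36

try:
    # Same operation as med.py line 178, written with NumPy for illustration:
    # batch dims 12 vs 36 cannot broadcast, so matmul raises.
    np.matmul(query_layer, key_layer.transpose(0, 1, 3, 2))
except ValueError as e:
    print("shape mismatch:", e)
```

So it seems the visual embeddings fed as cross-attention keys carry 3x the batch size of the text queries; I am not sure whether that comes from my dataset preparation or from a config value I should have changed.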