Classification/ZeroShotClassification:
- [ ] [mteb/kinetics-400](https://huggingface.co/datasets/mteb/kinetics-400)
- [ ] [mteb/HMDB51](https://huggingface.co/datasets/mteb/HMDB51)
- [ ] [mteb/Breakfast](https://huggingface.co/datasets/mteb/Breakfast)
- [ ] [mteb/kinetics-700-2020](https://huggingface.co/datasets/mteb/kinetics-700-2020)
- [ ] [mteb/UCF101-51VA](https://huggingface.co/datasets/mteb/UCF101-51VA)
- [ ] [mteb/SomethingSomethingV2](https://huggingface.co/datasets/mteb/SomethingSomethingV2)
- [ ] [mteb/kinetics-600](https://huggingface.co/datasets/mteb/kinetics-600)
- [ ] [mteb/VGGSound](https://huggingface.co/datasets/mteb/VGGSound)
- [ ] [mteb/AVE-Dataset](https://huggingface.co/datasets/mteb/AVE-Dataset)
- [ ] [mteb/Human-Animal-Cartoon](https://huggingface.co/datasets/mteb/Human-Animal-Cartoon)
- [ ] [mteb/RAVDESS_AV](https://huggingface.co/datasets/mteb/RAVDESS_AV)
- [ ] [mteb/MUSIC-AVQA_cls-preprocessed](https://huggingface.co/datasets/mteb/MUSIC-AVQA_cls-preprocessed)
- [ ] [mteb/MELD](https://huggingface.co/datasets/mteb/MELD)
- [ ] [mteb/AVMeme-Exam](https://huggingface.co/datasets/mteb/AVMeme-Exam)
- [ ] [mteb/WorldSense_1min](https://huggingface.co/datasets/mteb/WorldSense_1min)

Pair Classification:
- [ ] [mteb/Human-Animal-Cartoon](https://huggingface.co/datasets/mteb/Human-Animal-Cartoon)
- [ ] [mteb/AVE-Dataset](https://huggingface.co/datasets/mteb/AVE-Dataset)
- [ ] [mteb/RAVDESS_AV](https://huggingface.co/datasets/mteb/RAVDESS_AV)
- [ ] [mteb/MELD](https://huggingface.co/datasets/mteb/MELD)
- [ ] [mteb/MUSIC-AVQA_cls-preprocessed](https://huggingface.co/datasets/mteb/MUSIC-AVQA_cls-preprocessed)

Clustering:
- [ ] [mteb/WorldSense_1min](https://huggingface.co/datasets/mteb/WorldSense_1min)
- [ ] [mteb/AVE-Dataset](https://huggingface.co/datasets/mteb/AVE-Dataset)
- [ ] [mteb/RAVDESS_AV](https://huggingface.co/datasets/mteb/RAVDESS_AV)
- [ ] [mteb/MELD](https://huggingface.co/datasets/mteb/MELD)
- [ ] [mteb/MUSIC-AVQA_cls-preprocessed](https://huggingface.co/datasets/mteb/MUSIC-AVQA_cls-preprocessed)
- [ ] VideoMME by domain?

Retrieval:
- [ ] [mteb/MSR-VTT](https://huggingface.co/datasets/mteb/MSR-VTT): task implemented, pending #4375
- [ ] [mteb/MSVD](https://huggingface.co/datasets/mteb/MSVD)
- [ ] [mteb/DiDeMo](https://huggingface.co/datasets/mteb/DiDeMo)
- [ ] [mteb/TUNA-Bench_1K](https://huggingface.co/datasets/mteb/TUNA-Bench_1K)
- [ ] [mteb/ActivityNet_Captions_val2](https://huggingface.co/datasets/mteb/ActivityNet_Captions_val2)
- [ ] [mteb/YouCook2_val](https://huggingface.co/datasets/mteb/YouCook2_val)
- [ ] [mteb/VATEX_test_1k](https://huggingface.co/datasets/mteb/VATEX_test_1k)
- [ ] [mteb/Shot2Story20K_test](https://huggingface.co/datasets/mteb/Shot2Story20K_test)
- [ ] [mteb/VGGSound_AV_RETRIEVAL](https://huggingface.co/datasets/mteb/VGGSound_AV_RETRIEVAL)
- [ ] [mteb/VALOR-32K](https://huggingface.co/datasets/mteb/VALOR-32K)
- [ ] [mteb/AudioCaps_AV](https://huggingface.co/datasets/mteb/AudioCaps_AV)
- [ ] [mteb/panda-70m](https://huggingface.co/datasets/mteb/panda-70m)
- [ ] [mteb/AVMeme-Exam](https://huggingface.co/datasets/mteb/AVMeme-Exam)

Video Question Answering:
- [ ] [mteb/worldqa](https://huggingface.co/datasets/mteb/worldqa)
- [ ] [mteb/EgoSchema_subset](https://huggingface.co/datasets/mteb/EgoSchema_subset)
- [ ] [mteb/NExT-QA](https://huggingface.co/datasets/mteb/NExT-QA)
- [ ] [mteb/PerceptionTest_val](https://huggingface.co/datasets/mteb/PerceptionTest_val)
- [ ] [mteb/star_bench_val](https://huggingface.co/datasets/mteb/star_bench_val)
- [ ] [mteb/AV-SpeakerBench](https://huggingface.co/datasets/mteb/AV-SpeakerBench)
- [ ] [mteb/WorldSense_1min](https://huggingface.co/datasets/mteb/WorldSense_1min)
- [ ] [mteb/Daily-Omni](https://huggingface.co/datasets/mteb/Daily-Omni)
- [ ] [mteb/Video-MME_short](https://huggingface.co/datasets/mteb/Video-MME_short)
- [ ] [mteb/OmniVideoBench_subset](https://huggingface.co/datasets/mteb/OmniVideoBench_subset)
- [ ] [mteb/AVQA_val](https://huggingface.co/datasets/mteb/AVQA_val)
- [ ] [mteb/AVMeme-Exam](https://huggingface.co/datasets/mteb/AVMeme-Exam)
- [ ] [mteb/MVBench](https://huggingface.co/datasets/mteb/MVBench)

Models:
- [ ] [ImageBind](https://github.com/facebookresearch/ImageBind)
- [ ] [LanguageBind](https://huggingface.co/LanguageBind/LanguageBind_Video)
- [ ] [LCO](https://huggingface.co/LCO-Embedding/LCO-Embedding-Omni-7B)
- [ ] [PE-AV (Facebook)](https://huggingface.co/facebook/pe-av-large) https://github.com/embeddings-benchmark/mteb/issues/3797
- [ ] [Omni-Embed-Nemotron](https://huggingface.co/nvidia/omni-embed-nemotron-3b)
- [ ] [Tevatron/OmniEmbed-v0.1](https://huggingface.co/Tevatron/OmniEmbed-v0.1)
- [ ] [Qwen2.5-Omni](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)
- [ ] [e5-omni](https://huggingface.co/Haon-Chen/e5-omni-7B)
- [ ] [nvidia/omnivinci](https://huggingface.co/nvidia/omnivinci)
- [ ] [Seed-1.6-Embedding](https://seed1-6-embedding.github.io/)

Requires GitHub repo cloning:
- [ ] [ONE-PEACE](https://github.com/OFA-Sys/ONE-PEACE)
- [ ] [OmniBind](https://github.com/zehanwang01/OmniBind)