The TEOChatlas dataset and external evaluation datasets are available for download here.
You can download all of the data using the following code:
```python
from datasets import load_dataset

# Optionally specify a cache directory if you have limited space in your
# home directory, or if you want to place the data somewhere else.
cache_dir = None

# Optionally specify a split if you only want to download a subset of the data.
# The splits are defined on the Hugging Face Hub page for the dataset.
split = None

dataset = load_dataset(
    "jirvin16/TEOChatlas",
    split=split,
    cache_dir=cache_dir,
    trust_remote_code=True,
)
```

This will download the data to the machine where the code is run. Running `load_dataset` again will not re-download the data unless the cache directory is changed. The training code uses `load_dataset` to load the data.
Navigate to the Video-LLaVA-Pretrain-7B model on the Hugging Face model hub and download the mm_projector.bin file. This file contains the weights for the Video-LLaVA projector, which will be used to initialize the TEOChat projector.
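If you prefer to script this step rather than download the file through the web UI, the `huggingface_hub` client can fetch it directly. This is a minimal sketch, assuming the model lives at the repo id `LanguageBind/Video-LLaVA-Pretrain-7B`; verify the repo id against the model hub page before using it.

```python
# Sketch: fetch the projector weights programmatically.
# The repo id below is an assumption based on the model name;
# confirm it on the Hugging Face model hub page.
from huggingface_hub import hf_hub_download

def download_projector(cache_dir=None):
    """Download mm_projector.bin and return its local path."""
    return hf_hub_download(
        repo_id="LanguageBind/Video-LLaVA-Pretrain-7B",
        filename="mm_projector.bin",
        cache_dir=cache_dir,
    )

# Example: path = download_projector(cache_dir="/path/to/model_cache")
```

The returned path is what you will pass to the training script in the next step.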
You need to make the following changes to the training script in order to train TEOChat:
- Set `--pretrain_mm_mlp_adapter` to the path of the `mm_projector.bin` file you downloaded in step 1.
- Set `--output_dir` to the directory where you want to save the model checkpoints and logs. The prefix should be `video-llava-7b-8bit-lora`, otherwise there may be issues evaluating the model.
- (Optional) Set `--cache_dir` to the directory where you want to cache the pretrained models used for initialization (like Video-LLaVA).
- (Optional) Set `--data_cache_dir` to the directory where you stored the TEOChatlas dataset if you specified a cache directory in the data preparation step.
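After these edits, the relevant arguments in `scripts/train_teochat.sh` might look like the fragment below. All paths here are placeholders, and the rest of the script's arguments are left unchanged; this is a sketch, not the exact contents of the script.

```shell
# Placeholder paths; substitute your own.
--pretrain_mm_mlp_adapter /path/to/mm_projector.bin \
--output_dir ./checkpoints/video-llava-7b-8bit-lora-teochat \
--cache_dir /path/to/model_cache \
--data_cache_dir /path/to/teochatlas_cache \
```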
```shell
sh scripts/train_teochat.sh
```

NOTE: If you downloaded the TEOChatlas dataset on or before March 25, 2025 at 5:00pm PT, you will likely get errors and/or unexpected results when evaluating TEOChat on the detection tasks. Please re-download the dataset to ensure that you have the latest version. Only the JSONs have been updated, so you do not need to re-download the images.
To validate TEOChat on a dataset split, you can run the following script:
```shell
sh scripts/eval_teochat.sh <dataset_split> <model_path> <model_base> <cache_dir> <data_cache_dir>
```

See `eval.py` for the full list of dataset splits.
For example, to evaluate TEOChat on UC Merced, you can run:
```shell
sh scripts/eval_teochat.sh ucm jirvin16/TEOChat
```

This assumes the model and data are stored in the default cache directories.
To evaluate a newly trained model on UC Merced, you can run:
```shell
sh scripts/eval_teochat.sh ucm /path/to/model LanguageBind/Video-LLaVA-7B
```

This again assumes the model and data are stored in the default cache directories.
You may need to make the following changes to the fine-tuning script:
- By default, the fine-tuning script will fine-tune TEOChat on the TEOChatlas dataset. To fine-tune on a different dataset, you will need to replace the `--data_name` and potentially also the `--data_split` arguments in the script, depending on the Hugging Face dataset you want to fine-tune on. The script only supports datasets that are available on the Hugging Face dataset hub, but you can modify the `train.py` file to support other datasets.
- (Optional) Set `--cache_dir` to the directory where you want to cache the TEOChat model.
- (Optional) Set `--data_cache_dir` to the directory where you stored the fine-tuning dataset if you specified a cache directory when preparing the data.
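Putting these changes together, the edited arguments in `scripts/finetune_teochat.sh` might look like the fragment below. The dataset name and paths are placeholders for illustration only; substitute your own values.

```shell
# Placeholder dataset name and paths; substitute your own.
--data_name your-username/your-dataset \
--data_split train \
--cache_dir /path/to/model_cache \
--data_cache_dir /path/to/dataset_cache \
```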
```shell
bash scripts/finetune_teochat.sh
```