STVFormer is a semantic segmentation framework built on top of OpenMMLab’s MMSegmentation. This repository extends MMSegmentation with a novel architecture—Spatial-Temporal Vision Transformer (STVFormer)—designed to enhance segmentation performance in video and sequential imagery by leveraging both spatial and temporal contextual cues.
This repo inherits MMSegmentation’s modular design, training pipelines, and evaluation tools, and integrates additional components specific to STVFormer, including custom backbones, temporal fusion modules, and dataset loaders optimized for sequential data.
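To give a feel for what "temporal fusion" means here, below is a toy, dependency-free sketch of attention-weighted fusion of per-frame features. This is purely illustrative: the names (`temporal_fusion`, `frame_feats`) and the simple scaled dot-product formulation are assumptions for exposition, not the repo's actual modules or API.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def temporal_fusion(frame_feats, query):
    """frame_feats: list of T per-frame feature vectors (each length C);
    query: length-C feature of the current frame.
    Returns a length-C vector fused across time via scaled dot-product attention.
    (Toy version of the idea only; not the STVFormer implementation.)"""
    C = len(query)
    scores = [sum(f[i] * query[i] for i in range(C)) / math.sqrt(C) for f in frame_feats]
    w = softmax(scores)  # attention weights over the T frames
    return [sum(w[t] * frame_feats[t][i] for t in range(len(frame_feats)))
            for i in range(C)]

random.seed(0)
T, C = 30, 8  # e.g., a 30-frame Cityscapes snippet with 8-dim toy features
feats = [[random.gauss(0, 1) for _ in range(C)] for _ in range(T)]
fused = temporal_fusion(feats, feats[19])  # fuse context around one frame
print(len(fused))  # 8
```

The attention weights sum to 1, so the fused feature stays in the same scale as the per-frame features; the real model performs this kind of cross-frame aggregation with learned projections.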
Please refer to get_started.md for installation and dataset_prepare.md for dataset preparation.
Download the Cityscapes dataset from the official website; downloading requires a registered account (username and password). For additional helpers and extended utilities, see:
- city-scapes-script by cemsaz: A collection of scripts for automated downloading, extraction, and organization of the Cityscapes dataset. Useful for simplifying dataset preparation pipelines.
The Cityscapes dataset contains the images and their fine annotations: the images are in the `leftImg8bit` folder, and the fine annotations are in the `gtFine` folder. Cityscapes also provides coarse annotations, in the `gtCoarse` folder, with the corresponding images in the `leftImg8bit_sequence` folder; these coarse annotations are used for training the STVFormer model.
Demo videos are in the `leftImg8bit/demoVideo` folder. Additionally, for each Cityscapes training image there is a 30-frame video snippet, stored in the `leftImg8bit_sequence` folder, from which the training image was sampled; the training image with the coarse annotations is always the 19th frame of the snippet. These 30-frame sequences are used for training the STVFormer model.
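As an illustration of how a training image relates to its 30-frame snippet, the helper below enumerates the snippet's filenames. It assumes the standard Cityscapes naming scheme `{city}_{seq:06d}_{frame:06d}_leftImg8bit.png` and that the annotated frame sits at index 19 of the 30-frame clip, as described above; the helper itself is hypothetical and not part of this repo.

```python
def sequence_frames(city, seq, annotated_frame, clip_len=30, annotated_index=19):
    """Return the filenames of the clip_len-frame snippet surrounding one
    annotated Cityscapes image (illustrative helper, not the repo's API)."""
    start = annotated_frame - annotated_index  # first frame of the snippet
    return [f"{city}_{seq:06d}_{start + i:06d}_leftImg8bit.png"
            for i in range(clip_len)]

frames = sequence_frames("aachen", 0, 19)
print(frames[19])  # aachen_000000_000019_leftImg8bit.png -- the annotated image
```

Indexing the returned list at `annotated_index` recovers the annotated training image; the other 29 entries are the unannotated context frames in `leftImg8bit_sequence`.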
TODO
To train STVFormer on Cityscapes:

```shell
python tools/train.py configs/stvformer/stvformer_cityscapes.py --work-dir work_dirs/stvformer_cityscapes
```

For inference, we either use Weights & Biases or the inference script:

```shell
python tools/STV_Inference.py
```

We build upon MMSegmentation. Please refer to CONTRIBUTING.md for the contributing guidelines.
