STVFormer

STVFormer is a semantic segmentation framework built on top of OpenMMLab’s MMSegmentation. This repository extends MMSegmentation with a novel architecture—Spatial-Temporal Vision Transformer (STVFormer)—designed to enhance segmentation performance in video and sequential imagery by leveraging both spatial and temporal contextual cues.

This repo inherits MMSegmentation’s modular design, training pipelines, and evaluation tools, and integrates additional components specific to STVFormer, including custom backbones, temporal fusion modules, and dataset loaders optimized for sequential data.
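As a rough illustration of the temporal-fusion idea (not the repository's actual module), the sketch below weights per-frame features by their similarity to a target frame and averages them; the function name and the similarity-based weighting scheme are illustrative assumptions, not STVFormer's implementation:

```python
import numpy as np

def temporal_fusion(frame_feats, target_idx):
    """Fuse per-frame features toward a target frame.

    frame_feats: array of shape (T, C, H, W) for T frames.
    Each frame is weighted by the cosine similarity of its
    global descriptor to the target frame's, then averighted-averaged.
    """
    T, C, H, W = frame_feats.shape
    # Global average-pooled descriptor per frame: (T, C)
    desc = frame_feats.mean(axis=(2, 3))
    target = desc[target_idx]
    # Cosine similarity of every frame to the target frame: (T,)
    sims = desc @ target / (
        np.linalg.norm(desc, axis=1) * np.linalg.norm(target) + 1e-8
    )
    # Softmax over time, then weighted average: (C, H, W)
    weights = np.exp(sims) / np.exp(sims).sum()
    return np.tensordot(weights, frame_feats, axes=(0, 0))
```

The real model uses learned attention inside a transformer rather than fixed cosine weights, but the shape bookkeeping (fusing T frames of (C, H, W) features into one) is the same.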

demo image

Installation

Step 1: Install MMSegmentation. This repository is forked from MMSegmentation v1.2.2.

Please refer to get_started.md for installation and dataset_prepare.md for dataset preparation.

Step 2: Prepare the datasets

Cityscapes

Download the Cityscapes dataset from the official website; registration with a username and password is required. For additional helpers and extended utilities, see:

  • city-scapes-script by cemsaz: A collection of scripts for automated downloading, extraction, and organization of the Cityscapes dataset. Useful for simplifying dataset preparation pipelines.

Cityscapes Images and Fine Annotations

The Cityscapes dataset contains the images and their fine annotations. The images are in the leftImg8bit folder, and the annotations are in the gtFine folder.
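For reference, MMSegmentation's dataset_prepare.md conventionally expects Cityscapes under data/cityscapes in roughly the following layout (the sequence and coarse folders follow the same city-wise structure; exact paths may differ in your setup):

```text
data/cityscapes/
├── leftImg8bit/
│   ├── train/   # <city>/<city>_<seq>_<frame>_leftImg8bit.png
│   ├── val/
│   └── test/
├── gtFine/
│   ├── train/   # *_gtFine_labelIds.png, *_gtFine_color.png, ...
│   ├── val/
│   └── test/
├── gtCoarse/
└── leftImg8bit_sequence/
```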

Cityscapes Coarse Annotations

Cityscapes also provides coarse annotations: the corresponding images are in the leftImg8bit_sequence folder, and the annotations are in the gtCoarse folder. The coarse annotations are used for training the STVFormer model.
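Both fine and coarse annotations ship as *_labelIds.png files whose ids are usually remapped to the 19 Cityscapes training classes before training (MMSegmentation's conversion script does this via cityscapesscripts). A minimal NumPy sketch of that remapping, using the mapping from the official cityscapesScripts labels table:

```python
import numpy as np

# Cityscapes labelId -> trainId mapping for the 19 evaluation classes
# (from the official cityscapesScripts labels table); every other id
# maps to the ignore index 255.
LABEL_TO_TRAIN = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8,
    22: 9, 23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16,
    32: 17, 33: 18,
}

def labelids_to_trainids(label_map, ignore_index=255):
    """Remap a Cityscapes *_labelIds.png array to train ids."""
    out = np.full_like(label_map, ignore_index)
    for label_id, train_id in LABEL_TO_TRAIN.items():
        out[label_map == label_id] = train_id
    return out
```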

Cityscapes DemoVideo

The demoVideo split is in the leftImg8bit/demoVideo folder. Additionally, for each Cityscapes training image there are 30 video frames from which the training image was sampled.

Cityscapes Sequences

The training image with the coarse annotations is always the 19th frame of its 30-frame sequence. The sequences are stored in the leftImg8bit_sequence folder and are used for training the STVFormer model.
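Under the Cityscapes naming scheme <city>_<seq>_<frame>_leftImg8bit.png, and taking the README's statement that the annotated image is the 19th of the 30 frames, the remaining frames of a sequence can be enumerated as follows (a hypothetical helper for illustration, not part of the repo):

```python
def sequence_frames(annotated_name, num_frames=30, annotated_pos=19):
    """Return all frame file names of the 30-frame sequence that
    contains the given annotated Cityscapes frame.

    Assumes the annotated frame sits at index `annotated_pos`
    (the 19th frame, per the README), so the sequence spans
    frames [f - 19, f + 10] around annotated frame number f.
    """
    city, seq, frame, suffix = annotated_name.split("_", 3)
    start = int(frame) - annotated_pos
    return [f"{city}_{seq}_{start + i:06d}_{suffix}" for i in range(num_frames)]
```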

BDD100k

TODO

Training with STVFormer

```shell
python tools/train.py configs/stvformer/stvformer_cityscapes.py --work-dir work_dirs/stvformer_cityscapes
```

Inference with STVFormer

For inference, we either use Weights & Biases or run the inference script directly:

```shell
python tools/STV_Inference.py
```

Contributing

STVFormer builds upon MMSegmentation. Please refer to CONTRIBUTING.md for the contributing guidelines.

About

STVFormer Spatio-Temporal Video Semantic Segmentation: Extension of OpenMMLab Semantic Segmentation Toolbox and Benchmark.
