ICTNet: Image Complexity-Aware Two-Branch Network with Enhanced Decoding for Real-time Segmentation

Highlights

Comparison of inference speed and accuracy for real-time models on the Cityscapes test set.

Towards Real-time Applications: ICTNet can be used directly in real-time applications such as autonomous vehicles and medical imaging.
A Novel Image Complexity-Aware Two-branch Network: ICTNet integrates image complexity into the spatial branch and builds a highly compact two-branch network with enhanced decoding, which makes full use of the image complexity guidance and progressively restores spatial details.
Faster and Accurate: ICTNet-S achieves 150.94 FPS with an mIoU of 73.76% on the Cityscapes test set and 156.27 FPS with an mIoU of 69.75% on the CamVid test set. Also, ICTNet-L achieves 129.54 FPS with a more accurate mIoU of 72.43%. Our models are trained from scratch, without any pretraining.

Demos

A demo of the segmentation performance of our proposed ICTNets: original video (left), predictions of DABNet (middle-1), predictions of ICTNet-S (middle-2), and predictions of ICTNet-L (right).

Cityscapes demo video

Prerequisites

This implementation is based on PIDNet. Please refer to their repository for installation and dataset preparation. The inference speed is tested on a single RTX 3090 using the method in PIDNet. No third-party acceleration library is used, so you can try TensorRT or other approaches for faster speed.

Usage

0. Prepare the dataset (this section follows PIDNet's instructions)

Download the Cityscapes and CamVid datasets and unzip them into the data/cityscapes and data/camvid directories.
Check that the paths in the lists under data/list point to the correct dataset images.
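
The path check in the second step can be automated. The sketch below is not part of the repository. It assumes that each line of a list file holds one or more whitespace-separated paths relative to a dataset root; the helper name `find_missing_paths` and the example paths in the `__main__` guard are illustrative and should be adapted to your layout.

```python
from pathlib import Path


def find_missing_paths(list_file, data_root):
    """Return (line_number, relative_path) pairs whose files are missing.

    Assumes each non-empty line holds whitespace-separated paths
    (for example "image_path label_path") relative to data_root.
    """
    root = Path(data_root)
    missing = []
    with open(list_file, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            for relative_path in line.split():
                if not (root / relative_path).is_file():
                    missing.append((line_number, relative_path))
    return missing


if __name__ == "__main__":
    # Hypothetical example paths; adjust them to your own layout.
    list_path = Path("data/list/cityscapes/val.lst")
    if list_path.is_file():
        problems = find_missing_paths(list_path, "data/cityscapes")
        for line_number, relative_path in problems:
            print(f"line {line_number}: missing {relative_path}")
        print(f"{len(problems)} missing file(s)")
```

An empty result means every path named in the list resolves to a file under the chosen root.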

⚡ Instructions for preparing the CamVid data (still under discussion) ⚡

Download the images and annotations from Kaggle, where the resolution of images is 960x720 (original);
Unzip the data and put all the images and all the colored labels into data/camvid/images/ and data/camvid/labels, respectively;
Following the split of train, val and test sets used in SegNet-Tutorial, we have generated the dataset lists in data/list/camvid/.
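
Because the CamVid annotations are stored as colored images, each pixel color has to be mapped to a class index before training. The sketch below only illustrates that mapping. The palette, the class names in the comments, and the ignore index are assumptions made for the example, not values taken from this repository, and the real dataset loader may work differently (for instance on NumPy arrays).

```python
# Hypothetical palette for illustration only; the real CamVid class
# colors must come from the dataset's own class definition.
EXAMPLE_PALETTE = {
    (128, 128, 128): 0,  # assumed "sky"
    (128, 0, 0): 1,      # assumed "building"
    (128, 64, 128): 2,   # assumed "road"
}
IGNORE_LABEL = 255  # assumed index for colors outside the palette


def color_label_to_index(rgb_rows, palette=EXAMPLE_PALETTE, ignore=IGNORE_LABEL):
    """Convert rows of (R, G, B) pixels into rows of class indices."""
    return [
        [palette.get(tuple(pixel), ignore) for pixel in row]
        for row in rgb_rows
    ]
```

Pixels whose color is not in the palette receive the ignore index, so they can be excluded from the loss and from the evaluation.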

1. Training

Replace the data root in the config files with your_root_of_dataset.
Download the weights (icnet_ck.pth) of the Image Complexity Network from here and put them under models/checkpoint/.
For example, train ICTNet-S on Cityscapes:

python tools/train.py --cfg configs/cityscapes/ictednet_small_city_train.yaml

Or train the ICTNet-L on Cityscapes using train and val sets simultaneously:

python tools/train.py --cfg configs/cityscapes/ictednet_large_city_trainval.yaml
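
If several config files need the same data root, the replacement described at the start of this section can be scripted. The sketch below edits the YAML as plain text so that comments and key order are preserved. The key name `ROOT` used in the usage note is an assumption about the config layout; check your config files for the actual key before relying on it.

```python
import re


def set_yaml_value(config_text, key, new_value):
    """Replace the value of the first `key: value` line, keeping indentation."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE
    )
    # A function replacement avoids backslash escapes in new_value
    # being interpreted by the regex engine.
    updated, count = pattern.subn(
        lambda match: match.group(1) + new_value, config_text, count=1
    )
    if count == 0:
        raise KeyError(f"key {key!r} not found")
    return updated
```

For example, `set_yaml_value(text, "ROOT", "your_root_of_dataset")` rewrites only the first line whose key is `ROOT` and leaves the rest of the file untouched.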

2. Evaluation

Download the trained models for Cityscapes and CamVid from here and put them into trained_weights/cityscapes/ and trained_weights/camvid/ dirs, respectively.
For example, evaluate ICTNet-S on the Cityscapes val set:

python tools/eval.py --cfg configs/cityscapes/ictednet_small_city_train.yaml \
                          TEST.MODEL_FILE trained_weights/cityscapes/ictednet_small_city_train.pt
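
Accuracy on a labelled split is reported as mIoU, the per-class intersection over union averaged across classes. The sketch below shows that computation from a confusion matrix. It is an illustration of the metric, not the repository's evaluation code; in particular, skipping classes that never occur is a choice made for this example.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: truth, cols: prediction)."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        truth_total = sum(confusion[c])
        predicted_total = sum(row[c] for row in confusion)
        union = truth_total + predicted_total - true_positive
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(true_positive / union)
    return sum(ious) / len(ious) if ious else 0.0
```

With the two-class matrix `[[3, 1], [1, 5]]`, the per-class IoUs are 3/5 and 5/7, so the function returns their average.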

Or, evaluate ICTNet-S on the CamVid test set:

python tools/eval.py --cfg configs/camvid/ictednet_small_camvid.yaml \
                          TEST.MODEL_FILE trained_weights/camvid/ictednet_small_camvid.pt \
                          DATASET.TEST_SET list/camvid/test.lst

Generate the testing results of ICTNet-L on the Cityscapes test set:

python tools/eval.py --cfg configs/cityscapes/ictednet_large_city_trainval.yaml \
                          TEST.MODEL_FILE trained_weights/cityscapes/ictednet_large_city_trainval.pt \
                          DATASET.TEST_SET list/cityscapes/test.lst
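
The trailing `KEY VALUE` pairs in the commands above (such as `TEST.MODEL_FILE ...` and `DATASET.TEST_SET ...`) override entries of the loaded config. Assuming the scripts follow the yacs-style merging used by PIDNet and HRNet, the mechanism can be sketched as below. This is a simplified stand-in on plain dictionaries, not the yacs implementation, which also performs type checks.

```python
def apply_overrides(config, overrides):
    """Apply ["A.B", "value", ...] pairs to a nested dict, yacs-style."""
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must be KEY VALUE pairs")
    for dotted_key, value in zip(overrides[0::2], overrides[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]  # KeyError if the section does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    return config
```

Rejecting unknown keys mirrors the usual behavior of such config systems: a misspelled option fails loudly instead of being ignored.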

3. Speed Measurement

Measure the inference speed of ICTNet-S for Cityscapes:

python speed/ictednet_speed_test.py --model 'ictednet_s' --classnum 19 --size 1024 2048
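
For readers who want to time their own model or backend, the general idea behind an FPS measurement is a warm-up phase followed by a timed loop. The sketch below is a generic illustration and not the repository's speed script; `measure_fps` and its parameters are names chosen for this example. On a GPU, work is queued asynchronously, so a blocking call such as `torch.cuda.synchronize` would typically be passed as `sync`.

```python
import time


def measure_fps(run_once, iterations=100, warmup=10, sync=None):
    """Return iterations per second for run_once().

    sync is an optional callable that blocks until queued work finishes;
    on a GPU one would typically pass torch.cuda.synchronize.
    """
    for _ in range(warmup):
        run_once()  # warm-up runs are excluded from the timing
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    if sync is not None:
        sync()  # wait for all timed work before stopping the clock
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

Here `run_once` would wrap a single forward pass on a fixed-size input, for example a 1024 x 2048 tensor for Cityscapes as in the command above.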

4. Custom Inputs

Put your images in samples/ and run the command below, which uses the Cityscapes-pretrained ICTNet-L on images in .png format:

python tools/custom_ictednet_city.py --a 'ictednet_large' --p './trained_weights/cityscapes/ictednet_large_city_trainval.pt' --t '.png'

For CamVid:

python tools/custom_ictednet_cam.py --a 'ictednet_large' --p './trained_weights/camvid/ictednet_large_camvid.pt' --t '.png'

Acknowledgement

Our implementation is based on PIDNet, HRNet-Semantic-Segmentation, SANet, and SSSegmentation.
We thank the authors for their contributions.
