Comparison of inference speed and accuracy for real-time models on the Cityscapes test set.
- Towards Real-time Applications: ICTNet could be directly used for real-time applications, such as autonomous vehicles and medical imaging.
- A Novel Image Complexity-Aware Two-branch Network: ICTNet integrates image complexity into the spatial branch and constructs a highly compact two-branch network with enhanced decoding to fully make use of image complexity guidance and progressively restore spatial details.
- Faster and Accurate: ICTNet-S achieves 150.94 FPS with mIoU of 73.76% on the Cityscapes test set and 156.27 FPS with mIoU of 69.75% on the CamVid test set. Also, ICTNet-L achieves 129.54 FPS with a more accurate mIoU of 72.43%. Our models are trained from scratch, without any pretraining.
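For readers unfamiliar with the metric, mIoU (mean intersection over union) averages, over classes, the ratio TP / (TP + FP + FN) computed from a pixel-level confusion matrix. The snippet below is a minimal pure-Python sketch of that definition, not the evaluation code of this repository. The function name `mean_iou`, the toy matrix, and the choice to skip classes that appear in neither prediction nor ground truth are illustrative assumptions.

```python
def mean_iou(confusion):
    """Return (per-class IoU list, mIoU) from a square confusion matrix.

    confusion[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        # Pixels of class c that were predicted as another class.
        fn = sum(confusion[c]) - tp
        # Pixels of other classes that were predicted as class c.
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from both maps
            ious.append(tp / denom)
    miou = sum(ious) / len(ious) if ious else 0.0
    return ious, miou


# Toy 2-class matrix; the numbers are made up for illustration only.
toy = [[8, 2],
       [1, 9]]
per_class, miou = mean_iou(toy)
print(per_class, miou)
```

Evaluation scripts differ in how they treat classes with no pixels, so figures from this sketch may not match a given benchmark exactly.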
A demo of the segmentation performance of our proposed ICTNets: original video (left), predictions of DABNet (middle-1), predictions of ICTNet-S (middle-2), and predictions of ICTNet-L (right).
This implementation is based on PIDNet. Please refer to their repository for installation and dataset preparation. The inference speed is tested on a single RTX 3090 using the method in PIDNet. No third-party acceleration library is used, so you can try TensorRT or other approaches for faster speed.
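As a rough illustration of how an FPS figure can be obtained, the sketch below times a callable after a few warm-up runs and divides the iteration count by the elapsed time. It is a generic pattern and not necessarily the exact procedure used in PIDNet. `measure_fps`, `fake_inference`, and the default counts are hypothetical names and values. For a GPU model one would typically pass `torch.cuda.synchronize` as `sync`, because CUDA kernels run asynchronously.

```python
import time


def measure_fps(run_once, warmup=10, iters=100, sync=None):
    """Estimate frames per second of `run_once`, a zero-argument callable.

    `sync` is an optional zero-argument callable invoked before each clock
    read, so that pending asynchronous work is included in the timing.
    """
    for _ in range(warmup):  # warm-up runs are not timed
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()  # wait for pending work before stopping the clock
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")


# Stand-in workload so the sketch runs without a GPU or PyTorch.
def fake_inference():
    sum(i * i for i in range(1000))


fps = measure_fps(fake_inference, warmup=5, iters=50)
print(f"{fps:.2f} FPS")
```

Numbers measured this way depend on the hardware, input resolution, and batch size, so they are only comparable when those are held fixed.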
- Download the Cityscapes and CamVid datasets and unzip them in data/cityscapes and data/camvid dirs.
- Check if the paths contained in lists of data/list are correct for dataset images.
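The path check in the last step can be automated. The sketch below is a minimal helper under two assumptions that are not confirmed by this README: each non-empty line of a list file holds whitespace-separated relative paths (for example an image path followed by a label path), and those paths are relative to the data root. The name `find_missing_files` is hypothetical.

```python
import os


def find_missing_files(list_path, data_root):
    """Return paths named in a list file that do not exist under data_root.

    Assumes each non-empty line holds one or more whitespace-separated
    paths relative to data_root.
    """
    missing = []
    with open(list_path, "r", encoding="utf-8") as f:
        for line in f:
            for rel_path in line.split():
                if not os.path.isfile(os.path.join(data_root, rel_path)):
                    missing.append(rel_path)
    return missing
```

For instance, `find_missing_files("data/list/cityscapes/test.lst", "data/")` would return an empty list when every referenced file is in place.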
- Download the images and annotations from Kaggle, where the resolution of images is 960x720 (original);
- Unzip the data and put all the images and all the colored labels into data/camvid/images/ and data/camvid/labels, respectively;
- Following the split of train, val and test sets used in SegNet-Tutorial, we have generated the dataset lists in data/list/camvid/;
- Replace the data root in config files with your_root_of_dataset.
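If several config files need the same change, the replacement can be scripted. The sketch below assumes the data root appears as a literal string inside the `.yaml` files, which this README does not confirm. It performs a plain text substitution rather than parsing YAML, so the old string should be specific enough to avoid unintended matches. `replace_data_root` and its parameters are hypothetical names.

```python
from pathlib import Path


def replace_data_root(config_dir, old_root, new_root):
    """Rewrite old_root to new_root in every .yaml file under config_dir.

    Returns the list of files that were changed.
    """
    changed = []
    for cfg in sorted(Path(config_dir).rglob("*.yaml")):
        text = cfg.read_text(encoding="utf-8")
        if old_root in text:
            cfg.write_text(text.replace(old_root, new_root), encoding="utf-8")
            changed.append(cfg)
    return changed
```

A call such as `replace_data_root("configs", "<current root>", "your_root_of_dataset")` returns the files it modified, which makes the change easy to review before training.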
- Download the weight (icnet_ck.pth) of the Image Complexity Network from here and put it under models/checkpoint/.
- For example, train the ICTNet-S on Cityscapes:
python tools/train.py --cfg configs/cityscapes/ictednet_small_city_train.yaml
- Or train the ICTNet-L on Cityscapes using train and val sets simultaneously:
python tools/train.py --cfg configs/cityscapes/ictednet_large_city_trainval.yaml
- Download the trained models for Cityscapes and CamVid from here and put them into trained_weights/cityscapes/ and trained_weights/camvid/ dirs, respectively.
- For example, evaluate the ICTNet-S on Cityscapes val set:
python tools/eval.py --cfg configs/cityscapes/ictednet_small_city_train.yaml \
TEST.MODEL_FILE trained_weights/cityscapes/ictednet_small_city_train.pt
- Or, evaluate the ICTNet-S on CamVid test set:
python tools/eval.py --cfg configs/camvid/ictednet_small_camvid.yaml \
TEST.MODEL_FILE trained_weights/camvid/ictednet_small_camvid.pt \
DATASET.TEST_SET list/camvid/test.lst
- Generate the testing results of ICTNet-L on Cityscapes test set:
python tools/eval.py --cfg configs/cityscapes/ictednet_large_city_trainval.yaml \
TEST.MODEL_FILE trained_weights/cityscapes/ictednet_large_city_trainval.pt \
DATASET.TEST_SET list/cityscapes/test.lst
- Measure the inference speed of ICTNet-S for Cityscapes:
python speed/ictednet_speed_test.py --model 'ictednet_s' --classnum 19 --size 1024 2048
- Put your images in samples/ and then run the command below using Cityscapes pretrained ICTNet-L for image format of .png:
python tools/custom_ictednet_city.py --a 'ictednet_large' --p './trained_weights/cityscapes/ictednet_large_city_trainval.pt' --t '.png'
For CamVid:
python tools/custom_ictednet_cam.py --a 'ictednet_large' --p './trained_weights/camvid/ictednet_large_camvid.pt' --t '.png'
- Our implementation is modified based on PIDNet, HRNet-Semantic-Segmentation, SANet, and SSSegmentation.
- Thanks for their nice contributions.