
VLSM-Ensemble: Ensembling CLIP-based Vision-Language Segmentation Models

Accepted at MIDL 2025

Poster

Dietlmeier_2025_MIDL_A0_poster.pdf

Architecture

Datasets

For data preparation, please follow the instructions posted in the medvlsm repository (https://github.com/naamiinepal/medvlsm):

  1. Kvasir-SEG
  2. ClinicDB
  3. BKAI
  4. BUSI
  5. CheXlocalize

To run our ensemble models

The models can be integrated into the medvlsm repository (https://github.com/naamiinepal/medvlsm) with a few simple file swaps (see the sketch after this list):

  1. To run BiomedCLIPSeg-A, save BiomedCLIPSeg_A.py as medvlsm/src/models/biomedclipseg.py
  2. To run CLIPSeg-B, save CLIPSeg_B.py as medvlsm/src/models/clipseg.py
  3. To run Ensemble-C, save Ensemble_C.py as medvlsm/src/models/biomedclipseg.py
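For convenience, a minimal sketch of the file swap in Python. The relative paths are assumptions (they presume this repo and medvlsm are cloned side by side); adjust them to your setup:

```python
import shutil

# Assumed layout: this repo and medvlsm cloned next to each other.
# Note that BiomedCLIPSeg-A and Ensemble-C target the same destination
# file, so copy in only the variant you intend to run.
swaps = {
    "BiomedCLIPSeg_A.py": "medvlsm/src/models/biomedclipseg.py",  # BiomedCLIPSeg-A
    "CLIPSeg_B.py": "medvlsm/src/models/clipseg.py",              # CLIPSeg-B
    "Ensemble_C.py": "medvlsm/src/models/biomedclipseg.py",       # Ensemble-C
}

variant = "Ensemble_C.py"  # pick one key from the dict above
shutil.copy(variant, swaps[variant])
```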

To train and evaluate the UNet-D component

  1. Prepare the datasets so that you have three split folders: train, val and test. The images and masks must have the same filenames! The split for each dataset is defined in the /data/.../anns folders of https://github.com/naamiinepal/medvlsm; this requires reading the .json files and re-saving the images into the three folders (see the first sketch after this list). This is very important: we must use the same splits as the medvlsm repository!
  2. Change the paths to your data in train.py
  3. To train UNet-D, run train.py from the command line with $ python train.py
  4. Record the average Dice score and standard deviation (computed over the entire test set) for Table 1 (see the second sketch after this list)
  5. Do not make any changes to UNet_D.py! This component must be left unchanged for a fair comparison with the VLSM ensembles
  6. Modify the paths to your test data in predict.py
  7. Modify predict.py to save all predictions on the test set (we will need some of them for the paper)
  8. Once trained, a checkpoint is saved; then run predict.py from the command line
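For step 1, a minimal sketch of rebuilding the splits. The folder layout and JSON schema below are assumptions; inspect the actual anns files in medvlsm and adapt the keys and paths accordingly:

```python
import json
import shutil
from pathlib import Path

# All paths and JSON keys below are assumptions -- check the actual
# anns files under medvlsm/data/<dataset>/anns and adapt accordingly.
SRC = Path("data/kvasir_seg")        # original images/ and masks/ folders
DST = Path("datasets/kvasir_seg")    # receives train/ val/ test/
ANNS = Path("medvlsm/data/kvasir_seg/anns")

for split in ("train", "val", "test"):
    entries = json.loads((ANNS / f"{split}.json").read_text())
    for entry in entries:
        # Assumed schema: each entry records an image path and a mask path.
        for key, sub in (("image", "images"), ("mask", "masks")):
            fname = Path(entry[key]).name
            out_dir = DST / split / sub
            out_dir.mkdir(parents=True, exist_ok=True)
            # Images and masks keep identical filenames, as required above.
            shutil.copy(SRC / sub / fname, out_dir / fname)
```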
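For step 4, a sketch of computing the mean and standard deviation of the Dice score over the test set. all_preds and all_targets are hypothetical placeholders for your binarized predictions and ground-truth masks:

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient for a pair of binary (0/1) masks."""
    inter = (pred * target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# all_preds / all_targets stand in for the test-set predictions and
# ground-truth masks (random placeholders shown here for illustration).
all_preds = [np.random.randint(0, 2, (256, 256)) for _ in range(5)]
all_targets = [np.random.randint(0, 2, (256, 256)) for _ in range(5)]

# Per-image scores over the entire test set, then mean and std for Table 1.
scores = [dice(p, t) for p, t in zip(all_preds, all_targets)]
print(f"Dice: {np.mean(scores):.4f} +/- {np.std(scores):.4f}")
```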

Filenames for qualitative results in the paper:

BKAI: 3e732adb4c5d1580670d78b9d054d694.jpeg

CheXlocalize: patient64548_study1_view1_frontal_airspace_opacity.jpg (image) and patient64548_study1_view1_frontal_airspace_opacity.png (mask)

If you have prepared the above three split folders correctly, the BKAI and CheXlocalize filenames should be located in the corresponding test folders.
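A quick sanity check for these files; the dataset roots below are assumptions, so adjust them to your own split folders:

```python
from pathlib import Path

# Hypothetical dataset roots -- adjust to your own split folders.
expected = [
    "datasets/bkai/test/images/3e732adb4c5d1580670d78b9d054d694.jpeg",
    "datasets/chexlocalize/test/images/patient64548_study1_view1_frontal_airspace_opacity.jpg",
    "datasets/chexlocalize/test/masks/patient64548_study1_view1_frontal_airspace_opacity.png",
]
for f in expected:
    print(f, "OK" if Path(f).exists() else "MISSING")
```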

Acknowledgement

This repo works with the experimental testbench, data splits, and text prompts developed by naamiinepal (Poudel et al., 2024) in their medvlsm GitHub repo: https://github.com/naamiinepal/medvlsm. Sincere thanks for their excellent work!

Citations

Julia Dietlmeier, Oluwabukola Grace Adegboro, Vayangi Ganepola, Claudia Mazo and Noel E. O'Connor. "VLSM-Ensemble: Ensembling CLIP-based Vision-Language Models for Enhanced Medical Image Segmentation". In MIDL 2025.

Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya and Bishesh Khanal. "Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models". In MIDL 2024.
