This repository accompanies the paper cited below. The paper presents a self-supervised model for classifying white blood cells in peripheral blood smears that achieves high accuracy (F1 score: 96.2%) and generalizes to diverse label sets. The lightweight approach, based on EfficientNetV2-B0, improves label efficiency through active learning and is available as the HemoSight web app to streamline clinical workflows.
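Active learning improves label efficiency by choosing which unlabeled cells an expert should label next. As a rough illustration only, the sketch below shows entropy-based uncertainty sampling, a common active-learning query strategy that ranks unlabeled samples by how uncertain the model's softmax output is. The paper's actual query strategy may differ, and the function names and probabilities here are hypothetical.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one softmax output; 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(prob_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k samples with the highest entropy, most uncertain first."""
    scores = [predictive_entropy(row) for row in prob_rows]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical softmax outputs for three unlabeled cells over three classes.
probs = [[0.98, 0.01, 0.01], [0.34, 0.33, 0.33], [0.6, 0.3, 0.1]]
print(select_most_uncertain(probs, k=2))  # [1, 2]
```

The near-uniform second row is queried first; the confident first row is left for last.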
General flow of the data pipeline:
- curate.py: Curate raw data folders to create a data reference CSV.
- loader.py: Load the data reference CSV and perform train, validation, and test splits.
- train.py, generator.py: Train the model; the generator produces augmented training data.
- val.py, predictor.py: Validate the trained model; the predictor performs classification on embeddings.
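The first two pipeline stages (curating a reference CSV, then splitting it) can be sketched as follows. This is a minimal sketch, not the code in curate.py or loader.py: it assumes raw images are stored in one sub-folder per cell class, and the CSV columns (path, label), the function names, and the per-label (stratified) split with default 15% validation and 15% test fractions are all hypothetical.

```python
import csv
import random
from collections import defaultdict
from pathlib import Path

# Hypothetical set of image extensions to pick up from the raw data folders.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def curate(data_root: Path, csv_path: Path) -> int:
    """Write a reference CSV (path,label), assuming one sub-folder per cell class."""
    rows = []
    for class_dir in sorted(p for p in data_root.iterdir() if p.is_dir()):
        for img in sorted(class_dir.rglob("*")):
            if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
                rows.append({"path": str(img), "label": class_dir.name})
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def split(csv_path: Path, val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 0):
    """Stratified train/val/test split: each label is shuffled and split separately."""
    with csv_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = int(len(items) * test_frac)
        n_val = int(len(items) * val_frac)
        test += items[:n_test]
        val += items[n_test:n_test + n_val]
        train += items[n_test + n_val:]
    return train, val, test
```

Splitting per label keeps rare cell classes represented in every split, which matters for imbalanced leukocyte datasets.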
The repository folders are used as follows:
- src: Python source code.
- frontend: frontend source code.
- data: raw data.
- derived: results.
- mongodb: storage for the MongoDB database.
Three folders (data, derived, src) are needed for file I/O and should be mounted into the container.
Configuration files:
- Path configuration file (/src/core/settings.json):
This file is loaded by util.GlobalSettings. settings.json is used by default but can be overridden by settings_{systemname}.json, where {systemname} is the computer name, such as ThinkPad. This allows the same codebase to be cloned and run in multiple environments.
- Job configuration file:
Some examples are located at /src/config_*.json. Pass one to the training script with --cfg, as shown below.
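The per-machine override of settings.json can be sketched as below. This is a hypothetical stand-in for util.GlobalSettings, not its actual code: it assumes the computer name is the value returned by socket.gethostname(), and that the machine-specific file replaces settings.json entirely (the real class may instead merge keys). The data_dir key in the usage note is also hypothetical.

```python
import json
import socket
from pathlib import Path
from typing import Optional


def load_settings(core_dir: Path, system_name: Optional[str] = None) -> dict:
    """Return parsed settings, preferring settings_{systemname}.json when it exists."""
    # Default to this machine's host name, e.g. "ThinkPad".
    name = system_name or socket.gethostname()
    override = core_dir / f"settings_{name}.json"
    # Whole-file override: the machine-specific file is used instead of settings.json.
    path = override if override.is_file() else core_dir / "settings.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, load_settings(Path("/src/core")) reads settings_ThinkPad.json on a machine named ThinkPad and falls back to settings.json everywhere else.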
Dockerfile.gpu is the GPU version. Dockerfile.cpu is the CPU version.
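- Build the image (Dockerfile.cpu is shown; use Dockerfile.gpu for GPU support)
docker build -t hematology:v1 -f Dockerfile.cpu .
- Run the container with the three folders above mounted (replace D:/Drive/Data/Hematology with the path to your raw data; the --gpus all flag requires a GPU host with the NVIDIA Container Toolkit, so omit it when using the CPU image)
docker run -it --gpus all --rm -v "$(pwd)/src:/src" -v "$(pwd)/derived:/derived" -v "D:/Drive/Data/Hematology:/data" hematology:v1
- After the container is running, execute the following inside it:
  - Training
    python -m model.train --cfg config.json
  - Validation (replace 20231208192208 with the ID of the run to validate)
    python -m model.val --run 20231208192208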
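During validation, predictor.py classifies cells from their embeddings. The README does not say which classifier it uses, so the sketch below swaps in k-nearest-neighbour voting with cosine similarity, a common way to evaluate self-supervised embeddings. Treat it as an illustration under that assumption; the function names, embeddings, and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(train_emb, train_labels, query, k: int = 3) -> str:
    """Majority vote among the k labelled embeddings most similar to the query."""
    neighbours = sorted(
        zip(train_emb, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )[:k]
    # Counter orders equal counts by first appearance, so ties go to the closest neighbour's label.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D embeddings of labelled reference cells.
train_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["neutrophil", "neutrophil", "lymphocyte", "lymphocyte"]
print(knn_predict(train_emb, train_labels, [0.95, 0.05]))  # neutrophil
```

Because the encoder is trained without labels, a non-parametric classifier like this can adapt to a new label set by swapping the labelled reference embeddings rather than retraining.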
- Build and start the services
docker compose -f docker-compose.dev.yaml up --build
- Access
Open http://localhost:4002 for the index page.
This project was developed with the assistance of generative AI. All outputs were reviewed and validated by the developers for correctness and quality.
This software is provided for research use only. It is not intended for clinical use, has not been validated for diagnostic purposes, and is not FDA approved.
If you find this repository helpful in your research or work, please consider citing our paper:
@INPROCEEDINGS{10913825,
author={Liu, Zhuohe and Castillo, Simon P. and Han, Xin and Sun, Xiaoping and Hu, Zhihong and Yuan, Yinyin},
booktitle={2024 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI)},
title={Adaptive Self-Supervised Learning of Morphological Landscape for Leukocytes Classification in Peripheral Blood Smears},
year={2024},
volume={},
number={},
pages={1-7},
keywords={White blood cells;Adaptation models;Reviews;Hematology;Active learning;Self-supervised learning;Manuals;Predictive models;Image classification;Testing;image classification;self-supervised learning;active learning;hematology;peripheral blood smear},
doi={10.1109/BHI62660.2024.10913825}}
Feel free to contact us for further information or questions related to the paper and this repository.
Yuan Lab @ MD Anderson Cancer Center