MXGHarryLiu/HemoSight

HemoSight: Adaptive and Efficient Leukocyte Classification

This repository accompanies the study cited in the Citation section. The paper presents a self-supervised model for classifying white blood cells in peripheral blood smears; the model reaches an F1 score of 96.2% and generalizes to diverse label sets. The lightweight approach, built on EfficientNetV2-B0, uses active learning to improve label efficiency and is available as the HemoSight web app to streamline clinical workflows.

Data Preparation

Workflow Overview

General flow of the data pipeline:

  1. curate.py: Curate raw data folders to create a data reference CSV.
  2. loader.py: Load the data reference CSV and perform train, validation, and test splits.
  3. train.py, generator.py: Train the model; generator.py produces the augmented training data.
  4. val.py, predictor.py: Validate the trained model; predictor.py performs classification on embeddings.
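
Step 2 of the pipeline above can be illustrated with a minimal sketch. It does not reproduce loader.py: the function name stratified_split, the label column name, and the split fractions are all assumptions chosen for illustration. The idea shown is a seeded, per-class split, so that each class appears in the train, validation, and test sets in similar proportions.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", fractions=(0.8, 0.1, 0.1), seed=0):
    """Split rows (a list of dicts) into train/val/test, class by class.

    Hypothetical helper, not the repository's loader.py. `label_key` and
    `fractions` are illustrative defaults.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group rows by class label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):  # sorted for a deterministic order
        group = list(by_label[label])
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_val = int(len(group) * fractions[1])
        splits["train"].extend(group[:n_train])
        splits["val"].extend(group[n_train:n_train + n_val])
        # Whatever remains after train and val goes to the test set.
        splits["test"].extend(group[n_train + n_val:])
    return splits


# Example usage with a hypothetical reference file:
#   import csv
#   with open("reference.csv", newline="") as f:
#       splits = stratified_split(list(csv.DictReader(f)))
```

Splitting per class rather than over the whole table keeps rare cell types from ending up in only one of the three sets, which matters when class frequencies are uneven.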

Repository Organization

  • src stores Python source code.
  • frontend stores frontend source code.
  • data is used to store raw data.
  • derived is used to store results.
  • mongodb is used by the MongoDB database.

Three folders (data, derived, src) are needed for file I/O and should be mounted into the container.

Configuration files:

  • Path configuration file (/src/core/settings.json):
    This file is loaded by util.GlobalSettings. settings.json is used by default but can be overridden by settings_{systemname}.json, where {systemname} is the computer name, such as ThinkPad. This enables the same repository codebase to be cloned and run in multiple environments.
  • Job configuration file: Examples are located at /src/config_*.json. A job configuration file is passed to training with --cfg, as shown under Local Docker.
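
The path-configuration override described above can be sketched as follows. This is not the code of util.GlobalSettings: the function load_settings is hypothetical, the computer name is taken from platform.node() as one plausible source, and merging the override key by key (rather than replacing the whole file) is an assumption.

```python
import json
import platform
from pathlib import Path


def load_settings(settings_dir, system_name=None):
    """Load settings.json, then apply settings_{system_name}.json if it exists.

    Hypothetical sketch; the key-by-key merge is an assumption, not the
    documented behavior of util.GlobalSettings.
    """
    settings_dir = Path(settings_dir)
    if system_name is None:
        system_name = platform.node()  # computer name, e.g. "ThinkPad"

    # Default settings shared by every environment.
    with open(settings_dir / "settings.json", encoding="utf-8") as f:
        settings = json.load(f)

    # Machine-specific override, applied only when the file is present.
    override_path = settings_dir / f"settings_{system_name}.json"
    if override_path.is_file():
        with open(override_path, encoding="utf-8") as f:
            settings.update(json.load(f))
    return settings
```

With this layout, a machine named ThinkPad would pick up settings_ThinkPad.json automatically, while any other machine falls back to settings.json unchanged.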

Environment

Local Docker

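Dockerfile.gpu builds the GPU image; Dockerfile.cpu builds the CPU image.

  1. Build the image (CPU version shown; use -f Dockerfile.gpu for the GPU version):
    docker build -t hematology:v1 -f Dockerfile.cpu .
  2. Run the container with the three folders above (src, derived, data) mounted. Replace D:/Drive/Data/Hematology with the path to your raw data. The --gpus all option applies to the GPU image and can be omitted for the CPU image:
    docker run -it --gpus all --rm -v "$(pwd)/src:/src" -v "$(pwd)/derived:/derived" -v "D:/Drive/Data/Hematology:/data" hematology:v1
  3. Once the container is running, execute the following inside it:
    • Training: python -m model.train --cfg config.json
    • Validation: python -m model.val --run 20231208192208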

Model deployment

Deploy to Docker Containers

  1. Build and start the services:
    docker compose -f docker-compose.dev.yaml up --build
  2. Open http://localhost:4002 to reach the index page.

Disclaimer

This project was developed with the assistance of generative AI. All outputs were reviewed and validated by the developers for correctness and quality.

This software is provided for research use only. It is not intended for clinical use, has not been validated for diagnostic purposes, and is not FDA approved.

Citation

If you find this repository helpful in your research or work, please consider citing our paper:

@INPROCEEDINGS{10913825,
  author={Liu, Zhuohe and Castillo, Simon P. and Han, Xin and Sun, Xiaoping and Hu, Zhihong and Yuan, Yinyin},
  booktitle={2024 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI)}, 
  title={Adaptive Self-Supervised Learning of Morphological Landscape for Leukocytes Classification in Peripheral Blood Smears}, 
  year={2024},
  volume={},
  number={},
  pages={1-7},
  keywords={White blood cells;Adaptation models;Reviews;Hematology;Active learning;Self-supervised learning;Manuals;Predictive models;Image classification;Testing;image classification;self-supervised learning;active learning;hematology;peripheral blood smear},
  doi={10.1109/BHI62660.2024.10913825}}

Feel free to contact us for further information or questions related to the paper and this repository.

Yuan Lab @ MD Anderson Cancer Center
