Efficient inference wrapper for the SubCell subcellular protein localization foundation model
SubCellPortable provides a streamlined interface for running the SubCell model on immunofluorescence (IF) microscopy images. It generates single-cell embeddings that encode cell morphology or protein localization, and predicts protein subcellular localization from multi-channel fluorescence microscopy images.
📄 Preprint: *SubCell: Proteome-aware vision foundation models for microscopy capture single-cell biology* (Gupta et al., 2024)
```shell
# Clone repository
git clone https://github.com/yourusername/SubCellPortable.git
cd SubCellPortable

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

- Prepare your input CSV (`path_list.csv`):

```csv
r_image,y_image,b_image,g_image,output_prefix
images/cell_1_mt.png,,images/cell_1_nuc.png,images/cell_1_prot.png,cell1_
images/cell_2_mt.png,,images/cell_2_nuc.png,images/cell_2_prot.png,cell2_
```
Channel mapping:
- `r` = microtubules (red)
- `y` = ER (yellow)
- `b` = nuclei (blue/DAPI)
- `g` = protein of interest (green)

Leave a channel column empty if it is not available (e.g., use `rbg` for 3-channel images).
- Configure settings (`config.yaml`):

```yaml
model_channels: "rybg"      # Channel configuration
output_dir: "./results"     # Output directory
batch_size: 128             # Batch size (adjust for GPU memory)
gpu: 0                      # GPU device ID (-1 for CPU)
output_format: "combined"   # "combined" (h5ad) or "individual" (npy)
```

- Run inference:
```shell
# Basic run with config file
python process.py

# Specify parameters via CLI
python process.py --output_dir ./results --batch_size 256 --gpu 0

# Custom config and input files
python process.py --config experiment_config.yaml --path_list experiment_data.csv -o ./results

# Embeddings only (faster, no classification)
python process.py -o ./results --embeddings_only

# Get help
python process.py --help
```

Recommended format:

```csv
r_image,y_image,b_image,g_image,output_prefix
path/to/image1_mt.png,,path/to/image1_nuc.png,path/to/image1_prot.png,sample_1
path/to/image2_mt.png,,path/to/image2_nuc.png,path/to/image2_prot.png,batch_A/sample_2
```
- Skip rows by prefixing them with `#`.
- Create subfolders in the output folder by adding them to `output_prefix`, like: `/subfolder_1/sample_1`.
Legacy format (deprecated but supported):

```csv
r_image,y_image,b_image,g_image,output_folder,output_prefix
...
```
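A `path_list.csv` in the recommended format can also be generated with a short script instead of written by hand. A minimal sketch using only the standard library; the image paths and prefixes below are hypothetical placeholders (note the empty `y_image` column for images without an ER channel):

```python
# Build path_list.csv in the recommended format; paths and prefixes
# below are placeholders for your own files.
import csv

rows = [
    # (r_image, y_image, b_image, g_image, output_prefix)
    ("images/cell_1_mt.png", "", "images/cell_1_nuc.png", "images/cell_1_prot.png", "cell1_"),
    ("images/cell_2_mt.png", "", "images/cell_2_nuc.png", "images/cell_2_prot.png", "cell2_"),
]

with open("path_list.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["r_image", "y_image", "b_image", "g_image", "output_prefix"])
    writer.writerows(rows)
```

In practice the `rows` list would typically come from globbing your image directory rather than being typed out.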
| Parameter | Description | Default | Example |
|---|---|---|---|
| `--config` | Path to configuration YAML file | `config.yaml` | `experiment.yaml` |
| `--path_list` | Path to input CSV file | `path_list.csv` | `data.csv` |
| `--output_dir`, `-o` | Output directory for all results | - | `./results` |
| `--model_channels`, `-c` | Channel configuration | `rybg` | `rbg`, `ybg`, `bg` |
| `--model_type`, `-m` | Model architecture | `mae_contrast_supcon_model` | `vit_supcon_model` |
| `--output_format` | Output format | `combined` | `individual` |
| `--num_workers`, `-w` | Data loading workers | `4` | `8` |
| `--gpu`, `-g` | GPU device ID (`-1` = CPU) | `-1` | `0` |
| `--batch_size`, `-b` | Batch size | `128` | `256` |
| `--embeddings_only` | Skip classification | `False` | - |
| Parameter | Description | Default |
|---|---|---|
| `--update_model`, `-u` | Download/update models | `False` |
| `--prefetch_factor`, `-p` | Prefetch batches | `2` |
| `--create_csv` | Generate combined CSV | `False` |
| `--save_attention_maps` | Save attention visualizations | `False` |
| `--async_saving` | Async file saving (individual only) | `False` |
| `--quiet`, `-q` | Suppress verbose logging | `False` |
File: `embeddings.h5ad` (AnnData-compatible)

```python
import anndata as ad

# Load results
adata = ad.read_h5ad("results/embeddings.h5ad")

# Access data
embeddings = adata.X                         # (n_samples, 1536)
probabilities = adata.obsm['probabilities']  # (n_samples, 31)
sample_ids = adata.obs_names                 # Image identifiers
```

Compatible with scanpy and other single-cell tools.
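Because rows of `adata.X` are plain vectors, downstream analysis needs no special tooling. A minimal numpy sketch comparing embeddings by cosine similarity; the random vectors here are stand-ins for real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 1536))  # stand-in for adata.X

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_sim(emb[0], emb[1]))  # similarity between two cells
print(cosine_sim(emb[0], emb[0]))  # ~1.0 for identical embeddings
```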
Files per image:
- `{output_prefix}_embedding.npy` - 1536-D embedding vector
- `{output_prefix}_probabilities.npy` - 31-class probability distribution
- `{output_prefix}_attention_map.png` - attention visualization (optional)
```python
import numpy as np

embedding = np.load("results/cell1_embedding.npy")   # Shape: (1536,)
probs = np.load("results/cell1_probabilities.npy")   # Shape: (31,)
```

File: `result.csv`
| Column | Description |
|---|---|
| `id` | Sample identifier |
| `top_class_name` | Top predicted location |
| `top_class` | Top class index |
| `top_3_classes_names` | Top 3 predictions (comma-separated) |
| `top_3_classes` | Top 3 indices |
| `prob00` - `prob30` | Full probability distribution |
| `feat0000` - `feat1535` | Full embedding vector |
The model predicts 31 subcellular locations (indexed 0-30, corresponding to `prob00`-`prob30`):
View all 31 classes
- Actin filaments
- Aggresome
- Cell Junctions
- Centriolar satellite
- Centrosome
- Cytokinetic bridge
- Cytoplasmic bodies
- Cytosol
- Endoplasmic reticulum
- Endosomes
- Focal adhesion sites
- Golgi apparatus
- Intermediate filaments
- Lipid droplets
- Lysosomes
- Microtubules
- Midbody
- Mitochondria
- Mitotic chromosome
- Mitotic spindle
- Nuclear bodies
- Nuclear membrane
- Nuclear speckles
- Nucleoli
- Nucleoli fibrillar center
- Nucleoli rim
- Nucleoplasm
- Peroxisomes
- Plasma membrane
- Vesicles
- Negative
Class names and visualization colors are available in `inference.py` (`CLASS2NAME`, `CLASS2COLOR` dictionaries).
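A saved probability vector can be mapped to class names without importing the wrapper. A self-contained sketch that reproduces the 31 classes above, assuming the list order matches the 0-30 indexing (the `CLASS2NAME` dictionary in `inference.py` is the authoritative mapping); the toy vector stands in for a real `*_probabilities.npy`:

```python
import numpy as np

# 31 classes in assumed index order; CLASS2NAME in inference.py is authoritative
CLASS_NAMES = [
    "Actin filaments", "Aggresome", "Cell Junctions", "Centriolar satellite",
    "Centrosome", "Cytokinetic bridge", "Cytoplasmic bodies", "Cytosol",
    "Endoplasmic reticulum", "Endosomes", "Focal adhesion sites",
    "Golgi apparatus", "Intermediate filaments", "Lipid droplets",
    "Lysosomes", "Microtubules", "Midbody", "Mitochondria",
    "Mitotic chromosome", "Mitotic spindle", "Nuclear bodies",
    "Nuclear membrane", "Nuclear speckles", "Nucleoli",
    "Nucleoli fibrillar center", "Nucleoli rim", "Nucleoplasm",
    "Peroxisomes", "Plasma membrane", "Vesicles", "Negative",
]

def top_k(probs: np.ndarray, k: int = 3) -> list:
    """Names of the k highest-probability classes, best first."""
    return [CLASS_NAMES[i] for i in np.argsort(probs)[::-1][:k]]

probs = np.zeros(31)
probs[[26, 17, 7]] = [0.6, 0.3, 0.1]  # toy probability vector
print(top_k(probs))  # ['Nucleoplasm', 'Mitochondria', 'Cytosol']
```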
Models are automatically downloaded on first run with `-u`/`--update_model`:

```shell
python process.py -u --output_dir ./results
```

Edit `models_urls.yaml` to specify custom model URLs:

```yaml
rybg: # 4-channel configuration
  mae_contrast_supcon_model:
    encoder: "s3://bucket/path/to/encoder.pth"
    classifier_s0: "https://url/to/classifier.pth"
```

If you use SubCellPortable in your research, please cite:
```bibtex
@article{gupta2024subcell,
  title={SubCell: Proteome-aware vision foundation models for microscopy capture single-cell biology},
  author={Gupta, Ankit and Wefers, Zoe and Kahnert, Konstantin and Hansen, Jan N. and Leineweber, Will and Cesnik, Anthony and Lu, Dan and Axelsson, Ulrika and Balllosera Navarro, Frederic and Karaletsos, Theofanis and Lundberg, Emma},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.12.06.627299}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
SubCellPortable wrapper maintained with ❤️ by the Lundberg Lab for the computational biology community.