This repository contains a comprehensive processing pipeline for spatial transcriptomics data analysis.
This pipeline was built and deployed on Code Ocean, a cloud-based computational research platform. The pipeline leverages Code Ocean's containerized environment to ensure reproducible results across different computing environments.
Code Ocean documentation: https://docs.codeocean.com/user-guide
The processing section of the pipeline consists of the following sequential steps:
1. Apply quality control filtering and detect doublets with SOLO (semi-supervised deep-learning doublet detection).
2. Map cell types with MapMyCells.
3. Aggregate the individual section-level AnnData files into a single combined dataset for whole-dataset analysis.
4. Add color mappings for cell type classifications to the AnnData objects using the ABC atlas color scheme.
5. Perform quality control on the cell type mapping results, using Double Median Absolute Deviation (DoubleMAD) statistics to identify and filter out cells with poor mapping confidence scores.
6. Save the final processed results and add a final QC column.
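The DoubleMAD quality-control step can be illustrated with a minimal sketch. Double MAD computes one median absolute deviation from the values at or below the median and another from the values at or above it, so a skewed score distribution is scaled fairly on each side. The function name `low_confidence_mask`, the threshold of 3.0, and the choice to flag only the lower tail are assumptions for illustration, not the pipeline's actual settings.

```python
import numpy as np


def low_confidence_mask(scores, threshold=3.0):
    """Flag scores far below the median using a Double MAD rule.

    Hypothetical helper: the threshold default is an illustrative choice.
    Only the lower tail is flagged, because this sketch treats low mapping
    confidence as "poor".
    """
    x = np.asarray(scores, dtype=float)
    med = np.median(x)
    abs_dev = np.abs(x - med)
    # One MAD for each side of the median
    left_mad = np.median(abs_dev[x <= med])
    right_mad = np.median(abs_dev[x >= med])
    # Scale each value by the MAD of the side it falls on
    mad = np.where(x < med, left_mad, right_mad)
    with np.errstate(divide="ignore", invalid="ignore"):
        # A zero MAD turns any deviation into inf, which is flagged
        distance = abs_dev / mad
    return (x < med) & (distance > threshold)
```

On an AnnData object you would pass the per-cell confidence column from `adata.obs`; the name of that column depends on the MapMyCells output and is not specified here.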
The domain detection section of the pipeline consists of the following sequential steps:
1. Bin transcript spots and perform QC filtering.
2. Perform spatial alignment and integration of multiple tissue sections using STAligner.
3. Run Leiden clustering on the STAligner embeddings.
4. Map cluster assignments from the downsampled STAligner gridded data to the cell segmentation data.
5. Consolidate the cluster assignments into the full processed dataset.
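The binning and cluster-mapping steps can be pictured with a minimal sketch. It assumes square grid bins of a fixed `bin_size`, a minimum-spots-per-bin QC rule, and that each cell inherits the cluster label of the grid bin containing its centroid; all three are illustrative assumptions, and the pipeline's actual mapping may differ. The STAligner integration and Leiden clustering that happen between binning and mapping are not shown, and all function names are hypothetical.

```python
import numpy as np


def grid_bins(x, y, bin_size):
    """Integer (column, row) grid coordinates of points for a square bin size."""
    bx = np.floor(np.asarray(x, dtype=float) / bin_size).astype(int)
    by = np.floor(np.asarray(y, dtype=float) / bin_size).astype(int)
    return bx, by


def bin_spots(x, y, bin_size, min_spots):
    """Count transcript spots per bin and keep bins with at least min_spots."""
    bx, by = grid_bins(x, y, bin_size)
    bins, counts = np.unique(np.stack([bx, by], axis=1), axis=0, return_counts=True)
    keep = counts >= min_spots  # hypothetical QC rule: minimum spots per bin
    return {(int(i), int(j)): int(c) for (i, j), c in zip(bins[keep], counts[keep])}


def map_bin_labels_to_cells(cell_x, cell_y, bin_labels, bin_size, missing=-1):
    """Give each cell the cluster label of the bin that contains its centroid."""
    bx, by = grid_bins(cell_x, cell_y, bin_size)
    return np.array([bin_labels.get((int(i), int(j)), missing) for i, j in zip(bx, by)])
```

Cells whose centroid falls in a bin that was filtered out or never clustered receive the `missing` label, which keeps them visible for later inspection instead of silently dropping them.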
Running via Code Ocean UI:
- Create a new pipeline by cloning this repository
- Replace the Data Parameters with your own data
- Configure the App Panel with your dataset-specific parameters
- Verify that the data format matches the expected input structure
- Click "Run with parameters" to run the pipeline
Running on your local machine:
- Click Pipeline -> Export
- Follow the instructions in REPRODUCING.md
All pipeline parameters are configured in the Create Parameters JSON capsule and centralized in params.json. Key parameter categories include:
- QC Filtering Parameters
- Mapping Parameters
- Metadata Parameters
- Domain Detection Parameters
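As a minimal sketch of reading the centralized parameters from a script, the example below loads params.json and checks that each parameter category is present. The category key names (`qc_filtering`, `mapping`, `metadata`, `domain_detection`) and the nested one-object-per-category layout are hypothetical; check the generated params.json for the real structure.

```python
import json

# Hypothetical top-level keys, one per parameter category. The real
# params.json may use different names or a flat layout.
CATEGORIES = ("qc_filtering", "mapping", "metadata", "domain_detection")


def parse_params(text):
    """Parse params.json content and check that every category is present."""
    params = json.loads(text)
    missing = [name for name in CATEGORIES if name not in params]
    if missing:
        raise KeyError(f"params.json is missing categories: {missing}")
    return params


def load_params(path="params.json"):
    """Read and validate a params.json file from disk."""
    with open(path, encoding="utf-8") as fh:
        return parse_params(fh.read())
```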
Expected input structure (one AnnData .h5ad file per tissue section):
data/
├── section1.h5ad
├── section2.h5ad
├── section3.h5ad
└── ...
Required columns:
- x and y: cell centroid coordinates
- brain_section_barcode: section ID
- Index containing unique cell labels (e.g., {brain_section_barcode}_SIS_{i})
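A quick pre-flight check of each section file can catch format problems before a run. The sketch below assumes the required columns live in `adata.obs` and that the index is `adata.obs_names`; the helper names are hypothetical, and numbering cells from 0 in `make_cell_labels` is an illustrative choice based on the `{brain_section_barcode}_SIS_{i}` example.

```python
REQUIRED_COLUMNS = ("x", "y", "brain_section_barcode")


def make_cell_labels(section_barcode, n_cells):
    """Build unique cell labels following the {brain_section_barcode}_SIS_{i} example."""
    return [f"{section_barcode}_SIS_{i}" for i in range(n_cells)]


def validate_section(columns, index):
    """Return a list of problems with one section's cell metadata (empty if none)."""
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in columns]
    labels = list(index)
    if len(set(labels)) != len(labels):
        problems.append("cell labels in the index are not unique")
    return problems
```

With a loaded section you would call `validate_section(adata.obs.columns, adata.obs_names)` and stop if the returned list is non-empty.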
The pipeline generates the following key outputs:
results/
├── whole_dataset/
│ ├── {specimen}_{dataset_id}_filtered.h5ad
│ └── {specimen}_{dataset_id}_filtered.csv
└── sections/
├── section1_filtered.h5ad
├── section2_filtered.h5ad
└── ...
Key output files:
- {specimen}_{dataset_id}_filtered.h5ad: combined, QC-filtered data
- section{i}_filtered.h5ad: QC-filtered data split by brain_section_barcode
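The per-section split and output naming can be sketched as follows. This assumes each per-section file is named after its brain_section_barcode value, which the results/ tree suggests but does not confirm; the helper names are hypothetical.

```python
from collections import defaultdict


def split_by_section(cell_ids, section_barcodes):
    """Group cell IDs by their brain_section_barcode value, preserving order."""
    groups = defaultdict(list)
    for cell_id, barcode in zip(cell_ids, section_barcodes):
        groups[barcode].append(cell_id)
    return dict(groups)


def output_paths(specimen, dataset_id, section_barcodes):
    """Build result paths following the layout of the results/ tree."""
    whole = f"results/whole_dataset/{specimen}_{dataset_id}_filtered.h5ad"
    sections = {b: f"results/sections/{b}_filtered.h5ad" for b in section_barcodes}
    return whole, sections
```

With AnnData, one way to write a single section is `adata[adata.obs["brain_section_barcode"] == barcode].write_h5ad(path)`.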
For detailed information about each step, refer to the individual markdown files linked above. Each file contains:
- Detailed methodology description
- Input/output file specifications
- Configuration parameter explanations
- Expected results and metadata columns