Deep learning framework for integration, analysis and visualization of single cell data
Single-cell transcriptional profiling using RNA sequencing allows the identification of cell types on the basis of unsupervised clustering of the transcriptome. However, differences in experimental methods and computational analyses make it challenging to compare and reuse data across experiments. We have developed a deep learning framework that performs seamless integration, analysis and visualization of single-cell data from different experiments. It uses a learned internal representation that allows cells from one scRNA-seq dataset to be projected onto cell types or individual cells from other experiments. We applied our method to three recently published mouse brain datasets covering nearly 2 million cells, showing that diverse scRNA-seq datasets can be integrated to obtain a universal representation of the molecular diversity of the mouse brain. Beyond that, our generative model can simulate realistic scRNA-seq data that covers the full diversity of cell types.
Saunders A, Macosko EZ, Wysoker A, Goldman M et al. Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain. Cell 2018 Aug 9;174(4):1015-1030.e16.
Paper: http://sci-hub.tw/https://doi.org/10.1016/j.cell.2018.07.028
Data: http://dropviz.org/
GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE116470
SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRP151685
Amit Zeisel, Hannah Hochgerner, Peter Lönnerberg, Anna Johnsson, Fatima Memic, Job van der Zwan, Martin Häring, Emelie Braun, Lars E. Borm, Gioele La Manno, Simone Codeluppi, Alessandro Furlan, Kawai Lee, Nathan Skene, Kenneth D. Harris, Jens Hjerling-Leffler, Ernest Arenas, Patrik Ernfors, Ulrika Marklund, Sten Linnarsson. Molecular Architecture of the Mouse Nervous System, Cell, 2018, Volume 174(4), 999-1014.e22
Data: http://mousebrain.org
SRA: https://www.ncbi.nlm.nih.gov/sra/SRP135960
Zheng GX, Terry JM, Belgrader P, Ryvkin P et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 2017 Jan 16;8:14049.
Paper:
Data: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons
https://s3.amazonaws.com/czbiohub-tabula-muris/TM_facs_mat.h5ad
https://s3.amazonaws.com/czbiohub-tabula-muris/TM_droplet_mat.h5ad
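The two .h5ad files listed above could be fetched into a local data/ directory, which is where the loading code in this README reads them from. The following is a minimal sketch using only the Python standard library; the helper names `local_path` and `fetch` are hypothetical and not part of this repository, and the default `data_dir="data"` is an assumption based on the paths used when loading.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs exactly as listed in this README.
TM_URLS = [
    "https://s3.amazonaws.com/czbiohub-tabula-muris/TM_facs_mat.h5ad",
    "https://s3.amazonaws.com/czbiohub-tabula-muris/TM_droplet_mat.h5ad",
]


def local_path(url, data_dir="data"):
    """Map a URL to <data_dir>/<file name>, e.g. data/TM_facs_mat.h5ad."""
    return Path(data_dir) / Path(urlparse(url).path).name


def fetch(url, data_dir="data"):
    """Download url into data_dir unless the file is already present."""
    dest = local_path(url, data_dir)
    if not dest.exists():
        # Create the target directory on first use.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(dest))
    return dest
```

Calling `fetch(url)` for each entry in `TM_URLS` would download both files; files that already exist locally are skipped, so the call is safe to repeat.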
import pandas as pd
import scanpy as sc

# Cell-level annotations for the FACS (Smart-seq2) data
tm_facs_metadata = pd.read_csv('data/TM_facs_metadata.csv')
# Count matrix, loaded as an AnnData object
tm_facs_data = sc.read_h5ad('data/TM_facs_mat.h5ad')

Some vignettes assume that you have downloaded the raw data from Figshare into
00_facs_raw_data: gene-cell count tables for FACS smartseq2 data and metadata
01_droplet_raw_data: CellRanger output count files for droplet data and metadata
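Before running a vignette, it may help to check that both raw data directories named above are in place. This is a small sketch, not part of the repository: the function name `missing_raw_data` and the `root` parameter are hypothetical, and the parent directory that should hold the two folders is not stated here, so it is passed in explicitly.

```python
from pathlib import Path

# Directory names as given in this README.
EXPECTED_DIRS = ("00_facs_raw_data", "01_droplet_raw_data")


def missing_raw_data(root="."):
    """Return the expected raw data directories that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means both directories were found; otherwise the returned names indicate what still needs to be downloaded from Figshare.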
The original data for the MCA is available on Figshare. The Satija lab has also repackaged the data for convenient use (in R) in this zip. It can be loaded as
library(here)
library(readr)
mca.matrix <- readRDS(here("data", "MCA_merged_mat.rds"))
mca.metadata <- read_csv(here("data", "MCA_All-batch-removed-assignments.csv"))