Deep learning approaches for single cell data

Alberto Labarga edited this page Jan 21, 2019 · 18 revisions

Papers

Massive single-cell RNA-seq analysis and imputation via deep learning

We present scScope, a scalable deep-learning based approach that can accurately and rapidly identify cell-type composition from millions of noisy single-cell gene-expression profiles.

Link: https://www.biorxiv.org/content/early/2018/11/27/315556

Code: https://github.com/AltschulerWu-Lab/scScope

Single cell RNA-seq denoising using a deep count autoencoder

We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a zero-inflated negative binomial noise model, and nonlinear gene-gene or gene-dispersion interactions are captured. Our method scales linearly with the number of cells and can therefore be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.

Link: https://www.biorxiv.org/content/early/2018/04/13/300681

Code: https://github.com/theislab/dca

Tags: tensorflow, python, autoencoder
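
The zero-inflated negative binomial (ZINB) noise model named in the DCA abstract can be written down in a few lines. The following is a minimal sketch of the ZINB log-probability for a single count, not the DCA implementation; the parameter names `mu` (mean), `theta` (dispersion) and `pi` (zero-inflation probability) are chosen here for illustration.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability under a zero-inflated NB: with probability pi
    (0 < pi < 1) the count is forced to zero, otherwise it is drawn
    from the negative binomial."""
    if x == 0:
        # A zero can come from the inflation component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


# Sanity check: probabilities over a long range of counts sum to about 1.
total = sum(math.exp(zinb_log_pmf(k, mu=2.0, theta=1.5, pi=0.3)) for k in range(200))
p_zero = math.exp(zinb_log_pmf(0, mu=2.0, theta=1.5, pi=0.3))
```

In an autoencoder setting the network would output `mu`, `theta` and `pi` per gene and cell, and the negative of this log-probability would serve as the reconstruction loss.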

Deep generative modeling for single-cell transcriptomics

Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells. scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.

Link: https://people.eecs.berkeley.edu/~jregier/publications/lopez2018deep.pdf

Code: https://github.com/YosefLab/scVI

Generative modeling and latent space arithmetics predict single-cell perturbation response across cell types, studies and species

We demonstrate that scGen learns cell type and species specific response implying that it captures features that distinguish responding from non-responding genes and cells. With the upcoming availability of large-scale atlases of organs in healthy state, we envision scGen to become a tool for experimental design through in silico screening of perturbation response in the context of disease and drug treatment.

Link: https://www.biorxiv.org/content/early/2018/12/14/478503

Code: https://github.com/theislab/scGen
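
The "latent space arithmetics" in the scGen title can be illustrated with plain vector operations. This is a sketch under the assumption that cells have already been encoded into latent vectors by some trained model; the encoder and decoder are omitted, and the function names are hypothetical rather than part of the scGen API.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict_perturbed(z_query, z_control, z_stimulated):
    """Shift query cells by the difference between the mean stimulated
    and mean control latent vectors."""
    delta = [s - c for s, c in zip(mean_vector(z_stimulated), mean_vector(z_control))]
    return [[v + d for v, d in zip(z, delta)] for z in z_query]


# Toy latent vectors (made up for illustration).
z_control = [[0.0, 1.0], [2.0, 1.0]]
z_stimulated = [[2.0, 3.0], [4.0, 3.0]]
z_query = [[10.0, -1.0]]
predicted = predict_perturbed(z_query, z_control, z_stimulated)
```

The shifted vectors would then be passed through the decoder to obtain predicted expression profiles for the unseen condition.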

Exploring Single-Cell Data with Deep Multitasking Neural Networks

SAUCIE is a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that they can learn, to perform many single-cell data analysis tasks, all on a unified representation.

Link: https://www.biorxiv.org/content/early/2019/01/03/237065

Code: https://github.com/KrishnaswamyLab/SAUCIE/

Tags: tensorflow, python, autoencoder

Imputation of single-cell gene expression with an autoencoder neural network

We treated zeros as missing values and developed deep learning methods to impute the missing values, exploiting the dependence structure across genes and across cells. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder directly on scRNA-seq data with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.

Link: https://www.biorxiv.org/content/early/2018/12/29/504977

Code: https://github.com/audreyqyfu/LATE

Tags: tensorflow, python, autoencoder
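
One way to treat zeros as missing values while training an autoencoder is to exclude them from the reconstruction error, so the network is only penalised on observed entries and is free to fill in the rest. The sketch below shows such a masked loss; it illustrates the idea and is not taken from the LATE code base.

```python
def masked_mse(observed, reconstructed):
    """Mean squared error computed only over entries whose observed
    value is nonzero; zeros are treated as missing and ignored."""
    squared_error, n_used = 0.0, 0
    for obs_row, rec_row in zip(observed, reconstructed):
        for obs, rec in zip(obs_row, rec_row):
            if obs != 0:
                squared_error += (obs - rec) ** 2
                n_used += 1
    return squared_error / n_used if n_used else 0.0


# Toy matrices (cells x genes); the zeros in `observed` do not contribute.
observed = [[0.0, 2.0], [3.0, 0.0]]
reconstructed = [[5.0, 1.0], [3.0, 7.0]]
loss = masked_mse(observed, reconstructed)
```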

VASC: dimension reduction and visualization of single cell RNA sequencing data by deep variational autoencoder

We proposed VASC (deep Variational Autoencoder for scRNA-seq data), a deep multi-layer generative model, for the unsupervised dimension reduction and visualization of scRNA-seq data. It can explicitly model the dropout events and find the nonlinear hierarchical feature representations of the original data.

Link: https://www.biorxiv.org/content/early/2017/10/06/199315

Code: https://github.com/wang-research/VASC

Variational Autoencoder for Unmasking Tumor Heterogeneity from Single Cell Genomic Data

Here we propose Dhaka, a variational autoencoder-based single-cell analysis tool that transforms genomic data into a latent encoded feature space that is more efficient in differentiating between the hidden tumor subpopulations.

Link: https://www.biorxiv.org/content/early/2018/04/19/183863

Code: https://github.com/MicrosoftGenomics/Dhaka

Variational auto-encoders for single-cell gene expression data

We propose a novel variational auto-encoder-based method (scVAE) for analysis of single-cell RNA sequencing data. It avoids data preprocessing by using raw count data as input and can robustly estimate the expected gene expression levels and a latent representation for each cell. We show for several scRNA-seq data sets that our method outperforms recently proposed scRNA-seq methods in clustering cells.

Link: https://www.biorxiv.org/content/early/2018/05/16/318295

Code: https://github.com/chgroenbech/scVAE

Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders

Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically-relevant features. We name our method “Tybalt”.

Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5728678/

Code: https://github.com/greenelab/tybalt

Visualizing and interpreting single-cell gene expression datasets with Similarity Weighted Nonnegative Embedding

We developed Similarity Weighted Nonnegative Embedding (SWNE), which enhances interpretation of datasets by embedding the genes and factors that separate cell states alongside the cells on the visualization, captures local structure better than t-SNE and existing methods, and maintains fidelity when visualizing global structure. SWNE uses nonnegative matrix factorization to decompose the gene expression matrix into biologically relevant factors, embeds the cells, genes and factors in a 2D visualization, and uses a similarity matrix to smooth the embeddings.

Link: https://www.biorxiv.org/content/early/2018/06/22/276261

Code: https://github.com/yanwu2014/swne

Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning

We propose a novel similarity-learning framework, SIMLR (single-cell interpretation via multi-kernel learning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization applications.

Link: https://www.biorxiv.org/content/early/2017/02/28/052225

Code: https://github.com/BatzoglouLabSU/SIMLR

Automated, probabilistic assignment of cell types in scRNA-seq data

cellassign automatically assigns single-cell RNA-seq data to known cell types across thousands of cells, accounting for patient- and batch-specific effects. Information about a priori known marker genes for each cell type is provided as input to the model in the form of a (binary) marker gene by cell-type matrix. cellassign then probabilistically assigns each cell to a cell type, removing subjective biases from typical unsupervised clustering workflows.

Code: https://github.com/irrationone/cellassign
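
The binary marker gene by cell-type matrix described above is straightforward to build from a list of markers per cell type. cellassign itself is an R package; the Python sketch below only illustrates the shape of the input, and the marker lists are illustrative examples rather than a recommended panel.

```python
def marker_matrix(markers_by_type):
    """Return (genes, cell_types, matrix) where matrix[i][j] is 1 if
    genes[i] is listed as a marker of cell_types[j], else 0."""
    genes = sorted({g for gene_list in markers_by_type.values() for g in gene_list})
    cell_types = list(markers_by_type)
    matrix = [
        [1 if gene in markers_by_type[ct] else 0 for ct in cell_types]
        for gene in genes
    ]
    return genes, cell_types, matrix


# Illustrative marker lists.
markers = {"B cell": ["CD19", "MS4A1"], "T cell": ["CD3E", "CD3D"]}
genes, cell_types, matrix = marker_matrix(markers)
```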

Cell type prediction at single-cell resolution

Here we present a new generalizable method (scPred) for prediction of cell type(s), using a combination of unbiased feature selection from a reduced-dimension space, and machine-learning classification. scPred solves several problems associated with the identification of individual gene feature selection, and is able to capture subtle effects of many genes, increasing the overall variance explained by the model, and correspondingly improving the prediction accuracy.

Link: https://www.biorxiv.org/content/early/2018/12/03/369538

Code: https://github.com/IMB-Computational-Genomics-Lab/scPred

Integration of multiple single-cell transcriptomics datasets leveraging stable expression and pseudo-replication

To enable effective interrogation of multiple scRNA-Seq datasets, we have developed a novel algorithm, named scMerge, that removes unwanted variation by combining stably expressed genes and utilizing pseudo-replicates across datasets. Analysis of large collections of publicly available datasets demonstrates that scMerge performs well in multiple scenarios and enhances biological discovery, including inferring cell developmental trajectories.

Link: https://www.biorxiv.org/content/early/2018/08/16/393280

Code: https://github.com/SydneyBioX/scMerge

Dimensionality reduction for zero-inflated single cell gene expression analysis

Dimensionality reduction of single-cell RNA-seq high-dimensional datasets is essential for visualization and analysis, but single-cell RNA-seq data is challenging for classical dimensionality reduction methods because of the prevalence of dropout events leading to zero-inflated data. Here we develop a dimensionality reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and show that it improves performance on simulated and biological datasets.

Link: https://www.biorxiv.org/content/early/2015/06/14/019141

Code: https://github.com/epierson9/ZIFA
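
Modelling dropout explicitly means giving each latent expression value a probability of being observed as zero that shrinks as expression grows. The sketch below assumes an exponential-decay form, exp(-lam * x^2), which to my understanding matches the form used in the ZIFA paper; the parameter name `lam` and the helper functions are hypothetical.

```python
import math
import random


def dropout_probability(x, lam):
    """Probability that a latent (log-scale) expression value x is
    observed as zero; decays with the square of x (assumed form)."""
    return math.exp(-lam * x * x)


def apply_dropout(values, lam, rng):
    """Simulate zero-inflation: each value is replaced by 0.0 with its
    own dropout probability, otherwise kept as is."""
    return [0.0 if rng.random() < dropout_probability(v, lam) else v for v in values]


rng = random.Random(0)
latent = [0.0, 0.5, 2.0, 6.0]
observed = apply_dropout(latent, lam=0.1, rng=rng)
```

Lowly expressed genes are dropped far more often than highly expressed ones, which is the pattern a zero-inflated factor model tries to account for.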

A single-cell gene expression profile annotation tool using reference datasets

We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets.

Code: https://github.com/forrest-lab/scmatch
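
Annotating a cell by its closest match in a reference amounts to a nearest-neighbour lookup under some similarity metric. The sketch below uses Pearson correlation as one possible metric (the abstract notes that the choice of metric was itself evaluated); the reference profiles and labels are toy values, and none of this is taken from the scMatch code.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors.
    Assumes neither vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def annotate(cell, reference):
    """Return the label of the reference profile most correlated with the cell."""
    return max(reference, key=lambda label: pearson(cell, reference[label]))


# Toy reference profiles over the same four genes.
reference = {
    "neuron": [9.0, 1.0, 0.5, 0.0],
    "hepatocyte": [0.0, 0.5, 8.0, 7.0],
}
label = annotate([5.0, 0.0, 1.0, 0.0], reference)
```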

A web server for comparative analysis of single-cell RNA-seq data

We present scQuery, a web server which uses our neural networks and fast matching methods to determine cell types, key genes, and more.

Link: https://www.biorxiv.org/content/early/2018/05/31/323238

Code: https://github.com/AmirAlavi/scrna_nn

https://github.com/mruffalo/sc-rna-seq-pipeline

https://github.com/AmirAlavi/single_cell_deg

Using Neural Networks for Reducing the Dimensions of Single-Cell RNA-Seq Data

We develop and test a method based on neural networks (NN) for the analysis and retrieval of single cell RNA-Seq data. We tested various NN architectures, some of which incorporate prior biological knowledge, and used these to obtain a reduced dimension representation of the single cell expression data. We show that the NN method improves upon prior methods in both the ability to correctly group cells in experiments not used in the training and the ability to correctly infer cell type or state by querying a database of tens of thousands of single cell profiles. Such database queries (which can be performed using our web server) will enable researchers to better characterize cells when analyzing heterogeneous scRNA-Seq samples.

Link: https://academic.oup.com/nar/article/45/17/e156/4056711

A tool for unsupervised projection of single cell RNA-seq data

Here, we present scmap, a method for projecting cells from a scRNA-seq experiment onto the cell types or individual cells identified in other experiments.

Link: https://www.biorxiv.org/content/early/2017/11/29/150292

Code: https://github.com/hemberg-lab/scmap

Zero-preserving imputation of scRNA-seq data using low-rank approximation

We present a method based on low-rank approximation which successfully replaces these dropouts (zero expression levels of unobserved expressed genes) by nonzero values, while preserving biologically non-expressed genes (true biological zeros) at zero expression levels.

Link: https://www.biorxiv.org/content/early/2018/08/22/397588

Code: https://github.com/KlugerLab/ALRA

Classification of Single cells by Transfer Learning

We therefore suggest a different approach for cell labeling, namely CaSTLe: classifying cells from scRNA-seq datasets by using a model transferred from different (previously labeled) datasets.

Link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205499

Code: https://github.com/yuvallb/CaSTLe

Transfer learning in single-cell transcriptomics improves data denoising and pattern discovery

We show that a deep autoencoder coupled to a Bayesian model remarkably improves UMI-based scRNA-seq data quality by transfer learning across datasets. This new technology, SAVER-X, outperforms existing state-of-the-art tools. The deep learning model in SAVER-X extracts transferable gene expression features across data from different labs, generated by varying technologies, and obtained from divergent species.

Link: https://www.biorxiv.org/content/early/2018/11/01/457879

Code: http://singlecell.wharton.upenn.edu/saver-x/

Reproducible Classification Analysis of Single Cell sequencing data

We present rCASC, a modular RNAseq analysis workflow allowing data analysis from counts generation to cell sub-population signatures identification, granting both functional and computational reproducibility.

Link: https://www.biorxiv.org/content/early/2018/10/03/430967

Code: https://github.com/kendomaniac/rCASC

Fast Batch Alignment of Single Cell Transcriptomes Unifies Multiple Mouse Cell Atlases into an Integrated Landscape

BBKNN is a fast and intuitive batch effect removal tool that can be directly used in the scanpy workflow. It serves as an alternative to scanpy.api.pp.neighbors(), with both functions creating a neighbour graph for subsequent use in clustering, pseudotime and UMAP visualisation.

Link: https://www.biorxiv.org/content/early/2018/08/22/397042

Code: https://github.com/Teichlab/bbknn
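
The idea behind a batch-balanced neighbour graph is that each cell looks for its nearest neighbours in every batch separately, rather than in the pooled data, so that every batch is represented in its neighbourhood. The brute-force sketch below illustrates this on toy coordinates; it is not the BBKNN implementation, which operates on a reduced-dimension representation and is built for scale.

```python
import math


def batch_balanced_neighbors(points, batches, k):
    """For each cell index, collect its k nearest neighbours from each
    batch in turn (excluding the cell itself)."""
    graph = {}
    for i, point in enumerate(points):
        neighbors = []
        for batch in sorted(set(batches)):
            candidates = [j for j, b in enumerate(batches) if b == batch and j != i]
            candidates.sort(key=lambda j: math.dist(point, points[j]))
            neighbors.extend(candidates[:k])
        graph[i] = neighbors
    return graph


# Two batches that are far apart in the embedding (toy coordinates).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
batches = ["a", "a", "b", "b"]
graph = batch_balanced_neighbors(points, batches, k=1)
```

Even though the two batches are well separated, every cell ends up with one neighbour from each, which is what lets downstream clustering and UMAP mix the batches.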

Fast, sensitive, and flexible integration of single cell data with Harmony

Unlike available single-cell integration methods, Harmony can simultaneously account for multiple experimental and biological factors. We develop objective metrics to evaluate the quality of data integration. In four separate analyses, we demonstrate the superior performance of Harmony to four single-cell-specific integration algorithms.

Link: https://www.biorxiv.org/content/early/2018/11/05/461954

Code: https://github.com/immunogenomics/harmony

MAGAN: Aligning Biological Manifolds

We present a new GAN called the Manifold-Aligning GAN (MAGAN) that aligns two manifolds such that related points in each measurement space are aligned together. We demonstrate applications of MAGAN in single-cell biology in integrating two different measurement types together. In our demonstrated examples, cells from the same tissue are measured with both genomic (single-cell RNA-sequencing) and proteomic (mass cytometry) technologies.

Link: https://arxiv.org/abs/1803.00385

Code: https://github.com/KrishnaswamyLab/MAGAN

Visualization

The art of using t-SNE for single-cell transcriptomics

Here we describe a protocol for successful exploratory data analysis using t-SNE. The protocol includes PCA initialisation, multi-scale similarity kernels, exaggeration, and downsampling-based initialisation for very large data sets. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.

Link: https://www.biorxiv.org/content/early/2018/10/25/453449

Code: https://github.com/berenslab/rna-seq-tsne

Neural Data Visualization for Scalable and Generalizable Single Cell Analysis

We introduce net-SNE, which trains a neural network to learn a high quality visualization of single cells that newly generalizes to unseen data. While matching the visualization quality of t-SNE on 14 benchmark data sets of varying sizes, from hundreds to 1.3 million cells, net-SNE also effectively positions previously unseen cells, even when an entire subtype is missing from the initial data set or when the new cells are from a different sequencing experiment.

Link: https://www.biorxiv.org/content/early/2018/03/27/289223

Code: https://github.com/hhcho/netsne

Evaluation of UMAP as an alternative to t-SNE for single-cell data

Uniform Manifold Approximation and Projection (UMAP) is a recently published non-linear dimensionality reduction technique. Another such algorithm, t-SNE, has been the default method for this task in recent years. Herein we comment on the usefulness of UMAP for high-dimensional cytometry and single-cell RNA sequencing, notably highlighting faster runtime and consistency, meaningful organization of cell clusters and preservation of continuums in UMAP compared to t-SNE.

Link: https://www.biorxiv.org/content/early/2018/04/10/298430

Code: https://github.com/lmcinnes/umap/

Code: https://github.com/alabarga/Parametric-t-SNE

Single Cell Interactive Visualisation and Analysis

scIVA is an interactive web-tool for visualisation and analysis of single cell RNA-Seq data.

Code: https://github.com/IMB-Computational-Genomics-Lab/scIVA

Tags: R, shiny

Single-cell RNAseq cluster assessment and visualization

scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise.

Link: https://f1000research.com/articles/7-1522/v1

Code: https://github.com/BaderLab/scClustViz

Tags: R, shiny

Interactive platform for Single-cell RNAseq

iS-CellR is a web-based Shiny application designed to provide a comprehensive analysis of single-cell RNA sequencing data. iS-CellR provides a fast method for filtering and normalization of raw data, dimensionality reductions (linear and non-linear) to identify cell types clusters, differential gene expression analysis to locate markers, and inter-/intra-sample heterogeneity analysis.

Link: https://academic.oup.com/bioinformatics/article/34/24/4305/5048937

Code: https://github.com/immcore/iS-CellR

Tags: R, shiny

Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain.

Shiny app for visualization and exploration of mouse brain single-cell gene expression.

Link: http://sci-hub.tw/https://doi.org/10.1016/j.cell.2018.07.028

Code: https://github.com/broadinstitute/dropviz

Tags: R, shiny

CellAtlasSearch: a scalable search engine for single cells

We have recently shown how Locality Sensitive Hashing (LSH) improves speed and accuracy of cell type clustering (14). CellAtlasSearch implements LSH on the powerful GPU architecture to attain an unmatched speed in archiving and querying expression data. Hashing based low dimensional encoding of expression profiles makes data transactions efficient and inexpensive, thus future-proof.

Link: https://academic.oup.com/nar/article/46/W1/W141/5000022

Code: http://www.cellatlassearch.com/
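
Locality Sensitive Hashing encodes each expression profile as a short bit string such that similar profiles tend to share bits, which makes lookups cheap. The abstract does not say which hash family CellAtlasSearch uses; the sketch below assumes random-hyperplane (sign) hashing, one common choice, and runs on the CPU with hypothetical function names.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(n_bits=16, dim=4)
profile = [3.0, 0.0, 1.5, 0.2]
sig = lsh_signature(profile, planes)
sig_scaled = lsh_signature([2.0 * v for v in profile], planes)
sig_opposite = lsh_signature([-v for v in profile], planes)
```

Scaling a profile leaves its signature unchanged, while the opposite direction flips every bit, so Hamming distance between signatures tracks the angle between profiles.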

too-many-cells

too-many-cells is a suite of tools, algorithms, and visualizations focusing on the relationships between cell clades. This includes new ways of clustering, plotting, choosing differential expression comparisons, and more.

Code: https://github.com/GregorySchwartz/too-many-cells
