Commit ce3bb36

committed
updated header
1 parent 985c930 commit ce3bb36

1 file changed

+7
-1
lines changed

integrating-data-using-ingest.ipynb

Lines changed: 7 additions & 1 deletion
@@ -15,7 +15,13 @@
 "\n",
 "**Note**\n",
 " \n",
-"The following tutorial describes [BBKNN](https://github.com/Teichlab/bbknn) [[Polanski19]](https://10.1093/bioinformatics/btz625), which integrates well with the Scanpy workflow and is accessible through [`bbknn`](https://scanpy.readthedocs.io/en/stable/external/scanpy.external.pp.bbknn.html) and a simple PCA-based method for integrating data we call [`ingest`](https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.ingest.html). The latter assumes one has an annotated reference dataset that essentially captures the relevant biological variability and is well-embedded already. The rational then is to fit a model (here and for the time being, a PCA) on the reference data and to use it to project new data. As the model is simple and the procedure clear, the workflow is transparent and fast. Like BBKNN, it leaves the data matrix invariant. Unlike BBKNN, it solves the label mapping problem and maintains an embedding that might have desired properties - like displaying clear trajectories. Similar PCA-based integrations have been used in many papers before, for instance, in [Weinreb18](https://doi.org/10.1101/467886).\n",
+"The following tutorial describes [BBKNN](https://github.com/Teichlab/bbknn) [[Polanski19]](https://doi.org/10.1093/bioinformatics/btz625), which integrates well with the Scanpy workflow and is accessible through [bbknn](https://scanpy.readthedocs.io/en/stable/external/scanpy.external.pp.bbknn.html), and a simple PCA-based method for integrating data that we call [ingest](https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.ingest.html).\n",
+" \n",
+"The [ingest](https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.ingest.html) function assumes one has an annotated reference dataset that essentially captures the relevant biological variability and is well-embedded already. The rationale is then to fit a model (here and for the time being, a PCA) on the reference data and to use it to project new data. Similar PCA-based integrations have been used in many papers before, for instance, in [Weinreb18](https://doi.org/10.1101/467886).\n",
+"\n",
+"* As the model is simple and the procedure clear, the workflow is transparent and fast.\n",
+"* Like BBKNN, it leaves the data matrix invariant.\n",
+"* Unlike BBKNN, it solves the label mapping problem and maintains an embedding that might have desired properties - like displaying clear trajectories.\n",
 "\n",
 "We refer to this *asymmetric* dataset integration as *ingesting* annotations from an annotated reference `adata_ref` into an `adata` that still lacks this annotation. This is different from learning a joint representation that integrates datasets in a symmetric way as [BBKNN](https://github.com/Teichlab/bbknn), MNN, Scanorama, Conos, CCA (e.g. in Seurat) or a conditional VAE (e.g. in scVI, trVAE) would do. Take a look at tools in the [external API](https://scanpy.readthedocs.io/en/latest/external/#data-integration) or at the [ecosystem page](https://scanpy.readthedocs.io/en/latest/ecosystem/#data-integration) to get started with other tools.\n",
 "<div>"

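The idea described in the changed cell (fit a PCA on the reference, project new data with it, then map labels) can be illustrated with a small NumPy sketch. This is not Scanpy's implementation of `ingest`; the helper names `fit_pca`, `project` and `transfer_labels`, the toy data, and the majority-vote k-nearest-neighbour label mapping are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(X_ref, n_components):
    """Fit a PCA on the reference only; return (mean, principal axes)."""
    mean = X_ref.mean(axis=0)
    # Rows of Vt are the principal axes of the centered reference matrix.
    _, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project data with the reference model (asymmetric: no refitting)."""
    return (X - mean) @ components.T

def transfer_labels(Z_ref, labels_ref, Z_new, k=5):
    """Hypothetical label mapping: majority vote over k nearest reference points."""
    labels_ref = np.asarray(labels_ref)
    # Pairwise squared Euclidean distances, shape (n_new, n_ref).
    d = ((Z_new[:, None, :] - Z_ref[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    out = []
    for idx in nearest:
        vals, counts = np.unique(labels_ref[idx], return_counts=True)
        out.append(vals[np.argmax(counts)])
    return np.array(out)

# Toy data (an assumption): two well-separated groups in 10 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 10, [5.0] * 10])
X_ref = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])
labels_ref = np.array(["A"] * 50 + ["B"] * 50)
X_new = np.vstack([c + rng.normal(size=(20, 10)) for c in centers])
X_new_before = X_new.copy()

mean, comps = fit_pca(X_ref, n_components=2)
Z_ref = project(X_ref, mean, comps)   # reference embedding
Z_new = project(X_new, mean, comps)   # new data in the same coordinates
labels_new = transfer_labels(Z_ref, labels_ref, Z_new, k=5)
```

The model is fitted on `X_ref` alone, so the reference embedding is unchanged by the new data, and `X_new` itself is never modified; only its projection and the mapped labels are produced.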