- "The following tutorial describes two methods. The first is [BBKNN](https://github.com/Teichlab/bbknn) [[Polanski19]](https://doi.org/10.1093/bioinformatics/btz625), which integrates well with the Scanpy workflow and is accessible through [`bbknn`](https://scanpy.readthedocs.io/en/stable/external/scanpy.external.pp.bbknn.html). The second is a simple PCA-based method for integrating data that we call [`ingest`](https://scanpy.readthedocs.io/en/latest/api/scanpy.tl.ingest.html). `ingest` assumes that one has an annotated reference dataset that already captures the relevant biological variability and is well embedded. The rationale is then to fit a model (here, and for the time being, a PCA) on the reference data and to use it to project new data into the reference embedding. As the model is simple and the procedure clear, the workflow is transparent and fast. Like BBKNN, `ingest` leaves the data matrix unchanged. Unlike BBKNN, it solves the label-mapping problem and maintains an embedding that may have desired properties, such as displaying clear trajectories. Similar PCA-based integrations have been used in many earlier papers, for instance, [Weinreb18](https://doi.org/10.1101/467886).\n",
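The idea behind `ingest` can be illustrated with a minimal sketch in plain NumPy. This is not Scanpy's implementation. It only assumes the workflow described above: fit a PCA on the reference, project the new cells with the reference mean and loadings, and then map labels. The helper names `fit_pca`, `project` and `transfer_labels` are hypothetical, and the label transfer uses a brute-force k-nearest-neighbor majority vote as an assumed stand-in. In practice you would call `sc.tl.ingest(adata, adata_ref, obs=...)` on real `AnnData` objects instead.

```python
import numpy as np

def fit_pca(X_ref, n_comps):
    """Fit a PCA on the reference matrix (cells x genes) via SVD."""
    mean = X_ref.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing explained variance
    _, _, vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, vt[:n_comps].T  # loadings, shape (genes, n_comps)

def project(X, mean, loadings):
    """Embed cells into the reference PCA space; X itself is not modified."""
    return (X - mean) @ loadings

def transfer_labels(emb_ref, labels_ref, emb_new, k=5):
    """Hypothetical helper: majority label among the k nearest reference cells."""
    labels_ref = np.asarray(labels_ref)
    # Pairwise squared Euclidean distances, shape (n_new, n_ref)
    d2 = ((emb_new[:, None, :] - emb_ref[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    out = []
    for row in labels_ref[nn]:
        values, counts = np.unique(row, return_counts=True)
        out.append(values[np.argmax(counts)])
    return np.array(out)

# Toy data (fabricated for illustration): two cell types in 20 genes
rng = np.random.default_rng(0)
centers = np.zeros((2, 20))
centers[1, :10] = 5.0
ref_labels = np.repeat(["A", "B"], 50)
X_ref = centers[np.repeat([0, 1], 50)] + rng.normal(scale=0.5, size=(100, 20))
X_new = centers[[0, 1, 1, 0]] + rng.normal(scale=0.5, size=(4, 20))

mean, loadings = fit_pca(X_ref, n_comps=5)   # the model is fit on the reference only
emb_ref = project(X_ref, mean, loadings)
emb_new = project(X_new, mean, loadings)     # new cells reuse the reference model
pred = transfer_labels(emb_ref, ref_labels, emb_new, k=5)
print(pred)  # expected labels: A, B, B, A
```

Because the PCA is fit once on the reference and only reused for the new cells, the reference embedding stays fixed and the query data matrix is never altered. The brute-force distance matrix is only suitable for small examples; a neighbor index would be needed at realistic cell counts.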