Document cell cycle scoring in the README, output docs, and usage docs

tlebchan · tlebchan · commit e5ed94622b31 · 2026-03-13T18:48:38.000+04:00
diff --git a/README.md b/README.md
@@ -49,6 +49,7 @@ Steps marked with the boat icon are not yet implemented. For the other steps, th
       - [scrublet](https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html)
       - [DoubletDetection](https://doubletdetection.readthedocs.io/en/v2.5.2/doubletdetection.doubletdetection.html)
       - [SCDS](https://bioconductor.org/packages/devel/bioc/vignettes/scds/inst/doc/scds.html)
+   7. Cell cycle scoring ([Tirosh et al. 2015](https://doi.org/10.1038/nature14590))
 2. Sample aggregation
    1. Merge into a single h5ad file
    2. Present QC for merged counts ([`MultiQC`](http://multiqc.info/))
diff --git a/docs/output.md b/docs/output.md
@@ -25,6 +25,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
       - [scrublet](https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html)
       - [DoubletDetection](https://doubletdetection.readthedocs.io/en/v2.5.2/doubletdetection.doubletdetection.html)
       - [SCDS](https://bioconductor.org/packages/devel/bioc/vignettes/scds/inst/doc/scds.html)
+   7. Cell cycle scoring ([Tirosh et al. 2015](https://doi.org/10.1038/nature14590))
 2. Sample aggregation
    1. Merge into a single h5ad file
    2. Present QC for merged counts ([`MultiQC`](http://multiqc.info/))
@@ -60,6 +61,9 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
     - `(doubletdetection|scds|scrublet|solo)/`: Results of doublet detection. Each directory contains a filtered `h5ad`/`rds` and a `csv`/`pkl` file with the doublet annotations.
     - `${sample_id}.h5ad`: The h5ad without doublets.
   - `qc_preprocessed/`: QC plots for the preprocessed data.
+  - `cell_cycle/`: Cell cycle scoring results.
+    - `${sample_id}_cellcycle.pkl`: Per-cell `S_score`, `G2M_score`, and `phase` annotations, which `FINALIZE_QC_ANNDATAS` merges into the final h5ad.
+    - `${sample_id}_cellcycle.h5ad`: Intermediate h5ad with cell cycle scores added, available for inspection.
 
 </details>
 
diff --git a/docs/usage.md b/docs/usage.md
@@ -138,6 +138,46 @@ monaco_immune,label.fine,/path/to/monaco_immune.tar
 
 Example tar archives can be found [here](https://github.com/nf-core/test-datasets/tree/scdownstream/singleR).
 
+### Cell cycle scoring
+
+Cell cycle scoring assigns each cell an S-phase score, a G2M-phase score, and a predicted cell cycle phase (`S`, `G2M`, or `G1`) based on the expression of curated marker genes (Tirosh et al. 2015; the same gene sets used by Seurat). The results are stored in `adata.obs` as `S_score`, `G2M_score`, and `phase`, and the two scores can be used as covariates in downstream integration steps.
+
+Cell cycle scoring is enabled by default. To skip it:
+
+```bash
+nextflow run nf-core/scdownstream --input samplesheet.csv --outdir results --cell_cycle_scoring false
+```
+
+#### Species
+
+Bundled gene lists are provided for human and mouse. Select the appropriate species with `--species`:
+
+```bash
+# mouse
+nextflow run nf-core/scdownstream --input samplesheet.csv --outdir results --species mouse
+```
+
+#### Custom gene lists
+
+For other organisms (e.g. rat, zebrafish), you can provide your own gene lists (one gene symbol per line) via `--s_genes` and `--g2m_genes`:
+
+```bash
+nextflow run nf-core/scdownstream --input samplesheet.csv --outdir results \
+    --s_genes /path/to/my_s_genes.txt \
+    --g2m_genes /path/to/my_g2m_genes.txt
+```
+
+The bundled gene lists can be found in [`assets/cell_cycle_genes/`](../assets/cell_cycle_genes/) and serve as templates for custom lists.
+
+#### Using scores in downstream analysis
+
+The `S_score` and `G2M_score` columns can be passed to integration tools as continuous covariates to account for cell cycle effects, for example with scVI:
+
+```bash
+nextflow run nf-core/scdownstream --input samplesheet.csv --outdir results \
+    --scvi_continuous_covariates S_score,G2M_score
+```
+
 ### Reference mapping
 
 The pipeline supports mapping new samples into the latent space of an existing scVI/scANVI model.