Merge pull request #859 from AlexsLemonade/auto_copy_exercises

sjspielman · web-flow · commit 48b31640d592 · 2025-07-17T13:01:32.000Z
GHA: Automated transfer of exercise notebooks
diff --git a/intro-to-R-tidyverse/exercise_01-intro_to_base_R.Rmd b/intro-to-R-tidyverse/exercise_01-intro_to_base_R.Rmd
@@ -155,7 +155,7 @@ What is the second most common brain location?
 ```
 
 Which samples are from the cerebral hemisphere? 
-Create a boolean vector named `cerebral` that is `TRUE` for cerebral hemisphere samples and `FALSE` otherwise.
+Create a logical vector named `cerebral` that is `TRUE` for cerebral hemisphere samples and `FALSE` otherwise.
 
 ```{r cerebral, solution = TRUE}
 
diff --git a/intro-to-R-tidyverse/exercise_03a-intro_to_tidyverse.Rmd b/intro-to-R-tidyverse/exercise_03a-intro_to_tidyverse.Rmd
@@ -247,7 +247,9 @@ When you have your plot finalized, save it to a PNG to our `results_dir` using
 
 Now that we've seen what our distribution looks like, let's apply a filter cutoff to `genes_df`.
 Choose a minimum gene expression cutoff and use `dplyr::filter()` to filter out the low expression genes based on that cutoff.
+Specifically, you'll want to keep rows where the `gene_medians` value is above your cutoff.
 Recall that `dplyr::filter()` will *keep* rows where your logical expression is `TRUE`.
+
 We'll call this newly filtered data frame `filtered_genes_df`.
 
 ```{r filter-genes, solution = TRUE}
diff --git a/scRNA-seq/exercise_01-scrna_quant.Rmd b/scRNA-seq/exercise_01-scrna_quant.Rmd
@@ -506,7 +506,7 @@ Because `DoubletScore` is a part of the `colData` you can filter using the same
 Name your new object `sce_filtered_singlets`.
 
 ```{r remove_doublets, solution = TRUE}
-# create a boolean vector of if a cell is a singlet or doublet
+# create a logical vector indicating whether each cell is a singlet
 
 # only keep singlets
 
diff --git a/scRNA-seq/exercise_02-scrna_clustering.Rmd b/scRNA-seq/exercise_02-scrna_clustering.Rmd
@@ -28,9 +28,6 @@ library(ggplot2)
 library(scater)
 library(scran)
 
-# clustering tools
-library(bluster)
-
 # File path to plots directory
 plots_dir <- "plots"
 
@@ -141,28 +138,28 @@ Also try adding some color with the `color_by` argument, using the number of gen
 ## Cell clustering
 
 
-
 When we performed dimensionality reduction on our data above, we could see visually that the cells tended to cluster together into groups.
 Let's try identifying distinct groups of cells with similar patterns of gene expression that we can assign labels to.
 
 As noted in the `05-clustering_markers_scRNA.Rmd` instruction notebook, there are a number of methods to identify clusters and assign cells to those in multidimensional data like the single cell data we have.
 
-We can use the function `clusterRows()` from the `bluster` package to facilitate applying many of those algorithms.
+We can use the function `scran::clusterCells()` to facilitate applying many of those algorithms.
+This function uses the package `bluster` to perform clustering on cells in a `SingleCellExperiment` object.
 
-To specify the clustering algorithm and the parameters specific to that algorithm to `clusterRows()`, we will provide it an argument with one of a set of `*Params()` functions, which are outlined below.
+To specify the clustering algorithm and its parameters to `scran::clusterCells()`, we will provide it an argument with one of a set of `*Param()` functions from the `bluster` package, which are outlined below.
 Note that there are other algorithms that we will not discuss, but can be explored in the [Flexible clustering for Bioconductor vignette](https://bioconductor.org/packages/3.19/bioc/vignettes/bluster/inst/doc/clusterRows.html).
 
-- `KmeansParam()` will apply the k-means clustering algorithm (see also `?kmeans`).
+- `bluster::KmeansParam()` will apply the k-means clustering algorithm (see also `?kmeans`).
 Some of the more important parameters are:
   - `centers`: the number of clusters to be assigned
   - `nstart`: Since the final k-means clusters may depend on the random clusters chosen at the start, we might want to repeat the procedure a number of times and choose the best output. This argument tells how many times to repeat the clustering (the default is 1).
 
-- `NNGraphParam()` will apply community detection algorithms on a nearest-neighbor (NN) graph.
+- `bluster::NNGraphParam()` will apply community detection algorithms on a nearest-neighbor (NN) graph.
 Some of the more important parameters are:
   - `k`: the number of neighbors for each cell to use when constructing the graph
   - `type`: how the neighbor graph is weighted.
   The options are `rank` (default), `number` and `jaccard`, with the last of those being the default in `Seurat`.
-  For more details see `?makeSNNGraph`
+  For more details see `?bluster::makeSNNGraph`
   - `cluster.fun`: Which cluster detection algorithm to use.
   The options are many, but common choices are `walktrap` (default),  `louvain` (used by `Seurat`), and `leiden`.
 
@@ -182,30 +179,29 @@ That is a hard question and for now the answer is up to you!
 However, for an intuitive visualization of the general k-means method, you might find [this StatQuest video](https://www.youtube.com/watch?v=4b5d3muPQmA) useful.
 For more discussion of the method in a single-cell context, including some tips on choosing `k`, the [Orchestrating Single-Cell Analysis book chapter on k-means](https://bioconductor.org/books/3.19/OSCA.basic/clustering.html#vector-quantization-with-k-means) is a good reference.
 
-We already computed *and stored* a matrix with reduced dimensions with the `runPCA()` function above.
-We will extract that from the `SingleCellExperiment` object with the `reducedDim()` function, which returns a matrix with the cells as rows that we can directly supply to the `clusterRows()` function.
+The first argument to `scran::clusterCells()` is the `SingleCellExperiment` object to cluster.
+We next want to tell this function which values in the object (e.g., an assay or reduced dimension) to perform the clustering on.
+We'd like to use the PCA matrix for clustering, which we already computed *and stored* in our object with the `runPCA()` function above.
+We can tell the `scran::clusterCells()` function to use this PCA matrix for clustering using the argument `use.dimred = "PCA"`.
 
-When implementing `clusterRows()` below, you can use `KmeansParam()` to specify that we want k-means clustering, with the `centers` parameter to set how many clusters we will assign (`k`).
+When implementing `scran::clusterCells()` below, use the argument `BLUSPARAM = bluster::KmeansParam()` to specify that we want k-means clustering, with the `centers` parameter to set how many clusters we will assign (`k`).
 
 ```{r kmeans, solution = TRUE}
 # set the number of clusters
 
-# extract the principal components matrix
-
-# perform the clustering using `clusterRows()`
+# perform the clustering using `scran::clusterCells()`
 
 ```
 
-The `clusterRows()` function returned a vector of cluster assignments as integers, but the numerical values have no inherent meaning.
-For plotting, convert those integers into factors, so R is not tempted to treat them as a continuous variable.
+The `scran::clusterCells()` function returned a factor vector of cluster assignments, recorded as integers.
 
-Store the new factor values back into the cell information table of the original `normalized_sce` object for convenient storage and later use.
+Store these clusters back into the cell information table of the original `normalized_sce` object for convenient storage and later use.
 You can do this with the `$` notation and a column name of your choosing.
 If you are going to try different numbers of clusters, you might find it useful to include that in the column name so you can keep track of the various results.
 For example, if you used two clusters, you might use  `normalized_sce$kcluster_2` to store the results.
 
 ```{r store_kclusters, solution = TRUE}
-# save clusters in the SCE object as a factor
+# save clusters in the SCE object
 
 ```
 
@@ -231,9 +227,9 @@ Use the chunk below to explore the questions above!
 ```{r explore_kclusters_n, solution = TRUE}
 # try re-running the above steps with a different number of clusters
 
-# perform the clustering using `clusterRows()`
+# perform the clustering using `scran::clusterCells()`
 
-# save clusters in the SCE object as a factor
+# save clusters in the SCE object
 
 # plot new clustering results
 
@@ -250,7 +246,7 @@ What do the results look like if you plot with the `PCA` or `TSNE` coordinates?
 
 The other common type of clustering method for single cell data is graph-based clustering.
 
-To apply this clustering algorithm, use the same `bluster::clusterRows()` function as before, but specify `NNGraphParam()` as the second argument to tell it that we want to use a nearest-neighbor graph-based method.
+To apply this clustering algorithm, use the same `scran::clusterCells()` function as before, but specify `BLUSPARAM = bluster::NNGraphParam()` to tell it that we want to use a nearest-neighbor graph-based method.
 
 Also remember to specify `k` and the cluster detection algorithm using the `cluster.fun` argument.
 
diff --git a/scRNA-seq/exercise_03-celltype.Rmd b/scRNA-seq/exercise_03-celltype.Rmd
@@ -411,7 +411,7 @@ To do this, make a UMAP plot, coloring the cells by cluster assignment and facet
 In addition to looking at cell type annotation using different reference datasets, let's also try another method of cell type annotation that uses clustering to inform cell type annotations.
 We will use the approach inspired by [Baran _et al._ (2019)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1812-2) using metacells that we tried during instruction.
 
-First we need to perform fine-scale clustering, which we can do using the `bluster::clusterRows()` function and running K-means clustering with `bluster::KmeansParam()`.
+First we need to perform fine-scale clustering, which we can do using the `scran::clusterCells()` function and running K-means clustering with the argument `BLUSPARAM = bluster::KmeansParam()`.
 With K-means clustering we have to pick how many clusters should be present in our dataset, i.e., the number of centers that will be set before performing clustering.
 During instruction we were working with a sample that contained about 8,000 cells, and we set the number of clusters (which corresponds to k-means centers) to be 100.
 However, here we only have around 1,200 cells.
@@ -421,7 +421,7 @@ One rule of thumb is to use the square root of the total number of cells, so her
 This is not a perfect rule by any means, so feel free to try different numbers and see how that affects your results.
 
 Go ahead and identify cluster assignments for the glioblastoma sample, naming the results `kclusters`.
-Be sure to use the `PCA` as your input.
+Be sure to perform clustering on the PCA matrix by specifying `use.dimred = "PCA"`.
 
 ```{r kmeans, solution = TRUE}
 # perform k-means clustering