scRNA-seq/exercise_02-scrna_clustering.Rmd
Lines changed: 18 additions & 22 deletions
@@ -28,9 +28,6 @@ library(ggplot2)
 library(scater)
 library(scran)
 
-# clustering tools
-library(bluster)
-
 # File path to plots directory
 plots_dir <- "plots"
 
@@ -141,28 +138,28 @@ Also try adding some color with the `color_by` argument, using the number of gen
 ## Cell clustering
 
 
-
 When we performed dimensionality reduction on our data above, we could see visually that the cells tended cluster together into groups.
 Let's try identifying distinct groups of cells with similar patterns of gene expression that we can assign labels to.
 
 As noted in the `05-clustering_markers_scRNA.Rmd` instruction notebook, there are a number of methods to identify clusters and assign cells to those in multidimensional data like the single cell data we have.
 
-We can use the function `clusterRows()` from the `bluster` package to facilitate applying many of those algorithms.
+We can use the function `scran::clusterCells()` to facilitate applying many of those algorithms.
+This function uses the package `bluster` to perform clustering on cells in a `SingleCellExperiment` object.
 
-To specify the clustering algorithm and the parameters specific to that algorithm to `clusterRows()`, we will provide it an argument with one of a set of `*Params()` functions, which are outlined below.
+To specify the clustering algorithm and the parameters for that algorithm to `scran::clusterCells()`, we will provide it an argument with one of a set of `*Params()` functions from the `bluster` package, which are outlined below.
 Note that there are other algorithms that we will not discuss, but can be explored in the [Flexible clustering for Bioconductor vignette](https://bioconductor.org/packages/3.19/bioc/vignettes/bluster/inst/doc/clusterRows.html).
 
--`KmeansParam()` will apply the k-means clustering algorithm (see also `?kmeans`).
+-`bluster::KmeansParam()` will apply the k-means clustering algorithm (see also `?kmeans`).
 Some of the more important parameters are:
 -`centers`: the number of clusters to be assigned
 -`nstart`: Since the final k-means clusters may depend on the random clusters chosen at the start, we might want to repeat the procedure a number of times and choose the best output. This argument tells how many times to repeat the clustering (the default is 1).
 
--`NNGraphParam()` will apply community detection algorithms on a nearest-neighbor (NN) graph.
+-`bluster::NNGraphParam()` will apply community detection algorithms on a nearest-neighbor (NN) graph.
 Some of the more important parameters are:
 -`k`: the number of neighbors for each cell to use when constructing the graph
 -`type`: how the neighbor graph is weighted.
 The options are `rank` (default), `number` and `jaccard`, with the last of those being the default in `Seurat`.
-For more details see `?makeSNNGraph`
+For more details see `?bluster::makeSNNGraph`
 -`cluster.fun`: Which cluster detection algorithm to use.
 The options are many, but common choices are `walktrap` (default), `louvain` (used by `Seurat`), and `leiden`.
 
@@ -182,30 +179,29 @@ That is a hard question and for now the answer is up to you!
 However, for an intuitive visualization of the general k-means method, you might find [this StatQuest video](https://www.youtube.com/watch?v=4b5d3muPQmA) useful.
 For more discussion of the method in a single-cell context, including some tips on choosing `k`, the [Orchestrating Single-Cell Analysis book chapter on k-means](https://bioconductor.org/books/3.19/OSCA.basic/clustering.html#vector-quantization-with-k-means) is a good reference.
 
-We already computed *and stored* a matrix with reduced dimensions with the `runPCA()` function above.
-We will extract that from the `SingleCellExperiment` object with the `reducedDim()` function, which returns a matrix with the cells as rows that we can directly supply to the `clusterRows()` function.
+The first argument to `scran::clusterCells()` is the `SingleCellExperiment` object to cluster.
+We next want to tell this function on what values in the object (e.g. an assay or reduced dimension) to perform the clustering on.
+We'd like to use the PCA matrix for clustering, which we already computed *and stored* in our object with the `runPCA()` function above.
+We can tell the `scran::clusterCells()` function to use this PCA matrix for clustering using the argument `use.dimred = "PCA"`.
 
-When implementing `clusterRows()` below, you can use `KmeansParam()` to specify that we want k-means clustering, with the `centers` parameter to set how many clusters we will assign (`k`).
+When implementing `scran::clusterCells()` below, use the argument `BLUSPARAM = bluster::KmeansParam()` to specify that we want k-means clustering, with the `centers` parameter to set how many clusters we will assign (`k`).
 
 ```{r kmeans, solution = TRUE}
 # set the number of clusters
 
-# extract the principal components matrix
-
-# perform the clustering using `clusterRows()`
+# perform the clustering using `scran::clusterCells()`
 
 ```
 
-The `clusterRows()` function returned a vector of cluster assignments as integers, but the numerical values have no inherent meaning.
-For plotting, convert those integers into factors, so R is not tempted to treat them as a continuous variable.
+The `scran::clusterCells()` function returned a factor vector of cluster assignments, recorded as integers.
 
-Store the new factor values back into the cell information table of the original `normalized_sce` object for convenient storage and later use.
+Store these clusters back into the cell information table of the original `normalized_sce` object for convenient storage and later use.
 You can do this with the `$` notation and a column name of your choosing.
 If you are going to try different numbers of clusters, you might find it useful to include that in the column name so you can keep track of the various results.
 For example, if you used two clusters, you might use `normalized_sce$kcluster_2` to store the results.
 
 ```{r store_kclusters, solution = TRUE}
-# save clusters in the SCE object as a factor
+# save clusters in the SCE object
 
 ```
 
@@ -231,9 +227,9 @@ Use the chunk below to explore the questions above!
 ```{r explore_kclusters_n, solution = TRUE}
 # try re-running the above steps with a different number of clusters
 
-# perform the clustering using `clusterRows()`
+# perform the clustering using `scran::clusterCells()`
 
-# save clusters in the SCE object as a factor
+# save clusters in the SCE object
 
 # plot new clustering results
 
@@ -250,7 +246,7 @@ What do the results look like if you plot with the `PCA` or `TSNE` coordinates?
 
 The other common type of clustering method for single cell data is graph-based clustering.
 
-To apply this clustering algorithm, use the same `bluster::clusterRows()` function as before, but specify `NNGraphParam()` as the second argument to tell it that we want to use a nearest-neighbor graph-based method.
+To apply this clustering algorithm, use the same `scran::clusterCells()` function as before, but specify `BLUSPARAM = bluster::NNGraphParam()` to tell it that we want to use a nearest-neighbor graph-based method.
 
 Also remember to specify `k` and the cluster detection algorithm using the `cluster.fun`.
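For orientation, the call pattern this commit moves the exercise to can be sketched as below. This is a minimal illustration, not the exercise's hidden solution: it assumes the `scuttle`, `scater`, `scran`, and `bluster` packages are installed, it builds a simulated object with `scuttle::mockSCE()` as a stand-in for the exercise's `normalized_sce`, and the cluster count and column names (`kcluster_5`, `nncluster_louvain`) are hypothetical choices made only for this sketch.

```r
# Minimal sketch: assumes scuttle, scater, scran, and bluster are installed.
# The simulated object is a hypothetical stand-in for the exercise's `normalized_sce`.
library(SingleCellExperiment)

set.seed(2024)
normalized_sce <- scuttle::mockSCE(ncells = 200)
normalized_sce <- scuttle::logNormCounts(normalized_sce)
# compute and store a "PCA" reduced dimension, as runPCA() does in the exercise
normalized_sce <- scater::runPCA(normalized_sce, ncomponents = 10)

# k-means clustering on the stored PCA matrix
kclusters <- scran::clusterCells(
  normalized_sce,
  use.dimred = "PCA",
  BLUSPARAM = bluster::KmeansParam(centers = 5)
)
# clusterCells() already returns a factor, so no manual conversion is needed
normalized_sce$kcluster_5 <- kclusters

# graph-based clustering: 10 nearest neighbors, Jaccard edge weights, Louvain communities
nnclusters <- scran::clusterCells(
  normalized_sce,
  use.dimred = "PCA",
  BLUSPARAM = bluster::NNGraphParam(k = 10, type = "jaccard", cluster.fun = "louvain")
)
normalized_sce$nncluster_louvain <- nnclusters

# cells per k-means cluster
table(normalized_sce$kcluster_5)
```

Because `use.dimred = "PCA"` reads the stored matrix directly, the old `reducedDim()` extraction step and the integer-to-factor conversion both drop out, which matches the lines removed in this diff.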
scRNA-seq/exercise_03-celltype.Rmd
Lines changed: 2 additions & 2 deletions
@@ -411,7 +411,7 @@ To do this, make a UMAP plot, coloring the cells by cluster assignment and facet
 In addition to looking at cell type annotation using different reference datasets, let's also try another method of cell type annotation that uses clustering to inform cell type annotations.
 We will use the approach inspired by [Baran _et al._ (2019)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1812-2) using metacells that we tried during instruction.
 
-First we need to perform fine-scale clustering, which we can do using the `bluster::clusterRows()` function and running K-means clustering with `bluster::KmeansParam()`.
+First we need to perform fine-scale clustering, which we can do using the `scran::clusterCells()` function and running K-means clustering with the argument `BLUSPARAM = bluster::KmeansParam()`.
 With K-means clustering we have to pick how many clusters should be present in our dataset, e.g., the number of centers that will be set before performing clustering.
 During instruction we were working with a sample that contained about 8,000 cells, and we set the number of clusters (which corresponds to k-mean centers) to be 100.
 However, here we only have around 1,200 cells.
@@ -421,7 +421,7 @@ One rule of thumb is to use the square root of the total number of cells, so her
 This is not a perfect rule by any means, so feel free to try different numbers and see how that affects your results.
 
 Go ahead and identify cluster assignments for the glioblastoma sample, naming the results `kclusters`.
-Be sure to use the `PCA` as your input.
+Be sure to specify to perform clustering on the PCA matrix by specifying `use.dimred = "PCA"`.
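The square-root rule of thumb mentioned in this hunk can be sketched as follows. This is a hedged illustration rather than the exercise answer: it assumes the same four packages are installed, it uses a simulated 1,200-cell object (the hypothetical `sce`) in place of the glioblastoma sample, and rounding `sqrt()` to the nearest integer is a choice made only for this sketch.

```r
# Minimal sketch: assumes scuttle, scater, scran, and bluster are installed.
# `sce` is a hypothetical simulated stand-in for the ~1,200-cell glioblastoma sample.
library(SingleCellExperiment)

set.seed(2024)
sce <- scuttle::mockSCE(ncells = 1200)
sce <- scuttle::logNormCounts(sce)
sce <- scater::runPCA(sce, ncomponents = 20)

# rule of thumb: number of k-means centers ~ square root of the number of cells
n_centers <- round(sqrt(ncol(sce)))  # sqrt(1200) is about 34.6, rounded to 35

# fine-scale k-means clustering on the stored PCA matrix
kclusters <- scran::clusterCells(
  sce,
  use.dimred = "PCA",
  BLUSPARAM = bluster::KmeansParam(centers = n_centers)
)
sce$kclusters <- kclusters
```

Swapping `n_centers` for other values is an easy way to follow the hunk's suggestion to try different numbers and compare results.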