39 | 39 | "\n",
40 | 40 | "<figure>\n",
41 | 41 | "<img src=\"../../images/epigenetic-mech.jpeg\" width=\"700\" height=\"500\">\n",
42 |    | - "<figcaption align = \"center\"> <b> Fig 1: Affect of epigenetic mechanisms on health. [1] </b> </figcaption>\n",
   | 42 | + "<figcaption align = \"center\"> <b> Fig 1: Effect of epigenetic mechanisms on health. [1] </b> </figcaption>\n",
43 | 43 | " \n",
44 | 44 | "</figure>\n",
45 | 45 | " \n",
|
94 | 94 | " \n",
95 | 95 | "This figure represents the analysis architecture followed in this module. The module has been designed according to the resources and the availability of data. The analysis steps represent the pipeline that can be implemented using the Nextflow nf-core/methylseq module. In this figure, the analysis steps to perform methyl seq are shown. Now, there are two different workflows that can be followed to implement this pipeline. The first one is Bismark workflow, where it shows all the tools which can be used for each step of the analysis. We have a similar tools list for each step for the bwa-meth workflow. Both of them are very popular workflows to implement methylseq pipeline.\n",
96 | 96 | " \n",
97 |    | - "The sample command to run nf-core methylseq pipeline to generate quality control reports and extract methylation call and coverage file is provided below. #### This step is <u>optional</u> as it is the preprocessing step to let you experience generating your own methylation coverage file. To save on computational and storage resources, we have already provided the methylation coverage file you will use in the down processing analysis in step 3. \n",
   | 97 | + "The sample command to run nf-core methylseq pipeline to generate quality control reports and extract methylation call and coverage file is provided below. #### This step is <u>optional</u> as it is the preprocessing step to let you experience generating your own methylation coverage file. To save on computational and storage resources, we have already provided the methylation coverage file you will use in the downstream processing analysis in step 3. \n",
98 | 98 | " \n",
99 | 99 | "If you choose to generate your own methylation coverage file then refer to the instructions outlined in the RNAseq submodule, and refer to the nf-core [methylseq](https://nf-co.re/methylseq). Again, you will need to modify the config file to include your bucket and project ID. "
100 | 100 | ]
|
153 | 153 | "id": "38c8751a",
154 | 154 | "metadata": {},
155 | 155 | "source": [
156 |     | - "Run the following to create a Kernel with all required packaged installed. It will take about 10 minutes to install packages and create a kernel."
    | 156 | + "Run the following to create a Kernel with all required packages installed. It will take about 10 minutes to install packages and create a kernel."
157 | 157 | ]
158 | 158 | },
159 | 159 | {
|
189 | 189 | "[](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n",
190 | 190 | "\n",
191 | 191 | "\n",
192 |     | - "Before begining this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up."
    | 192 | + "Before beginning this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up."
193 | 193 | ]
194 | 194 | },
195 | 195 | {
|
207 | 207 | "metadata": {},
208 | 208 | "outputs": [],
209 | 209 | "source": [
210 |     | - "#Install nexflow, make it exceutable, and update it\n",
    | 210 | + "#Install nexflow, make it executable, and update it\n",
211 | 211 | "system('curl https://get.nextflow.io | bash' , intern=TRUE)\n",
212 | 212 | "system('chmod +x nextflow' , intern=TRUE)\n",
213 | 213 | "system('./nextflow self-update' , intern=TRUE)"
|
245 | 245 | "source": [
246 | 246 | "<div class=\"alert alert-block alert-info\">\n",
247 | 247 | " <i class=\"fa fa-lightbulb-o\" aria-hidden=\"true\"></i>\n",
248 |     | - " <b>Tip: </b> If you don't immediately see a output on your screen check your output directory you have pointed to in your config file to insure that Nextflow is running. You should see some output directories/files.\n",
    | 248 | + " <b>Tip: </b> If you don't immediately see a output on your screen check your output directory you have pointed to in your config file to ensure that Nextflow is running. You should see some output directories/files.\n",
249 | 249 | "</div>"
250 | 250 | ]
251 | 251 | },
|
334 | 334 | "source": [
335 | 335 | "<div class=\"alert alert-block alert-success\">\n",
336 | 336 | " <i class=\"fa fa-hand-paper-o\" aria-hidden=\"true\"></i>\n",
337 |     | - " <b>Note: </b> If you've used Nextflow to produce your methylation coverage files and would like to use them for the down processing analysis instead of the test data provided enter your own files into the two previous code cells above with by copying them from the <b>bismark</b> subdirectory within your Nextflow outputs directory.\n",
    | 337 | + " <b>Note: </b> If you've used Nextflow to produce your methylation coverage files and would like to use them for the downstream processing analysis instead of the test data provided enter your own files into the two previous code cells above by copying them from the <b>bismark</b> subdirectory within your Nextflow outputs directory.\n",
338 | 338 | "</div>"
339 | 339 | ]
340 | 340 | },
|
385 | 385 | "metadata": {},
386 | 386 | "source": [
387 | 387 | "### Filter Step\n",
388 |     | - "Filtering samples based on coverage can often be useful. Specifically, if samples have overamplification or PCR bias, it can be useful to discard bases that have a very high read coverage. Bases with a very low read coverage should also be discarded because they tend to produce statistics that are unreliable and unstable in the downstream analyses. The code shown below filters a methylRawList and discards bases that have covereage below 10 reads, which was already done when the files were read in. Additionally, the code below discards bases with more than 99.9th percentile coverage in each sample."
    | 388 | + "Filtering samples based on coverage can often be useful. Specifically, if samples have overamplification or PCR bias, it can be useful to discard bases that have a very high read coverage. Bases with a very low read coverage should also be discarded because they tend to produce statistics that are unreliable and unstable in the downstream analyses. The code shown below filters a methylRawList and discards bases that have coverage below 10 reads, which was already done when the files were read in. Additionally, the code below discards bases with more than 99.9th percentile coverage in each sample."
389 | 389 | ]
390 | 390 | },
391 | 391 | {
|
574 | 574 | "source": [
575 | 575 | "### <span> Differential Methylation </span>\n",
576 | 576 | "### Single CpG Sites\n",
577 |     | - "Once we have confirmed that the basic statistics and data structures of the samples are reasonable, we can proceed to differential methylation. Differential DNA methylation is usually calculated by comparing the proportion of methylated Cs in a test sample relative to a control. The Fisher's Exact Test and similar methods can be applied when there are no replicates for the test and control cases. This can allow us to make simple comparisons between the pairs of samples such as the test and control. When replicates are present, regression based methods are typically used to model the methylation levels relative to the sample groups and variation between the replicates. Regression methods also have another additional advantage over the use of the Fisher's Exact test in that they all for the inclusion of sample specific covariates (categorical or continuous) as well as the ability to adjust for confounding variables. \n",
    | 577 | + "Once we have confirmed that the basic statistics and data structures of the samples are reasonable, we can proceed to differential methylation. Differential DNA methylation is usually calculated by comparing the proportion of methylated Cs in a test sample relative to a control. The Fisher's Exact Test and similar methods can be applied when there are no replicates for the test and control cases. This can allow us to make simple comparisons between the pairs of samples such as the test and control. When replicates are present, regression based methods are typically used to model the methylation levels relative to the sample groups and variation between the replicates. Regression methods also have another additional advantage over the use of the Fisher's Exact test in that they allow for the inclusion of sample specific covariates (categorical or continuous) as well as the ability to adjust for confounding variables. \n",
578 | 578 | "\n",
579 | 579 | "There are three options provided to get the differential methylation results namely Fisher’s Exact Test, Betabinomial Distribution Based Test, and Logistic Regression Based Test as you will see below. Only the Fisher’s exact test and the Logistic Regression based test will be explored. If you plan to use Betabinomial Distribution Based Test or compare the results of all three types of tests, the code can be uncommented. "
580 | 580 | ]
|
649 | 649 | "metadata": {},
650 | 650 | "source": [
651 | 651 | "### Optional: Betabinomial-Distribution-Based Tests\n",
652 |     | - "The beta-binominal model for calculating the differential methylation can be accessed through the code below. This accounts for both sampling and epigenetic variablity, and is useful for better modeling of the variance. This model follows the binominal distribution of the number of reads which is similar to how logistic regression works. However, the beta distribution can have varying methylation proportions across samples.\n",
    | 652 | + "The beta-binominal model for calculating the differential methylation can be accessed through the code below. This accounts for both sampling and epigenetic variability, and is useful for better modeling of the variance. This model follows the binominal distribution of the number of reads which is similar to how logistic regression works. However, the beta distribution can have varying methylation proportions across samples.\n",
653 | 653 | "\n",
654 | 654 | "If you plan to use Betabinomial Distribution Based Test or compare the results of all three types of tests, the code can be uncommented. "
655 | 655 | ]
|
992 | 992 | "mimetype": "text/x-r-source",
993 | 993 | "name": "R",
994 | 994 | "pygments_lexer": "r",
995 |     | - "version": "4.2.2"
    | 995 | + "version": "4.3.3"
996 | 996 | }
997 | 997 | },
998 | 998 | "nbformat": 4,