Sleuth #11
Open
markmcdowall wants to merge 112 commits into master from sleuth
Changes from 94 commits
Commits
50491ae
Minor syntax fix
markmcdowall 26ccd3d
Added in the requirement for the boostrap sampling needed for the Sle…
markmcdowall 7542808
Added in the requirement for the boostrap sampling needed for the Sle…
markmcdowall 88dc548
Initial sleuth tool, structure based on the idear tool as both requir…
markmcdowall b31cb40
Added an ADR description about the branch
markmcdowall 7205ef6
Initial R script to run sleuth
markmcdowall 702a624
Initial pass to complete the logic for the sleuth tool
markmcdowall b6bb5a3
Tidied up some of the code inline with flake8 recommendations
markmcdowall a6704e3
Initial commit of test code for the sleuth tool
markmcdowall 103cda8
Tweaks so that the tool completes using pytest
markmcdowall 1b15c1e
Changed the test set tag so that is is not run in Travis until a smal…
markmcdowall d502418
Updated the code and removed lambda functions
markmcdowall ccca8d7
Updated the scripts to indicate the return of only a single tar.gz fi…
markmcdowall ef9ec48
Tests for generating the required data for the sleuth tests
markmcdowall bdd683a
Changes to how the tar files are generated
markmcdowall fa70c93
Changes to reflect how the results are tarred into a single archive
markmcdowall 8ad010f
Changed the number of cores to use to 1 during testing
markmcdowall f52bae3
Tidying based on Flake8 feedback
markmcdowall 8ec24d7
Small tidying up of the code to removed unused imports
markmcdowall e2910f9
Initial commit of the Sleuth pipeline for differential gene expressio…
markmcdowall 2cdd3ec
Added the Sleuth tool to the tests test_toolchain.py for easy running…
markmcdowall 7daca9f
Added sleuth pipeline and tools into the docs
markmcdowall 311dbed
Installation script for Sleuth
markmcdowall 7f4da1b
Updated the installation instructions to include Sleuth
markmcdowall 4935218
Added the sleuth script to the Travis instal as a test
markmcdowall 971d19b
Removing old documentation
markmcdowall a8cd4d5
Test for the Sleuth pipeline
markmcdowall 1e7de87
Added the sleuth tests to the RNA-seq pipeline
markmcdowall c584367
Updated the FASTQ read selector so that a proportion of the reads can…
markmcdowall e8a1445
Fix for handling the proportion value
markmcdowall 598a93c
Description about the generation of the Sleuth datasets
markmcdowall ffeddde
Added the Sleuth dataset docs to the datasets index
markmcdowall eb8e359
Changes to fix the Sphinx build
markmcdowall aefe68f
Rscripts should only be installed in the code matrix for Travis
markmcdowall 1592701
Changed the FASTQ files so that they are gzipped
markmcdowall 6b86a71
Test datasets for the Sleuth pipeline
markmcdowall cb3d4b3
Fixes to the naming in the tests for the pipelines so that everything…
markmcdowall 5310ae5
Included optparse in the R installation set
markmcdowall d704755
Trigger to ensure that kallisto_bootstrap_param is in the configurati…
markmcdowall 32efe92
Updated the version of BedTools to 2.27.1
markmcdowall 023e841
Modified the script so that it can handle any number of conditions
markmcdowall fc59975
Ability to handle numberous conditions when generating the Sleuth con…
markmcdowall 3576363
Modified the data structure for defining the sleuth configuration
markmcdowall 3029baa
Updates based on pylint about naming
markmcdowall c2b45a9
Fix for the RNA-seq pipeline
markmcdowall e2eca94
Tidied up some legacy code
markmcdowall 299bb1d
Fix to the test scripts for the Sleuth pipeline
markmcdowall bb33e6e
Tidied up the legacy options
markmcdowall adb6420
Changes so that the code runs in COMPSs with the new single file output
markmcdowall 066a37e
Config and input files for the RNA-seq pipeline to generate the requi…
markmcdowall 8433e05
Initial commit -f the sleuth JSON pipeline files
markmcdowall be4f941
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall afe89fc
Extract the file names correctly
markmcdowall f845080
Include the bootstrap param in the tests
markmcdowall 65e62fa
Modified the generation of the sources so that the output JSON includ…
markmcdowall 0fd3671
Added information to the ADR about the proposed integration of visual…
markmcdowall d3c8404
Removal of erroneous characters in line
markmcdowall bdd4756
Initial visualisation script (PCA)
markmcdowall 8074d6c
Generate PCA pngs
markmcdowall 7b76cba
Added user defined parameters for the location of the table for the s…
markmcdowall 49b395d
Added visuals for heatmaps and initial work for executing the scripts
markmcdowall 1ae33ee
Modifications so that the code is mostly able to run as part of pytest
markmcdowall 9d886c1
Still have issues with the R scripts, but the python modules are func…
markmcdowall e37eb6e
Images now getting generated for attribs
markmcdowall a169403
Fix to make sure that the meta data and output files were itemised
markmcdowall dc3d403
Merge branch 'master' into sleuth
markmcdowall 27a92c2
Initial commit of the sleuth Tool JSON
markmcdowall 92cd4a8
Updated the file paths and arguments
markmcdowall 019e44c
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall 77b0a8d
Merge branch 'master' into sleuth
markmcdowall a33c84a
Fixed formatting error from merge procedure
markmcdowall 4cddc5b
Fixed formatting error from merge procedure
markmcdowall 2d8e0cc
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall b20fbce
Fixed formatting error from merge procedure
markmcdowall 0feaa2f
Doc fixes for sphinx
markmcdowall a7fbac0
Correction to the docs describing the Sleuth pipeline
markmcdowall 2576dd1
Fix for toolchain test for sleuth
markmcdowall ee13687
Changed the mg-tool-api branch as the dummy.py is now on master
markmcdowall 8b8603d
Merge branch 'master' into sleuth
markmcdowall 38abbb3
Removed legacy comments
markmcdowall 4b76330
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall c5b9559
Moved the RNA-seq and Sleuth to a spearate travis testset so that the…
markmcdowall d6263eb
Change for permissions and removal or RNA-seq pipeline tests from ori…
markmcdowall 40603ec
Updated travis to include the install_r_code script for installing Ka…
markmcdowall f1709e4
Modified the permissions for the kallisto installation script
markmcdowall 012a345
Added FASTQC to the installation for the R code pipelines
markmcdowall 7a93bff
Removed unused installation procedures
markmcdowall 4b724d3
Merge branch 'master' into sleuth
markmcdowall 5f99edc
Merge branch 'master' into sleuth
markmcdowall 6877b91
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall b60bd4d
Adding the R directory to the list of directories to cache to reduce …
markmcdowall 55668e1
Fixed typo
markmcdowall 46f349d
Fixes for the installation of tools
markmcdowall caa1c37
Merge branch 'master' into sleuth
markmcdowall 344d343
Included changes based on feedback from RF
markmcdowall 910480c
Fixed data type syntax error
markmcdowall feda998
Added newline for consistency
markmcdowall f91b2ef
Added extra description for the file
markmcdowall 05c4b3c
Merge branch 'master' into sleuth
markmcdowall 1a02d16
Merge branch 'master' into sleuth
markmcdowall 6b28f99
Merge branch 'master' into sleuth
markmcdowall 54fcf21
Added all the metadata from the fq to the kallisto tar results file s…
markmcdowall 16c4cb0
Initial changes to handle multiple files handed over from the VRE
markmcdowall 3af8cf9
Fix for the use of kallisto_config passed as an argument
markmcdowall 92dd3d0
Modified the extraction of the condition data if it is stored inside …
markmcdowall 6de22a0
Minor change to the tool def
markmcdowall 56e4832
Merge branch 'master' into sleuth
markmcdowall 858417a
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall 29d2a79
Modified the compression to use the common function
markmcdowall c489837
Updated the R repo
markmcdowall 7a0382a
Better handling of the image archiving
markmcdowall 145d667
Merge branch 'master' into sleuth
markmcdowall
@@ -0,0 +1,112 @@

.. See the NOTICE file distributed with this work for additional information
   regarding copyright ownership.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

Sleuth Test Data
================

Test Data
---------

Dataset
^^^^^^^

+------------+--------------------------------------------------------------+
| Stable IDs | ERR030856, ERR030857, ERR030858, ERR030872, ERR030903        |
+------------+--------------------------------------------------------------+
| Project    | `PRJEB2445 <https://www.ebi.ac.uk/ena/data/view/PRJEB2445>`_ |
+------------+--------------------------------------------------------------+

Genome
^^^^^^

The cDNA was downloaded from `Ensembl 92 <http://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz>`_.

+-------------+--------+
| Assembly    | GRCh38 |
+-------------+--------+
| Transcripts | 1000   |
+-------------+--------+

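Method
------

The full dataset was downloaded from ENA and pseudoaligned to the cDNA with kallisto (``--pseudobam``), producing a SAM file of pseudoalignments for each run. Sleuth was then used to find the transcripts that are significantly differentially expressed between the conditions (q-value of 0.05 or lower), and 1000 of them were chosen at random. These transcripts were used to select the matching reads from the pseudoalignment files, and those reads were then extracted from the original FASTQ files.
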
Contents of ``hiseq_info.txt`` (``\t`` denotes a tab):

.. code-block:: none
   :linenos:

   sample\tcondition
   ERR030856\tcontrol
   ERR030857\tcontrol
   ERR030858\tcontrol
   ERR030872\tthyroid
   ERR030903\tthyroid

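
Before running Sleuth it can help to check the sample sheet. The following is a minimal Python sketch, assuming ``hiseq_info.txt`` is tab-separated and starts with a ``sample``/``condition`` header row (the two columns that the R script below selects with ``dplyr::select``). The helper ``read_sample_sheet`` is hypothetical and not part of this repository.

.. code-block:: python
   :linenos:

   import csv
   import io
   from collections import defaultdict


   def read_sample_sheet(handle):
       """Group sample IDs by condition (hypothetical helper, not repository code).

       Assumes a tab-separated file whose header row names the ``sample`` and
       ``condition`` columns used by the R script.
       """
       reader = csv.DictReader(handle, delimiter="\t")
       missing = {"sample", "condition"} - set(reader.fieldnames or [])
       if missing:
           raise ValueError("sample sheet is missing columns: %s" % sorted(missing))
       groups = defaultdict(list)
       for row in reader:
           groups[row["condition"]].append(row["sample"])
       return dict(groups)


   # Same runs and conditions as hiseq_info.txt, with the assumed header row.
   sheet = io.StringIO(
       "sample\tcondition\n"
       "ERR030856\tcontrol\n"
       "ERR030857\tcontrol\n"
       "ERR030858\tcontrol\n"
       "ERR030872\tthyroid\n"
       "ERR030903\tthyroid\n"
   )
   conditions = read_sample_sheet(sheet)
   print(conditions)

A file without the header row fails the column check. This mirrors the failure the R script would hit, because ``read.table(..., header = TRUE)`` would treat the first sample line as column names and ``dplyr::select`` would then find no ``sample`` or ``condition`` column.
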
.. code-block:: R
   :linenos:

   library("sleuth")
   sample_id <- dir(file.path("data", "results"))
   kal_dirs <- file.path("data", "results", sample_id, "kallisto")

   s2c <- read.table(file.path("data", "hiseq_info.txt"), header = TRUE, stringsAsFactors = FALSE)
   s2c <- dplyr::select(s2c, sample, condition)
   s2c <- dplyr::mutate(s2c, path = kal_dirs)

   so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE, num_cores = 1)
   so <- sleuth_fit(so, ~condition, 'full')
   so <- sleuth_fit(so, ~1, 'reduced')
   so <- sleuth_lrt(so, 'reduced', 'full')

   sleuth_table <- sleuth_results(so, 'reduced:full', 'lrt', show_all = FALSE)
   sleuth_significant <- dplyr::filter(sleuth_table, qval <= 0.05)

   # Generate a set of transcripts to use for code testing and save it
   # for the read-extraction step
   sample_transcripts <- sample(sleuth_significant$target_id, 1000)
   writeLines(sample_transcripts, "sleuth_sample_transcripts.txt")

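
The selection step in the R script above keeps the transcripts with a q-value of 0.05 or lower and then draws 1000 of them at random. A minimal Python sketch of the same logic follows, assuming the Sleuth results have been exported as ``(target_id, qval)`` pairs; both that export and the helper ``sample_significant`` are hypothetical.

.. code-block:: python
   :linenos:

   import random


   def sample_significant(results, n, qval_cutoff=0.05, seed=None):
       """Randomly pick ``n`` IDs with q-value <= ``qval_cutoff`` (hypothetical helper).

       ``results`` is an iterable of (target_id, qval) pairs, e.g. exported from
       the Sleuth results table; missing q-values (None) are skipped.
       """
       significant = [tid for tid, qval in results
                      if qval is not None and qval <= qval_cutoff]
       if n > len(significant):
           raise ValueError("only %d significant transcripts, cannot sample %d"
                            % (len(significant), n))
       rng = random.Random(seed)
       return rng.sample(significant, n)


   # Toy results: q-values 0.00, 0.01, ..., 0.19 plus one missing value.
   results = [("tx%d" % i, i / 100.0) for i in range(20)] + [("tx_na", None)]
   picked = sample_significant(results, 3, seed=42)
   print(picked)

Sampling with a seeded ``random.Random`` instance keeps the subset reproducible. The R script above calls ``sample`` without ``set.seed``, so rerunning it would pick a different set of transcripts.
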
.. code-block:: none
   :linenos:

   # Kallisto Quantification
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030856 --pseudobam --single -l 100 -s 0.01 ERR030856/ERR030856.fastq > ERR030856/ERR030856.sam
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030857 --pseudobam --single -l 100 -s 0.01 ERR030857/ERR030857.fastq > ERR030857/ERR030857.sam
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030858 --pseudobam --single -l 100 -s 0.01 ERR030858/ERR030858.fastq > ERR030858/ERR030858.sam
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030903 --pseudobam --single -l 75 -s 0.0133333 ERR030903/ERR030903.fastq > ERR030903/ERR030903.sam

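
The single-end runs above differ only in the run accession and in the fragment-length mean (``-l``) and standard deviation (``-s``). kallisto needs both values in ``--single`` mode because it cannot estimate the fragment-length distribution from single-end reads. As a sketch, the commands can be generated from one table of per-run parameters (values copied from the commands above; the helper ``kallisto_quant_cmd`` is hypothetical). It only builds and prints the command lines, including the redirect of the ``--pseudobam`` output to a ``.sam`` file, and does not run kallisto.

.. code-block:: python
   :linenos:

   import shlex

   # Fragment-length mean and standard deviation for each single-end run,
   # copied from the kallisto commands above.
   SINGLE_END_RUNS = {
       "ERR030856": (100, 0.01),
       "ERR030857": (100, 0.01),
       "ERR030858": (100, 0.01),
       "ERR030903": (75, 0.0133333),
   }


   def kallisto_quant_cmd(run, frag_len, frag_sd, index="GRCh38.cdna.fasta.idx"):
       """Argument list for a single-end ``kallisto quant --pseudobam`` run (hypothetical helper)."""
       return [
           "kallisto", "quant",
           "-i", index,
           "-o", run,
           "--pseudobam",
           "--single",
           "-l", str(frag_len),
           "-s", str(frag_sd),
           "%s/%s.fastq" % (run, run),
       ]


   for run, (frag_len, frag_sd) in sorted(SINGLE_END_RUNS.items()):
       cmd = kallisto_quant_cmd(run, frag_len, frag_sd)
       # The pseudoalignments go to stdout, so redirect them to a SAM file.
       print(" ".join(shlex.quote(arg) for arg in cmd) + " > %s/%s.sam" % (run, run))

Keeping the parameters in one table makes it harder for individual runs to drift apart when the index or output layout changes.
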
.. code-block:: none
   :linenos:

   # Extract the FASTQ read IDs for the selected transcripts
   # (-F: match IDs as fixed strings, -w: whole words only; cut -f1 keeps the read name)
   grep -F -w -f sleuth_sample_transcripts.txt ERR030872/ERR030872.sam | cut -f1 | grep -v '^@' > ERR030872/ERR030872.reads
   grep -F -w -f sleuth_sample_transcripts.txt ERR030903/ERR030903.sam | cut -f1 | grep -v '^@' > ERR030903/ERR030903.reads
   grep -F -w -f sleuth_sample_transcripts.txt ERR030856/ERR030856.sam | cut -f1 | grep -v '^@' > ERR030856/ERR030856.reads
   grep -F -w -f sleuth_sample_transcripts.txt ERR030857/ERR030857.sam | cut -f1 | grep -v '^@' > ERR030857/ERR030857.reads
   grep -F -w -f sleuth_sample_transcripts.txt ERR030858/ERR030858.sam | cut -f1 | grep -v '^@' > ERR030858/ERR030858.reads

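
The ``grep`` pipeline above matches a transcript ID anywhere on a SAM line. A stricter alternative, sketched below under the assumption that the pseudoalignment files are plain-text SAM, compares the IDs against the reference-name column (RNAME, the third field) only and skips header lines starting with ``@``. The helper ``read_ids_for_transcripts`` is hypothetical; it also de-duplicates read names, since a read that pseudoaligns to several transcripts can appear on several lines.

.. code-block:: python
   :linenos:

   import io


   def read_ids_for_transcripts(sam_lines, transcript_ids):
       """Names of reads whose RNAME is in ``transcript_ids`` (hypothetical helper).

       Header lines starting with '@' are skipped, and each read name is returned
       once, in order of first appearance.
       """
       wanted = set(transcript_ids)
       seen = set()
       read_ids = []
       for line in sam_lines:
           if not line.strip() or line.startswith("@"):
               continue
           fields = line.rstrip("\n").split("\t")
           qname, rname = fields[0], fields[2]  # SAM columns 1 (QNAME) and 3 (RNAME)
           if rname in wanted and qname not in seen:
               seen.add(qname)
               read_ids.append(qname)
       return read_ids


   # Toy SAM with made-up transcript IDs; read1 is reported against two transcripts.
   sam_text = (
       "@SQ\tSN:ENST0001.1\tLN:1000\n"
       "@SQ\tSN:ENST0002.1\tLN:800\n"
       "read1\t0\tENST0001.1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
       "read1\t256\tENST0002.1\t40\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
       "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
       "read3\t0\tENST0002.1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
   )
   print(read_ids_for_transcripts(io.StringIO(sam_text), ["ENST0001.1", "ENST0002.1"]))

Matching on the RNAME column avoids accidental hits where one transcript ID is a prefix of another or appears in an optional tag.
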
.. code-block:: none
   :linenos:

   # Extract the original reads from the FASTQ files
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030856/ERR030856.fastq --rows ERR030856/ERR030856.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030857/ERR030857.fastq --rows ERR030857/ERR030857.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030858/ERR030858.fastq --rows ERR030858/ERR030858.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030872/ERR030872_1.fastq --input_2 ERR030872/ERR030872_2.fastq --rows ERR030872/ERR030872.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030903/ERR030903.fastq --rows ERR030903/ERR030903.reads --prop 0.1 --output_tag subset

Because a large number of reads match the selected transcripts, only a proportion of them (``--prop 0.1``) has been kept for code testing.
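
``scripts/ExtractRowsFromFASTQs.py`` is not shown in this file, so the following is only a sketch of the kind of filtering it is assumed to perform, not its implementation: FASTQ records are read four lines at a time, records whose read ID is in the selected set are kept, and each matching record is retained with probability ``prop``. The helpers ``iter_fastq`` and ``subset_fastq`` are hypothetical. Keeping the mates of a paired-end run such as ERR030872 in sync would require the same keep/drop decision for both files, which this single-file sketch does not attempt.

.. code-block:: python
   :linenos:

   import io
   import random


   def iter_fastq(handle):
       """Yield (header, sequence, plus, quality) line tuples from a FASTQ stream."""
       while True:
           header = handle.readline()
           if not header:
               return
           seq = handle.readline()
           plus = handle.readline()
           qual = handle.readline()
           if not qual:
               raise ValueError("truncated FASTQ record: %r" % header)
           yield header, seq, plus, qual


   def subset_fastq(in_handle, out_handle, read_ids, prop, seed=None):
       """Write records whose ID is in ``read_ids``, keeping each with probability ``prop``.

       Hypothetical helper; returns the number of records written.
       """
       wanted = set(read_ids)
       rng = random.Random(seed)
       kept = 0
       for record in iter_fastq(in_handle):
           # The read ID is the first whitespace-separated token after '@'.
           read_id = record[0][1:].split()[0]
           if read_id in wanted and rng.random() < prop:
               out_handle.write("".join(record))
               kept += 1
       return kept


   # Toy FASTQ with made-up read names.
   fastq_text = (
       "@r1 extra\nACGT\n+\nIIII\n"
       "@r2\nTTTT\n+\nIIII\n"
       "@r3\nGGGG\n+\nIIII\n"
   )
   out = io.StringIO()
   n = subset_fastq(io.StringIO(fastq_text), out, ["r1", "r3"], prop=1.0)
   print(n, out.getvalue())

A seeded generator makes the subset reproducible between runs.
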
the* FASTQ