Changes from 94 commits
50491ae
Minor syntax fix
markmcdowall May 11, 2018
26ccd3d
Added in the requirement for the boostrap sampling needed for the Sle…
markmcdowall May 11, 2018
7542808
Added in the requirement for the boostrap sampling needed for the Sle…
markmcdowall May 11, 2018
88dc548
Initial sleuth tool, structure based on the idear tool as both requir…
markmcdowall May 11, 2018
b31cb40
Added an ADR description about the branch
markmcdowall May 11, 2018
7205ef6
Initial R script to run sleuth
markmcdowall May 14, 2018
702a624
Initial pass to complete the logic for the sleuth tool
markmcdowall May 14, 2018
b6bb5a3
Tidied up some of the code inline with flake8 recommendations
markmcdowall May 14, 2018
a6704e3
Initial commit of test code for the sleuth tool
markmcdowall May 14, 2018
103cda8
Tweaks so that the tool completes using pytest
markmcdowall May 14, 2018
1b15c1e
Changed the test set tag so that is is not run in Travis until a smal…
markmcdowall May 14, 2018
d502418
Updated the code and removed lambda functions
markmcdowall May 14, 2018
ccca8d7
Updated the scripts to indicate the return of only a single tar.gz fi…
markmcdowall May 14, 2018
ef9ec48
Tests for generating the required data for the sleuth tests
markmcdowall May 15, 2018
bdd683a
Changes to how the tar files are generated
markmcdowall May 15, 2018
fa70c93
Changes to reflect how the results are tarred into a single archive
markmcdowall May 15, 2018
8ad010f
Changed the number of cores to use to 1 during testing
markmcdowall May 15, 2018
f52bae3
Tidying based on Flake8 feedback
markmcdowall May 15, 2018
8ec24d7
Small tidying up of the code to removed unused imports
markmcdowall May 15, 2018
e2910f9
Initial commit of the Sleuth pipeline for differential gene expressio…
markmcdowall May 15, 2018
2cdd3ec
Added the Sleuth tool to the tests test_toolchain.py for easy running…
markmcdowall May 15, 2018
7daca9f
Added sleuth pipeline and tools into the docs
markmcdowall May 15, 2018
311dbed
Installation script for Sleuth
markmcdowall May 15, 2018
7f4da1b
Updated the installation instructions to include Sleuth
markmcdowall May 15, 2018
4935218
Added the sleuth script to the Travis instal as a test
markmcdowall May 15, 2018
971d19b
Removing old documentation
markmcdowall May 15, 2018
a8cd4d5
Test for the Sleuth pipeline
markmcdowall May 15, 2018
1e7de87
Added the sleuth tests to the RNA-seq pipeline
markmcdowall May 15, 2018
c584367
Updated the FASTQ read selector so that a proportion of the reads can…
markmcdowall May 15, 2018
e8a1445
Fix for handling the proportion value
markmcdowall May 15, 2018
598a93c
Description about the generation of the Sleuth datasets
markmcdowall May 15, 2018
ffeddde
Added the Sleuth dataset docs to the datasets index
markmcdowall May 15, 2018
eb8e359
Changes to fix the Sphinx build
markmcdowall May 15, 2018
aefe68f
Rscripts should only be installed in the code matrix for Travis
markmcdowall May 15, 2018
1592701
Changed the FASTQ files so that they are gzipped
markmcdowall May 15, 2018
6b86a71
Test datasets for the Sleuth pipeline
markmcdowall May 16, 2018
cb3d4b3
Fixes to the naming in the tests for the pipelines so that everything…
markmcdowall May 16, 2018
5310ae5
Included optparse in the R installation set
markmcdowall May 16, 2018
d704755
Trigger to ensure that kallisto_bootstrap_param is in the configurati…
markmcdowall May 16, 2018
32efe92
Updated the version of BedTools to 2.27.1
markmcdowall May 17, 2018
023e841
Modified the script so that it can handle any number of conditions
markmcdowall May 17, 2018
fc59975
Ability to handle numberous conditions when generating the Sleuth con…
markmcdowall May 17, 2018
3576363
Modified the data structure for defining the sleuth configuration
markmcdowall May 17, 2018
3029baa
Updates based on pylint about naming
markmcdowall May 17, 2018
c2b45a9
Fix for the RNA-seq pipeline
markmcdowall May 17, 2018
e2eca94
Tidied up some legacy code
markmcdowall May 17, 2018
299bb1d
Fix to the test scripts for the Sleuth pipeline
markmcdowall May 17, 2018
bb33e6e
Tidied up the legacy options
markmcdowall May 17, 2018
adb6420
Changes so that the code runs in COMPSs with the new single file output
markmcdowall May 17, 2018
066a37e
Config and input files for the RNA-seq pipeline to generate the requi…
markmcdowall May 17, 2018
8433e05
Initial commit -f the sleuth JSON pipeline files
markmcdowall May 17, 2018
be4f941
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall May 17, 2018
afe89fc
Extract the file names correctly
markmcdowall May 21, 2018
f845080
Include the bootstrap param in the tests
markmcdowall May 21, 2018
65e62fa
Modified the generation of the sources so that the output JSON includ…
markmcdowall May 21, 2018
0fd3671
Added information to the ADR about the proposed integration of visual…
markmcdowall May 23, 2018
d3c8404
Removal of erroneous characters in line
markmcdowall May 23, 2018
bdd4756
Initial visualisation script (PCA)
markmcdowall May 23, 2018
8074d6c
Generate PCA pngs
markmcdowall May 23, 2018
7b76cba
Added user defined parameters for the location of the table for the s…
markmcdowall May 23, 2018
49b395d
Added visuals for heatmaps and initial work for executing the scripts
markmcdowall May 24, 2018
1ae33ee
Modifications so that the code is mostly able to run as part of pytest
markmcdowall May 24, 2018
9d886c1
Still have issues with the R scripts, but the python modules are func…
markmcdowall May 24, 2018
e37eb6e
Images now getting generated for attribs
markmcdowall May 25, 2018
a169403
Fix to make sure that the meta data and output files were itemised
markmcdowall May 25, 2018
dc3d403
Merge branch 'master' into sleuth
markmcdowall May 25, 2018
27a92c2
Initial commit of the sleuth Tool JSON
markmcdowall May 25, 2018
92cd4a8
Updated the file paths and arguments
markmcdowall May 25, 2018
019e44c
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall May 25, 2018
77b0a8d
Merge branch 'master' into sleuth
markmcdowall May 29, 2018
a33c84a
Fixed formatting error from merge procedure
markmcdowall May 29, 2018
4cddc5b
Fixed formatting error from merge procedure
markmcdowall May 29, 2018
2d8e0cc
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall May 29, 2018
b20fbce
Fixed formatting error from merge procedure
markmcdowall May 29, 2018
0feaa2f
Doc fixes for sphinx
markmcdowall May 29, 2018
a7fbac0
Correction to the docs describing the Sleuth pipeline
markmcdowall May 29, 2018
2576dd1
Fix for toolchain test for sleuth
markmcdowall May 29, 2018
ee13687
Changed the mg-tool-api branch as the dummy.py is now on master
markmcdowall May 30, 2018
8b8603d
Merge branch 'master' into sleuth
markmcdowall May 31, 2018
38abbb3
Removed legacy comments
markmcdowall May 31, 2018
4b76330
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall May 31, 2018
c5b9559
Moved the RNA-seq and Sleuth to a spearate travis testset so that the…
markmcdowall May 31, 2018
d6263eb
Change for permissions and removal or RNA-seq pipeline tests from ori…
markmcdowall May 31, 2018
40603ec
Updated travis to include the install_r_code script for installing Ka…
markmcdowall May 31, 2018
f1709e4
Modified the permissions for the kallisto installation script
markmcdowall May 31, 2018
012a345
Added FASTQC to the installation for the R code pipelines
markmcdowall May 31, 2018
7a93bff
Removed unused installation procedures
markmcdowall May 31, 2018
4b724d3
Merge branch 'master' into sleuth
markmcdowall Jun 1, 2018
5f99edc
Merge branch 'master' into sleuth
markmcdowall Jun 4, 2018
6877b91
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall Jun 4, 2018
b60bd4d
Adding the R directory to the list of directories to cache to reduce …
markmcdowall Jun 4, 2018
55668e1
Fixed typo
markmcdowall Jun 4, 2018
46f349d
Fixes for the installation of tools
markmcdowall Jun 4, 2018
caa1c37
Merge branch 'master' into sleuth
markmcdowall Jun 4, 2018
344d343
Included changes based on feedback from RF
markmcdowall Jun 6, 2018
910480c
Fixed data type syntax error
markmcdowall Jun 7, 2018
feda998
Added newline for consistency
markmcdowall Jun 7, 2018
f91b2ef
Added extra description for the file
markmcdowall Jun 7, 2018
05c4b3c
Merge branch 'master' into sleuth
markmcdowall Jun 22, 2018
1a02d16
Merge branch 'master' into sleuth
markmcdowall Jul 5, 2018
6b28f99
Merge branch 'master' into sleuth
markmcdowall Jul 5, 2018
54fcf21
Added all the metadata from the fq to the kallisto tar results file s…
markmcdowall Jul 5, 2018
16c4cb0
Initial changes to handle multiple files handed over from the VRE
markmcdowall Jul 5, 2018
3af8cf9
Fix for the use of kallisto_config passed as an argument
markmcdowall Jul 6, 2018
92dd3d0
Modified the extraction of the condition data if it is stored inside …
markmcdowall Jul 6, 2018
6de22a0
Minor change to the tool def
markmcdowall Jul 17, 2018
56e4832
Merge branch 'master' into sleuth
markmcdowall Aug 23, 2018
858417a
Merge branch 'sleuth' of https://github.com/Multiscale-Genomics/mg-pr…
markmcdowall Aug 23, 2018
29d2a79
Modified the compression to use the common function
markmcdowall Aug 23, 2018
c489837
Updated the R repo
markmcdowall Sep 7, 2018
7a0382a
Better handling of the image archiving
markmcdowall Sep 7, 2018
145d667
Merge branch 'master' into sleuth
markmcdowall Sep 10, 2018
18 changes: 15 additions & 3 deletions .travis.yml
@@ -25,6 +25,7 @@ cache:
directories:
- $HOME/.cache/pip
- ${HOME}/lib
- ${HOME}/R

before_cache:
- rm -f $HOME/.cache/pip/log/debug.log
@@ -33,6 +34,7 @@ env:
matrix:
- TESTENV=docs
- TESTENV=code
- TESTENV=rcode
- TESTENV=wgbs_code_1
- TESTENV=wgbs_code_2
- TESTENV=pylint
@@ -93,10 +95,12 @@ before_install:
- cd ${HOME}/build/Multiscale-Genomics/mg-process-fastq
- sudo chmod +x scripts/travis/includeMAC2.sh
- sudo chmod +x scripts/travis/includeTADbit.sh
- sudo chmod +x scripts/travis/install_r_code.sh
- sudo chmod +x scripts/travis/install_wgbs_code.sh

-  - if [[ "$TESTENV" == "code" ]]; then sudo apt-get install r-base-core; fi
-  - if [[ "$TESTENV" == "code" ]]; then sudo apt-get install python-rpy2; fi
+  - if [[ "$TESTENV" == "rcode" ]]; then sudo apt-get install r-base-core; fi
+  - if [[ "$TESTENV" == "rcode" ]]; then sudo apt-get install python-rpy2; fi
+  - if [[ "$TESTENV" == "rcode" ]]; then sudo ./scripts/travis/install_r_code.sh; fi
- if [[ "$TESTENV" == "code" ]]; then sudo ./scripts/travis/install_code_test_dependencies.sh; fi
- if [[ "$TESTENV" == "code" ]]; then ./scripts/travis/includeMAC2.sh; fi
- if [[ "$TESTENV" == "code" ]]; then ./scripts/travis/includeTADbit.sh; fi
@@ -204,7 +208,13 @@ before_script:
# - echo "options(repos = c(CRAN = 'http://mirrors.ebi.ac.uk/CRAN/'))" > ${HOME}/.Rprofile
# - echo ".libPaths('~/R')" >> ${HOME}/.Rprofile
# - echo 'message("Using library:", .libPaths()[1])' >> ${HOME}/.Rprofile
-  # - Rscript scripts/install_packages.R
+  # - if [[ "$TESTENV" == "code" ]]; then Rscript scripts/install_packages.R; fi

- echo "R_LIB=${HOME}/R" > ${HOME}/.Renviron
- echo "options(repos = c(CRAN = 'http://mirrors.ebi.ac.uk/CRAN/'))" > ${HOME}/.Rprofile
- echo ".libPaths('~/R')" >> ${HOME}/.Rprofile
- echo 'message("Using library:", .libPaths()[1])' >> ${HOME}/.Rprofile
- if [[ "$TESTENV" == "rcode" ]]; then Rscript scripts/install_sleuth.R; fi


- cd ${HOME}/build/Multiscale-Genomics/mg-process-fastq
@@ -218,6 +228,7 @@ before_script:

- cd ${HOME}/build/Multiscale-Genomics/mg-process-fastq
- chmod +x scripts/travis/harness.sh
- chmod +x scripts/travis/r_harness.sh
- chmod +x scripts/travis/wgbs_harness.sh
- chmod +x scripts/travis/docs_harness.sh
- chmod +x scripts/travis/pylint_harness.sh
@@ -227,6 +238,7 @@ before_script:
- export PATH="${HOME}/bin:$PATH"

script:
- if [[ "$TESTENV" == "rcode" ]]; then ./scripts/travis/r_harness.sh; fi
- if [[ "$TESTENV" == "code" ]]; then ./scripts/travis/harness.sh; fi
- if [[ "$TESTENV" == "wgbs_code_1" ]]; then ./scripts/travis/wgbs_harness.sh; fi
- if [[ "$TESTENV" == "wgbs_code_2" ]]; then ./scripts/travis/wgbs_harness.sh; fi
10 changes: 9 additions & 1 deletion docs/adr.rst
@@ -58,6 +58,14 @@ Added compression of the split FASTQ files to reduce the amount of space require
The code has been modified so that there is a single decompression of the BWA and Bowtie2 common indexes. The index files are then explicitly handed to the alignment task rather than handing over the compressed index. The decompression is performed as a @task so that the index files are already in the COMPSs system. This means that handing the index files to the alignment tasks creates a single symlink in the sandbox temporary file directory rather than duplicating the whole of the index structure for each job.


2018-05-11 - Sleuth gene differential analysis pipeline
-------------------------------------------------------

This allows for the comparison of multiple RNA-seq experiments to determine if there are any genes that are differentially expressed. This has required changes to the output of the kallisto_quant tool so that it generates only a single tar file containing the abundance and run_info files. There is also the introduction of the bootstrap-sample parameter as part of the quantification to determine the accuracy of the counts.

The first tool uses Sleuth to generate an R object of all the processed tracks. Separate tools are written for each visualisation to allow for a certain amount of parallelisation with the results being saved to an archive file.
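
The single-archive output described in this entry can be sketched as follows. This is a minimal illustration using Python's standard ``tarfile`` module, not the code used by the kallisto_quant tool. The helper name ``archive_kallisto_results`` and the default file names are assumptions made for the example.

```python
import os
import tarfile


def archive_kallisto_results(result_dir, archive_path,
                             file_names=("abundance.h5", "abundance.tsv", "run_info.json")):
    """Bundle the named kallisto output files into one gzipped tar archive.

    Hypothetical helper: the default file names are assumed for illustration
    and are not taken from the kallisto_quant tool.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in file_names:
            file_path = os.path.join(result_dir, name)
            if os.path.isfile(file_path):
                # Store each file under its bare name so the archive has a
                # flat layout regardless of where result_dir lives.
                tar.add(file_path, arcname=name)
                added.append(name)
    return added
```

Returning the list of archived names lets a caller check that the abundance and run_info files were both present before handing the archive on.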


2018-05-22 - GEM Naming
-----------------------

@@ -69,7 +77,7 @@ Update so that the gem files are name <genome-file>.gem.gz inline with requests

To try and improve the quality of the reads that are used for numerous pipelines, TrimGalore has been included as a pipeline to aid in the clipping and removal of low quality regions of reads. The pipeline can be run on single or paired end FASTQ files. A report of the trimmed data is also returned for the user to identify what changes were made.

-2018-06-01 - Separated WGBS Vode Testing
+2018-06-01 - Separated WGBS Code Testing
----------------------------------------

To bring down the run time for the TravisCI, the WGBS has been moved to a separate track. This has the benefit of getting the testing started earlier and allowing the other tests to finish sooner.
18 changes: 12 additions & 6 deletions docs/full_installation.rst
@@ -409,12 +409,18 @@ Install iDEAR
.. code-block:: none
:linenos:

-cd ${HOME}/lib
-source("https://bioconductor.org/biocLite.R")
-biocLite("BSgenome")
-biocLite("DESeq2")
-if(!require("devtools")) install.packages("devtools")
-devtools::install_bitbucket("juanlmateo/idear")
+cd ${HOME}/code/mg-process-fastq
+Rscript scripts/install_packages.R


Install Sleuth
^^^^^^^^^^^^^^

.. code-block:: none
:linenos:

cd ${HOME}/code/mg-process-fastq
Rscript scripts/install_sleuth.R

Install TADbit
^^^^^^^^^^^^^^
3 changes: 3 additions & 0 deletions docs/install.rst
@@ -35,12 +35,15 @@ Software
- HDF5
- iNPS
- Kallisto
- Sleuth
- libmaus2
- pyenv
- R 2.9.1+
- SAMtools
- MCL
- pigz
- iDEAR


Python Modules
^^^^^^^^^^^^^^
59 changes: 58 additions & 1 deletion docs/pipelines.rst
@@ -1012,7 +1012,7 @@ RNA-Seq Analysis

Example
-------
-When running the pipeline on a local machinewithout COMPSs:
+When running the pipeline on a local machine without COMPSs:

.. code-block:: none
:linenos:
@@ -1045,12 +1045,69 @@ RNA-Seq Analysis
:members:


.. automodule:: process_sleuth

This pipeline can process multiple outputs from the process_rnaseq Kallisto
pipeline to identify genes that are differentially expressed between datasets.

Running from the command line
=============================

Parameters
----------
config : str
Configuration JSON file
in_metadata : str
Location of input JSON metadata for files
out_metadata : str
Location of output JSON metadata for files

Returns
-------
R data object : file
Sleuth R object

Example
-------
When running the pipeline on a local machine without COMPSs:

.. code-block:: none
:linenos:

python process_sleuth.py \
--config tests/json/config_sleuth.json \
--in_metadata tests/json/input_sleuth.json \
--out_metadata tests/json/output_sleuth.json \
--local

When using a local version of the `COMPSs virtual machine <https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/>`_:

.. code-block:: none
:linenos:

runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_sleuth.py \
--config tests/json/config_sleuth.json \
--in_metadata tests/json/input_sleuth.json \
--out_metadata tests/json/output_sleuth.json

Methods
=======
.. autoclass:: process_sleuth.process_sleuth
:members:
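
The command line options documented for this pipeline can be mirrored with ``argparse``. The sketch below is illustrative only and is not the parser in ``process_sleuth.py``: the option names follow the examples, while the function name ``build_parser``, the description and the ``--local`` default are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the options shown in the command line examples.

    Illustrative only: option names follow the documented examples, but the
    description and defaults are assumptions.
    """
    parser = argparse.ArgumentParser(
        description="Sleuth differential expression pipeline")
    parser.add_argument(
        "--config", required=True, help="Configuration JSON file")
    parser.add_argument(
        "--in_metadata", required=True,
        help="Location of input JSON metadata for files")
    parser.add_argument(
        "--out_metadata", required=True,
        help="Location of output JSON metadata for files")
    # --local is a flag: present when running without COMPSs, absent otherwise.
    parser.add_argument(
        "--local", action="store_true", default=False,
        help="Run locally without COMPSs")
    return parser


# Parse the same arguments as the local example, passed as an explicit list.
args = build_parser().parse_args([
    "--config", "tests/json/config_sleuth.json",
    "--in_metadata", "tests/json/input_sleuth.json",
    "--out_metadata", "tests/json/output_sleuth.json",
    "--local",
])
print(args.config, args.local)
```

Passing the argument list explicitly keeps the sketch runnable outside a shell. A real entry point would call ``parse_args()`` with no arguments so that it reads ``sys.argv``.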


TrimGalore
----------
.. automodule:: process_trim_galore

This pipeline can process FASTQ to trim poor base quality or adapter contamination.


Running from the command line
=============================

19 changes: 19 additions & 0 deletions docs/test_data/index.rst
@@ -34,6 +34,7 @@ Sample Data
testData_iDamIDSeq
testData_MNaseSeq
testData_RNASeq
testData_Sleuth
testData_WGBS
tests_hic

@@ -155,6 +156,24 @@ There is a test for each of the tools. This uses the "process" scripts to run ea
-----------
:doc:`testData_RNASeq`

Sleuth
=======
To run the pipeline test:

.. code-block:: none

pytest tests/test_pipeline_sleuth.py


Methods
-------
.. automodule:: tests.test_pipeline_sleuth
:members:

Sample Data
-----------
:doc:`testData_Sleuth`

Whole Genome Bisulfite Sequencing (WGBS)
========================================
To run the pipeline test:
112 changes: 112 additions & 0 deletions docs/test_data/testData_Sleuth.rst
@@ -0,0 +1,112 @@
.. See the NOTICE file distributed with this work for additional information
   regarding copyright ownership.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

Sleuth Test Data
================

Test Data
---------

Dataset
^^^^^^^

+------------+--------------------------------------------------------------+
| Stable IDs | ERR030856, ERR030857, ERR030858, ERR030872, ERR030903 |
+------------+--------------------------------------------------------------+
| Project | `PRJEB2445 <https://www.ebi.ac.uk/ena/data/view/PRJEB2445>`_ |
+------------+--------------------------------------------------------------+

Genome
^^^^^^

cDNA was downloaded from `Ensembl 92 <http://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz>`_.

+-------------+--------+
| Assembly | GRCh38 |
+-------------+--------+
| Transcripts | 1000 |
+-------------+--------+

Method
------
The full dataset was downloaded from ENA and aligned to the cDNA using kallisto, producing a pseudoalignment file for each sample. Sleuth was then used to identify the significant hits, from which 1000 transcripts were selected. These transcripts were used to select the matching FASTQ reads from the pseudoalignment files.


hiseq_info.txt file:

.. code-block:: none
   :linenos:

   sample\tcondition
   ERR030856\tcontrol
   ERR030857\tcontrol
   ERR030858\tcontrol
   ERR030872\tthyroid
   ERR030903\tthyroid

.. code-block:: R
   :linenos:

   library("sleuth")

   # One kallisto results directory per sample
   sample_id <- dir(file.path("data", "results"))
   kal_dirs <- file.path("data", "results", sample_id, "kallisto")

   # Sample-to-condition table, with the kallisto path added for each sample
   s2c <- read.table(file.path("data", "hiseq_info.txt"), header = TRUE, stringsAsFactors=FALSE)
   s2c <- dplyr::select(s2c, sample, condition)
   s2c <- dplyr::mutate(s2c, path = kal_dirs)

   # Fit the full and reduced models, then compare them with a likelihood ratio test
   so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE, num_cores = 1)
   so <- sleuth_fit(so, ~condition, 'full')
   so <- sleuth_fit(so, ~1, 'reduced')
   so <- sleuth_lrt(so, 'reduced', 'full')

   # Keep the transcripts that are significant at a q-value of 0.05
   sleuth_table <- sleuth_results(so, 'reduced:full', 'lrt', show_all = FALSE)
   sleuth_significant <- dplyr::filter(sleuth_table, qval <= 0.05)

   # Generate a set of transcripts to use for code testing
   sample(sleuth_significant$target_id, 1000)
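
The final two steps of the R script (filter on q-value, then draw a random set of transcript IDs) can be sketched in Python as below. This is not part of the pipeline. The function name ``pick_test_transcripts`` and the list-of-dicts input are hypothetical stand-ins for the Sleuth results table.

```python
import random


def pick_test_transcripts(results, qval_cutoff=0.05, n=1000, seed=None):
    """Return up to n target IDs whose q-value passes the cutoff.

    `results` is an iterable of dicts with "target_id" and "qval" keys, a
    hypothetical stand-in for the rows of the Sleuth results table.
    """
    significant = [
        row["target_id"] for row in results
        if row["qval"] is not None and row["qval"] <= qval_cutoff
    ]
    rng = random.Random(seed)
    # random.sample raises ValueError when asked for more items than exist,
    # so cap the sample size at the number of significant transcripts.
    return rng.sample(significant, min(n, len(significant)))
```

Unlike R's ``sample()``, which fails when fewer than 1000 significant transcripts are available, this sketch returns whatever is available. Passing a ``seed`` makes the selection reproducible.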


.. code-block:: none
   :linenos:

   # Kallisto Quantification
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030856 --pseudobam --single -l 100 -s 0.01 ERR030856/ERR030856.fastq > ERR030856/ERR030856.sam
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030857 --pseudobam --single -l 100 -s 0.01 ERR030857/ERR030857.fastq > ERR030857/ERR030857.sam
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030858 --pseudobam --single -l 100 -s 0.01 ERR030858/ERR030858.fastq > ERR030858/ERR030858.sam
   kallisto quant -i GRCh38.cdna.fasta.idx -o ERR030903 --pseudobam --single -l 75 -s 0.0133333 ERR030903/ERR030903.fastq > ERR030903/ERR030903.sam


.. code-block:: none
   :linenos:

   # Extract the FASTQ read IDs for the selected transcripts
   grep -f sleuth_sample_transcripts.txt ERR030872/ERR030872.sam | tr "\t" "~" | cut -d"~" -f1 | grep -v @ > ERR030872/ERR030872.reads
   grep -f sleuth_sample_transcripts.txt ERR030903/ERR030903.sam | tr "\t" "~" | cut -d"~" -f1 | grep -v @ > ERR030903/ERR030903.reads
   grep -f sleuth_sample_transcripts.txt ERR030856/ERR030856.sam | tr "\t" "~" | cut -d"~" -f1 | grep -v @ > ERR030856/ERR030856.reads
   grep -f sleuth_sample_transcripts.txt ERR030857/ERR030857.sam | tr "\t" "~" | cut -d"~" -f1 | grep -v @ > ERR030857/ERR030857.reads
   grep -f sleuth_sample_transcripts.txt ERR030858/ERR030858.sam | tr "\t" "~" | cut -d"~" -f1 | grep -v @ > ERR030858/ERR030858.reads
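
A Python sketch of the same read ID extraction is given below for readers who prefer not to rely on the ``grep`` pipeline. It is an approximation, not a replacement used by the project: it assumes tab-separated SAM records whose third column (RNAME) holds the transcript ID exactly as it appears in the transcript list, and the function name ``read_ids_for_transcripts`` is hypothetical.

```python
def read_ids_for_transcripts(sam_lines, transcript_ids):
    """Collect the names of reads aligned to any of the given transcripts.

    Assumes tab-separated SAM records: column 1 is the read name (QNAME) and
    column 3 is the reference name (RNAME). Header lines start with "@".
    """
    wanted = set(transcript_ids)
    read_ids = []
    seen = set()
    for line in sam_lines:
        if line.startswith("@"):
            # Skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Skip malformed records
            continue
        if fields[2] in wanted and fields[0] not in seen:
            seen.add(fields[0])
            read_ids.append(fields[0])
    return read_ids
```

Two differences from the shell version are deliberate. ``grep -f`` matches a transcript ID anywhere on the line, whereas this sketch compares only the RNAME column. The sketch also reports each read name once, while the shell pipeline can emit duplicates.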


.. code-block:: none
   :linenos:

   # Extract the original reads from the FASTQ files
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030856/ERR030856.fastq --rows ERR030856/ERR030856.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030857/ERR030857.fastq --rows ERR030857/ERR030857.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030858/ERR030858.fastq --rows ERR030858/ERR030858.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030872/ERR030872_1.fastq --input_2 ERR030872/ERR030872_2.fastq --rows ERR030872/ERR030872.reads --prop 0.1 --output_tag subset
   python scripts/ExtractRowsFromFASTQs.py --input_1 ERR030903/ERR030903.fastq --rows ERR030903/ERR030903.reads --prop 0.1 --output_tag subset

Due to the number of reads that match to the transcripts, only 1% have been kept for code testing
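
The idea of keeping only a proportion of the matching reads can be sketched as follows. This is not the implementation of ``scripts/ExtractRowsFromFASTQs.py``. The function name ``subset_fastq`` is hypothetical, and the sketch assumes that the proportion is applied per matching read and that every FASTQ record spans exactly four lines.

```python
import random


def subset_fastq(fastq_lines, read_ids, proportion=1.0, seed=None):
    """Yield 4-line FASTQ records whose read ID is in read_ids.

    Each matching record is kept with probability `proportion`. Assumes
    well-formed, unwrapped FASTQ where every record spans exactly four lines
    and the ID is the first whitespace-delimited token after "@".
    """
    wanted = set(read_ids)
    rng = random.Random(seed)
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Drop the leading "@" and any description after the ID
            read_id = record[0][1:].split()[0]
            if read_id in wanted and rng.random() < proportion:
                yield record
            record = []
```

Because the function is a generator, it can stream a large FASTQ file without loading it into memory. Paired-end files would need the same decision applied to both mates, which this sketch does not handle.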
9 changes: 9 additions & 0 deletions docs/tools.rst
@@ -139,6 +139,15 @@ Tools for processing FastQ files
:members:


Analysis
========

Sleuth
------
.. autoclass:: tool.sleuth.sleuthTool
:members:


Hi-C Parsing
============

3 changes: 0 additions & 3 deletions process_idear.py
@@ -19,9 +19,6 @@

from __future__ import print_function

# Required for ReadTheDocs
from functools import wraps # pylint: disable=unused-import

import argparse

from basic_modules.workflow import Workflow