@@ -25,30 +25,31 @@ This is the **variant_calling** pipeline from the `Sequana <https://sequana.read
2525Installation
2626~~~~~~~~~~~~
2727
28- If you already have all requirements, you can install the packages using pip ::
28+ You can install sequana_variant_calling pipeline using::
2929
3030 pip install sequana_variant_calling --upgrade
3131
32- Otherwise, you can create a *sequana_variant_calling * conda environment executing::
32+ I would recommend to setup a *sequana_variant_calling * conda environment executing::
3333
3434 conda env create -f environment.yml
3535
36- and later activate the environment::
36+ where the environment.yml can be found in the https://github.com/sequana/variant_calling repository.
37+
38+ Later, you can activate the environment as follows::
3739
3840 conda activate sequana_variant_calling
3941
40- A third option is to install the pipeline with pip method (see above) and use singularity as explained afterwards .
42+ Note, however, that the recommended method is to use singularity/apptainer as explained here below .
4143
4244
4345Usage
4446~~~~~
4547
4648::
4749
48- sequana_variant_calling --help
49- sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa
50+ sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa
5051
51- This creates a directory **variant_calling **. You just need to execute the pipeline ::
52+ This creates a directory **variant_calling **. You just need to move into the directory and execute the script ::
5253
5354 cd variant_calling
5455 sh variant_calling.sh
@@ -58,20 +59,20 @@ retrieve the pipeline itself and its configuration files and then execute the pi
5859
5960 snakemake -s variant_calling.rules -c config.yaml --cores 4 --stats stats.txt
6061
62+ you can also edit the profile file in .sequana/profile/config.ya,l
63+
6164Or use `sequanix <https://sequana.readthedocs.io/en/main/sequanix.html >`_ interface.
6265
6366Usage with singularity::
6467~~~~~~~~~~~~~~~~~~~~~~~~~
6568
6669With singularity, initiate the working directory as follows::
6770
68- sequana_variant_calling --use-singularity
69-
70- Images are downloaded in the working directory but you can store then in a directory globally (e.g.)::
71-
7271 sequana_variant_calling --use-singularity --singularity-prefix ~/.sequana/apptainers
7372
74- and then::
73+ Images are downloaded in a global direcory (here .sequana/apptainers) so that you can reuse them later.
74+
75+ and then as before::
7576
7677 cd variant_calling
7778 sh variant_calling.sh
@@ -80,12 +81,11 @@ if you decide to use snakemake manually, do not forget to add singularity option
8081
8182 snakemake -s variant_calling.rules -c config.yaml --cores 4 --stats stats.txt --use-singularity --singularity-prefix ~/.sequana/apptainers --singularity-args "-B /home:/home"
8283
83-
84-
8584Requirements
8685~~~~~~~~~~~~
8786
88- This pipelines requires the following executable(s):
87+ If you rely on singularity/apptainer, no extra dependencies are required (expect python and
88+ https://damona.readthedocs.io). If you cannot use apptainer, you will need to install some software:
8989
9090- bwa
9191- freebayes
@@ -95,6 +95,7 @@ This pipelines requires the following executable(s):
9595- samtools
9696- snpEff you will need 5.0 or 5.1d (note the d); 5.1 does not work.
9797
98+
9899.. image :: https://raw.githubusercontent.com/sequana/sequana_variant_calling/main/sequana_pipelines/variant_calling/dag.png
99100
100101Details
@@ -124,13 +125,75 @@ such as particular codon table will required edition of the snpeff configuration
124125
125126Finally, joint calling is also available and can be switch on if desired.
126127
128+ Tutorial
129+ ~~~~~~~~
130+
131+ Let us download an ecoli reference genome and the data set used to create the assembly. All tools used here below can be
132+ installed with damona (or your favorite environment manager)::
133+
134+ pip install damona
135+ damona create TEST
136+ damona activate TEST
137+ damona install pigz
138+ damona install sratoolkit # for fasterq-dump
139+ damona install datasets
140+
141+ Then, download the data::
142+
143+ fasterq-dump SRR13921546
144+ pigz SRR*fastq
145+
146+ and the reference genome with its annnotation::
147+
148+ datasets download genome accession GCF_000005845.2 --include gff3,rna,cds,protein,genome,seq-report,gbff
149+ unzip ncbi_dataset.zip
150+ ln -s ncbi_dataset/data/GCF_000005845.2/GCF_000005845.2_ASM584v2_genomic.fna ecoli.fa
151+ ln -s ncbi_dataset/data/GCF_000005845.2/genomic.gff ecoli.gff
152+
153+
154+ Initiate the pipeline::
155+
156+ sequana_variant_calling --input-directory . --reference-file ecoli.fa --aligner-choice bwa_split \
157+ --do-coverage --annotation-file ecoli.gff \
158+ --use-apptainer --apptainer-prefix ~/.sequana/apptainers \
159+ --input-readtag "_[12]."
160+
161+ Explication:
162+
163+ - we use apptainer/singularity
164+ - we use the reference genome ecoli.fa (--reference-file) and its annotation for SNPeff (--annotation-file)
165+ - we use the sequana_coverage tool (True by default) to get coverage plots.
166+ - we use --input-directory to indicatre where to find the input files
167+ - This data set is paired. In NGS, it is common to have _R1_ and _R2_ tags to differentiate the 2 files. Here the tag
168+ are _1 and _2. In sequana we define the a wildcard for the read tag. So here we tell the software that thex ecpted tag
169+ follow this pattern: "_[12]." and everything is then automatic.
170+
171+ Then follow the instructions (prepare and execute the pipeline).
172+
173+ You should end up with a summary.hml report.
174+
175+ You can brwose the different samples (only one in this example) and get a table with variant calls:
176+
177+
178+ table.png
179+
180+ If you set the coverage one, (not recommended for eukaryotes), you should see this kind of plots::
181+
182+ coverage.png
183+
184+
185+
127186
128187Changelog
129188~~~~~~~~~
130189
131190========= ======================================================================
132191Version Description
133192========= ======================================================================
193+ 1.3.0 * Updated version to use latest damona containers and latest
194+ sequana version 0.19.1. added plot in HTML report with distribution
195+ of variants. added tutorial. added bwa_split and freebaye split to
196+ process ultra deep sequencing.
1341971.2.0 * -Xmx8g option previously added is not robust. Does not work with
135198 snpEff 5.1 for instance.
136199 * add minimap aligner
0 commit comments