Skip to content

Commit d903948

Browse files
authored
Merge pull request #32 from cokelaer/main
Major update with freebayes and bwa split
2 parents eea37ec + 2915f45 commit d903948

File tree

11 files changed

+621
-129
lines changed

11 files changed

+621
-129
lines changed

README.rst

Lines changed: 78 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -25,30 +25,31 @@ This is the **variant_calling** pipeline from the `Sequana <https://sequana.read
2525
Installation
2626
~~~~~~~~~~~~
2727

28-
If you already have all requirements, you can install the packages using pip::
28+
You can install sequana_variant_calling pipeline using::
2929

3030
pip install sequana_variant_calling --upgrade
3131

32-
Otherwise, you can create a *sequana_variant_calling* conda environment executing::
32+
I would recommend to setup a *sequana_variant_calling* conda environment executing::
3333

3434
conda env create -f environment.yml
3535

36-
and later activate the environment::
36+
where the environment.yml can be found in the https://github.com/sequana/variant_calling repository.
37+
38+
Later, you can activate the environment as follows::
3739

3840
conda activate sequana_variant_calling
3941

40-
A third option is to install the pipeline with pip method (see above) and use singularity as explained afterwards.
42+
Note, however, that the recommended method is to use singularity/apptainer as explained here below.
4143

4244

4345
Usage
4446
~~~~~
4547

4648
::
4749

48-
sequana_variant_calling --help
49-
sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa
50+
sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa
5051

51-
This creates a directory **variant_calling**. You just need to execute the pipeline::
52+
This creates a directory **variant_calling**. You just need to move into the directory and execute the script::
5253

5354
cd variant_calling
5455
sh variant_calling.sh
@@ -58,20 +59,20 @@ retrieve the pipeline itself and its configuration files and then execute the pi
5859

5960
snakemake -s variant_calling.rules -c config.yaml --cores 4 --stats stats.txt
6061

62+
you can also edit the profile file in .sequana/profile/config.ya,l
63+
6164
Or use `sequanix <https://sequana.readthedocs.io/en/main/sequanix.html>`_ interface.
6265

6366
Usage with singularity::
6467
~~~~~~~~~~~~~~~~~~~~~~~~~
6568

6669
With singularity, initiate the working directory as follows::
6770

68-
sequana_variant_calling --use-singularity
69-
70-
Images are downloaded in the working directory but you can store then in a directory globally (e.g.)::
71-
7271
sequana_variant_calling --use-singularity --singularity-prefix ~/.sequana/apptainers
7372

74-
and then::
73+
Images are downloaded in a global direcory (here .sequana/apptainers) so that you can reuse them later.
74+
75+
and then as before::
7576

7677
cd variant_calling
7778
sh variant_calling.sh
@@ -80,12 +81,11 @@ if you decide to use snakemake manually, do not forget to add singularity option
8081

8182
snakemake -s variant_calling.rules -c config.yaml --cores 4 --stats stats.txt --use-singularity --singularity-prefix ~/.sequana/apptainers --singularity-args "-B /home:/home"
8283

83-
84-
8584
Requirements
8685
~~~~~~~~~~~~
8786

88-
This pipelines requires the following executable(s):
87+
If you rely on singularity/apptainer, no extra dependencies are required (expect python and
88+
https://damona.readthedocs.io). If you cannot use apptainer, you will need to install some software:
8989

9090
- bwa
9191
- freebayes
@@ -95,6 +95,7 @@ This pipelines requires the following executable(s):
9595
- samtools
9696
- snpEff you will need 5.0 or 5.1d (note the d); 5.1 does not work.
9797

98+
9899
.. image:: https://raw.githubusercontent.com/sequana/sequana_variant_calling/main/sequana_pipelines/variant_calling/dag.png
99100

100101
Details
@@ -124,13 +125,75 @@ such as particular codon table will required edition of the snpeff configuration
124125

125126
Finally, joint calling is also available and can be switch on if desired.
126127

128+
Tutorial
129+
~~~~~~~~
130+
131+
Let us download an ecoli reference genome and the data set used to create the assembly. All tools used here below can be
132+
installed with damona (or your favorite environment manager)::
133+
134+
pip install damona
135+
damona create TEST
136+
damona activate TEST
137+
damona install pigz
138+
damona install sratoolkit # for fasterq-dump
139+
damona install datasets
140+
141+
Then, download the data::
142+
143+
fasterq-dump SRR13921546
144+
pigz SRR*fastq
145+
146+
and the reference genome with its annnotation::
147+
148+
datasets download genome accession GCF_000005845.2 --include gff3,rna,cds,protein,genome,seq-report,gbff
149+
unzip ncbi_dataset.zip
150+
ln -s ncbi_dataset/data/GCF_000005845.2/GCF_000005845.2_ASM584v2_genomic.fna ecoli.fa
151+
ln -s ncbi_dataset/data/GCF_000005845.2/genomic.gff ecoli.gff
152+
153+
154+
Initiate the pipeline::
155+
156+
sequana_variant_calling --input-directory . --reference-file ecoli.fa --aligner-choice bwa_split \
157+
--do-coverage --annotation-file ecoli.gff \
158+
--use-apptainer --apptainer-prefix ~/.sequana/apptainers \
159+
--input-readtag "_[12]."
160+
161+
Explication:
162+
163+
- we use apptainer/singularity
164+
- we use the reference genome ecoli.fa (--reference-file) and its annotation for SNPeff (--annotation-file)
165+
- we use the sequana_coverage tool (True by default) to get coverage plots.
166+
- we use --input-directory to indicatre where to find the input files
167+
- This data set is paired. In NGS, it is common to have _R1_ and _R2_ tags to differentiate the 2 files. Here the tag
168+
are _1 and _2. In sequana we define the a wildcard for the read tag. So here we tell the software that thex ecpted tag
169+
follow this pattern: "_[12]." and everything is then automatic.
170+
171+
Then follow the instructions (prepare and execute the pipeline).
172+
173+
You should end up with a summary.hml report.
174+
175+
You can brwose the different samples (only one in this example) and get a table with variant calls:
176+
177+
178+
table.png
179+
180+
If you set the coverage one, (not recommended for eukaryotes), you should see this kind of plots::
181+
182+
coverage.png
183+
184+
185+
127186

128187
Changelog
129188
~~~~~~~~~
130189

131190
========= ======================================================================
132191
Version Description
133192
========= ======================================================================
193+
1.3.0 * Updated version to use latest damona containers and latest
194+
sequana version 0.19.1. added plot in HTML report with distribution
195+
of variants. added tutorial. added bwa_split and freebaye split to
196+
process ultra deep sequencing.
134197
1.2.0 * -Xmx8g option previously added is not robust. Does not work with
135198
snpEff 5.1 for instance.
136199
* add minimap aligner

doc/coverage.png

59.2 KB
Loading

doc/table.png

88.3 KB
Loading

pyproject.toml

Lines changed: 19 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,11 @@
22
requires = ["poetry-core>=1.0.0"]
33
build-backend = "poetry.core.masonry.api"
44

5-
[tool.poetry]
5+
[project]
66
name = "sequana-variant-calling"
7-
version = "1.2.0"
7+
version = "1.3.0"
88
description = "A multi-sample variant calling pipeline"
9-
authors = ["Sequana Team"]
9+
authors = [{name="Sequana Team"}]
1010
license = "BSD-3"
1111
repository = "https://github.com/sequana/variant_calling"
1212
readme = "README.rst"
@@ -19,34 +19,37 @@ classifiers = [
1919
"Intended Audience :: Science/Research",
2020
"License :: OSI Approved :: BSD License",
2121
"Operating System :: POSIX :: Linux",
22-
"Programming Language :: Python :: 3.8",
2322
"Programming Language :: Python :: 3.9",
2423
"Programming Language :: Python :: 3.10",
24+
"Programming Language :: Python :: 3.11",
25+
"Programming Language :: Python :: 3.12",
2526
"Topic :: Software Development :: Libraries :: Python Modules",
2627
"Topic :: Scientific/Engineering :: Bio-Informatics",
2728
"Topic :: Scientific/Engineering :: Information Analysis",
2829
]
2930

31+
32+
requires-python = ">=3.9,<4.0"
33+
dependencies = [
34+
"sequana >=0.19.0",
35+
"sequana_pipetools >=0.16.0",
36+
"click-completion >=0.5.2",
37+
"pytest (>=8.3.4,<9.0.0)"
38+
]
39+
40+
[tool.poetry]
3041
packages = [
3142
{include = "sequana_pipelines"}
3243
]
3344

34-
35-
[tool.poetry.dependencies]
36-
python = ">=3.8,<4.0"
37-
sequana = ">=0.15.0"
38-
sequana_pipetools = ">=0.16.0"
39-
click-completion = "^0.5.2"
40-
41-
42-
[tool.poetry.scripts]
45+
[project.scripts]
4346
sequana_variant_calling = "sequana_pipelines.variant_calling.main:main"
4447

4548

4649
[tool.poetry.group.dev.dependencies]
4750
black = "^23.7.0"
48-
pytest = "^7.4.0"
51+
pytest = ">7.4.0"
4952
mock = "^5.1.0"
50-
pytest-mock = "^3.11.1"
51-
pytest-cov = "^4.1.0"
53+
pytest-mock = ">=3.11.1"
54+
pytest-cov = ">=4.1.0"
5255

0 commit comments

Comments
 (0)