Commit 872cfc7

Merge pull request #34 from cokelaer/main
Fix genomecov and support long reads
2 parents 6835312 + 990c95e commit 872cfc7

13 files changed, with 265 additions and 151 deletions.

.github/workflows/apptainer.yml

Lines changed: 60 additions & 20 deletions
@@ -6,53 +6,93 @@ on:
       - main
       - dev
   pull_request:
-    branches-ignore: []
+  workflow_dispatch:
   schedule:
-    - cron: '0 0 2 * *'
+    - cron: '0 0 20 * *'

 jobs:
   build-linux:
     runs-on: ubuntu-latest
+
     strategy:
-      max-parallel: 5
       matrix:
-        python: [3.8, '3.10']
+        python: ['3.10', '3.11']
       fail-fast: false
-
+      max-parallel: 5

     steps:

-      - name: precleanup
+      # Clean up unnecessary preinstalled packages to free disk space
+      - name: Pre-cleanup
         run: |
           sudo rm -rf /usr/share/dotnet
           sudo rm -rf "$AGENT_TOOLSDIRECTORY"
-      - name: install graphviz
+
+      # Cache APT .deb packages
+      - name: Cache APT archives
+        uses: actions/cache@v3
+        with:
+          path: /var/cache/apt/archives
+          key: ${{ runner.os }}-apt-cache-v1
+
+      # Cache Apptainer installation
+      - name: Cache Apptainer install
+        id: cache-apptainer
+        uses: actions/cache@v3
+        with:
+          path: |
+            /usr/bin/apptainer
+            /usr/lib/apptainer
+            /etc/apptainer
+          key: ${{ runner.os }}-apptainer-v1
+
+      # Install Apptainer only if not cached
+      - name: Install Apptainer
+        if: steps.cache-apptainer.outputs.cache-hit != 'true'
         run: |
-          sudo apt update
-          sudo apt-get install -y graphviz software-properties-common
+          sudo apt-get update
+          sudo apt-get install -y software-properties-common
           sudo add-apt-repository -y ppa:apptainer/ppa
-          sudo apt update
-          sudo apt install -y apptainer
+          sudo apt-get update
+          sudo apt-get install -y apptainer
+
+      # Cache Apptainer image cache (~/.apptainer/cache)
+      - name: Cache Apptainer images
+        uses: actions/cache@v3
+        with:
+          path: ~/.apptainer/cache
+          key: ${{ runner.os }}-apptainer-images-v1

-      - name: checkout git repo
-        uses: actions/checkout@v3
+      # Checkout repository
+      - name: Checkout repo
+        uses: actions/checkout@v4

-      - name: Set up Python 3.X
-        uses: actions/setup-python@v3
+      # 🐍 Set up Python
+      - name: Set up Python ${{ matrix.python }}
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python }}

+      # Install dependencies
       - name: Install dependencies
         run: |
+          python -m pip install --upgrade pip
           pip install .[testing]

-      - name: install package itself
+      # Install package and pinned dependency (example: pulp)
+      - name: Install package itself
         run: |
-          pip install .
-          pip install "pulp==2.7.0" --no-deps
+          pip install .
+          pip install "pulp==2.7.0" --no-deps

-      - name: testing
+      # Run tests using Apptainer
+      - name: Run Apptainer tests
         run: |
-          sequana_variant_calling --input-directory test/data/ --use-apptainer --annotation-file test/data/JB409847.gbk --reference-file test/data/JB409847.fasta && cd variant_calling && sh variant_calling.sh
+          sequana_variant_calling \
+            --input-directory test/data/ \
+            --apptainer-prefix ~/.apptainer/cache \
+            --annotation-file test/data/JB409847.gbk \
+            --reference-file test/data/JB409847.fasta

+          cd variant_calling && bash variant_calling.sh


.github/workflows/main.yml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ jobs:
     strategy:
       max-parallel: 5
       matrix:
-        python: [ 3.9, '3.10', '3.11']
+        python: [ '3.10', '3.11']
       fail-fast: false



.github/workflows/pypi.yml

Lines changed: 4 additions & 4 deletions
@@ -11,14 +11,14 @@ jobs:
     runs-on: ubuntu-20.04
     steps:
     - uses: actions/checkout@main
-    - name: Set up Python 3.8
+    - name: Set up Python 3.9
       uses: actions/setup-python@v2
       with:
-        python-version: 3.8
+        python-version: 3.9

-    - name: Install package
+    - name: Install package
       run: |
-        pip install build poetry
+        pip install build "poetry>=2"

     - name: Build source tarball
       run: |

README.rst

Lines changed: 12 additions & 8 deletions
@@ -9,9 +9,9 @@
 .. image:: https://github.com/sequana/variant_calling/actions/workflows/main.yml/badge.svg
     :target: https://github.com/sequana/variant_calling/actions

-.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
+.. image:: https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C3.12-blue.svg
     :target: https://pypi.python.org/pypi/sequana
-    :alt: Python 3.8 | 3.9 | 3.10
+    :alt: Python 3.10 | 3.11 | 3.12

 This is the **variant_calling** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ projet

@@ -152,7 +152,7 @@ and the reference genome with its annnotation::


 Initiate the pipeline::
-
+
     sequana_variant_calling --input-directory . --reference-file ecoli.fa --aligner-choice bwa_split \
         --do-coverage --annotation-file ecoli.gff \
         --use-apptainer --apptainer-prefix ~/.sequana/apptainers \
@@ -164,9 +164,7 @@ Explication:
 - we use the reference genome ecoli.fa (--reference-file) and its annotation for SNPeff (--annotation-file)
 - we use the sequana_coverage tool (True by default) to get coverage plots.
 - we use --input-directory to indicatre where to find the input files
-- This data set is paired. In NGS, it is common to have _R1_ and _R2_ tags to differentiate the 2 files. Here the tag
-  are _1 and _2. In sequana we define the a wildcard for the read tag. So here we tell the software that thex ecpted tag
-  follow this pattern: "_[12]." and everything is then automatic.
+- This data set is paired. In NGS, it is common to have _R1_ and _R2_ tags to differentiate the 2 files. Here the tags are `_1` and `_2`. In sequana we define the a wildcard for the read tag. So here we tell the software that thex expected tags follow this pattern: "_[12]." and everything is then automatic.

 Then follow the instructions (prepare and execute the pipeline).

@@ -175,11 +173,11 @@ You should end up with a summary.hml report.

 You can browse the different samples (only one in this example) and get a table with variant calls:

-https://raw.githubusercontent.com/sequana/variant_calling/refs/heads/main/doc/table.png
+.. image:: https://raw.githubusercontent.com/sequana/variant_calling/refs/heads/main/doc/table.png

 If you set the coverage one, (not recommended for eukaryotes), you should see this kind of plots:

-https://raw.githubusercontent.com/sequana/variant_calling/refs/heads/main/doc/coverage.png
+.. image:: https://raw.githubusercontent.com/sequana/variant_calling/refs/heads/main/doc/coverage.png



@@ -191,6 +189,12 @@ Changelog
 ========= ======================================================================
 Version   Description
 ========= ======================================================================
+1.4.0     * handles long reads data. Use sequana html_report to create the VCF
+            html reports instead of wrapper. More dynamic. Updated some
+            containers, in particular for sequana_coverage.
+          * Fixed regression in bwa mapping
+          * Fixed ordering of contigs on genomecov that was not sorted in the
+            same way as samtools in some cases.
 1.3.0     * Updated version to use latest damona containers and latest
             sequana version 0.19.1. added plot in HTML report with distribution
             of variants. added tutorial. added bwa_split and freebaye split to
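
The read-tag wildcard described in the README hunk above ("_[12].") can be illustrated with a minimal Python sketch. It assumes the pattern is interpreted as a regular expression and that the sample name is whatever precedes the tag; the helper `pair_reads` and the file names are hypothetical and are not part of Sequana.

```python
import re
from collections import defaultdict

# Pattern quoted in the README; assumed here to be a regular expression.
READTAG = "_[12]."


def pair_reads(filenames, readtag=READTAG):
    """Group FASTQ file names by sample, using the read tag as separator."""
    samples = defaultdict(list)
    for name in sorted(filenames):
        match = re.search(readtag, name)
        if match is None:
            # No tag found: treat the file as a single-end sample.
            samples[name.split(".")[0]].append(name)
        else:
            # Everything before the tag is taken as the sample name.
            samples[name[: match.start()]].append(name)
    return dict(samples)


# Hypothetical file names, for illustration only.
files = ["ecoli_2.fastq.gz", "ecoli_1.fastq.gz", "other.fastq.gz"]
pairs = pair_reads(files)
print(pairs)
```

With a tag such as `_R[12]_` the same function would group `sample_R1_001.fastq.gz` and `sample_R2_001.fastq.gz` under `sample`.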

environment.yml

Lines changed: 5 additions & 2 deletions
@@ -6,16 +6,19 @@ channels:
   - defaults

 dependencies:
-  - freebayes>1,<1.3
+  - freebayes>1.3
   - bwa
+  - bcftools
   - 'snpeff==5.1d'
   - sambamba
+  - fastp
+  - fastqc
   - picard>2.26
   - samtools>=1.15
   - bamtools
   - minimap2
   - pip
   - pip:
-    - sequana
+    - "sequana>=0.19.4"



pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

 [project]
 name = "sequana-variant-calling"
-version = "1.3.0"
+version = "1.4.0"
 description = "A multi-sample variant calling pipeline"
 authors = [{name="Sequana Team"}]
 license = "BSD-3"
@@ -31,8 +31,8 @@ classifiers = [

 requires-python = ">=3.9,<4.0"
 dependencies = [
-    "sequana >=0.19.0",
-    "sequana_pipetools >=0.16.0",
+    "sequana >=0.19.4",
+    "sequana_pipetools >=1.3.0",
     "click-completion >=0.5.2",
     "pytest (>=8.3.4,<9.0.0)"
 ]

sequana_pipelines/variant_calling/config.yaml

Lines changed: 20 additions & 13 deletions
@@ -32,15 +32,15 @@ general:

 apptainers:
     #bwa: https://zenodo.org/record/7970243/files/bwa_0.7.17.img
-    bwa: "https://zenodo.org/record/14945560/files/sequana_tools_0.19.1.img"
+    bwa: https://zenodo.org/record/17535070/files/sequana_tools_0.19.3.img
     samtools: https://zenodo.org/record/7437898/files/samtools_1.16.1.img
     seqkit: https://zenodo.org/record/7821924/files/seqkit_2.4.0.img
-    sequana_coverage: https://zenodo.org/record/14945560/files/sequana_tools_0.19.1.img
-    sequana_tools: "https://zenodo.org/record/14945560/files/sequana_tools_0.19.1.img"
-    graphviz: "https://zenodo.org/record/7928262/files/graphviz_7.0.5.img"
-    minimap2: "https://zenodo.org/record/5799482/files/minimap2_2.23.0.img"
-    multiqc: "https://zenodo.org/record/10205070/files/multiqc_1.16.0.img"
-    freebayes: "https://zenodo.org/record/14930911/files/freebayes_1.3.9.img"
+    sequana_coverage: https://zenodo.org/record/17535070/files/sequana_tools_0.19.3.img
+    sequana_tools: https://zenodo.org/record/17535070/files/sequana_tools_0.19.3.img
+    graphviz: https://zenodo.org/record/7928262/files/graphviz_7.0.5.img
+    minimap2: https://zenodo.org/record/17535070/files/sequana_tools_0.19.3.img
+    multiqc: https://zenodo.org/record/17100751/files/multiqc_1.27.0-zenodo1.img
+    freebayes: https://zenodo.org/record/14930911/files/freebayes_1.3.9.img
     fastqc: https://zenodo.org/record/7015004/files/fastqc_0.11.9-py3.img
     fastp: https://zenodo.org/record/7319782/files/fastp_0.23.2.img

@@ -107,7 +107,7 @@ bwa_index:


 bwa_split:
-    nreads: 100000
+    nreads: 1000000
     index_algorithm: is
     options: -T 30 -M
     threads: 4
@@ -151,11 +151,13 @@ snpeff:
 # :Parameters:
 #
 # - ploidy: set the ploidy of your samples.
-# - options: any options recognised by freebayes.
-#
+# - options: any options recognised by freebayes. One useful options is
+# --min-alternate-fraction to decreasy minimal frequency to e.g. 1%
+# since default if 5%
+#
 freebayes:
     ploidy: 1
-    chunksize: 100000
+    chunksize: 1000000
     options: --legacy-gls
     resources:
         mem: 8G
@@ -187,6 +189,8 @@ sambamba_markdup:
     remove_duplicates: false
     tmp_directory: ./tmp/
     options:
+    resources:
+        mem: 8G

 ##############################################################################
 # Filter reads with a mapping score lower than an integer
@@ -200,6 +204,8 @@ sambamba_filter:
     do: true
     threshold: 30
     options:
+    resources:
+        mem: 8G

 ##############################################################################
 # Sequana coverage - Analyse the coverage of the mapping
@@ -287,8 +293,8 @@ joint_freebayes_vcf_filter:
 # or -n 5 (minimum number of Ns required to discard a read)
 fastp:
     do: true
-    options: ' --cut_tail '
-    minimum_length: 20
+    options: '--cut_tail'
+    min_length_required: 20
     adapters: ''
     quality: 15
     threads: 4
@@ -305,6 +311,7 @@ fastp:
 # - options: string with any valid FastQC options
 #
 fastqc:
+    do: true
     options: --nogroup
     threads: 4
     resources:

sequana_pipelines/variant_calling/main.py

Lines changed: 4 additions & 8 deletions
@@ -69,14 +69,6 @@
     default=None,
     help="The annotation for snpeff. This is optional but highly recommended to obtain meaningful HTML report.",
 )
-@click.option(
-    "--do-coverage",
-    "do_coverage",
-    is_flag=True,
-    default=False,
-    show_default=True,
-    help="perform the coverage analysis using sequana_coverage.",
-)
 @click.option(
     "--nanopore",
     is_flag=True,
@@ -152,9 +144,13 @@ def fill_reference_file():
     if options["nanopore"]:
         cfg.general.aligner_choice = "minimap2"
         cfg.minimap2.options = "-x map-ont"
+        cfg.input_readtag = ""
+        cfg.fastqc.do = False
     elif options["pacbio"]:
         cfg.general.aligner_choice = "minimap2"
         cfg.minimap2.options = "-x map-pb"
+        cfg.input_readtag = ""
+        cfg.fastqc.do = False
     else:
         cfg.general.aligner_choice = options.aligner

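
The long-read branches in the main.py hunk above can be read as a small preset table. The following is a minimal sketch of that logic, not the pipeline's actual code: `make_config`, `apply_technology`, the plain-dict `options` and the default values in the stand-in config are hypothetical, and only the values assigned in each branch are taken from the diff.

```python
from types import SimpleNamespace


def make_config():
    """Hypothetical stand-in for the pipeline configuration object."""
    return SimpleNamespace(
        general=SimpleNamespace(aligner_choice="bwa"),
        minimap2=SimpleNamespace(options=""),
        fastqc=SimpleNamespace(do=True),
        input_readtag="_R[12]_",  # placeholder default, an assumption
    )


# minimap2 presets as they appear in the diff.
LONG_READ_PRESETS = {"nanopore": "-x map-ont", "pacbio": "-x map-pb"}


def apply_technology(cfg, options):
    """Apply the long-read overrides shown in the diff, else keep the user's aligner."""
    for flag, preset in LONG_READ_PRESETS.items():
        if options.get(flag):
            cfg.general.aligner_choice = "minimap2"
            cfg.minimap2.options = preset
            cfg.input_readtag = ""  # new in this commit: read tag cleared
            cfg.fastqc.do = False   # new in this commit: FastQC disabled
            return cfg
    cfg.general.aligner_choice = options["aligner"]
    return cfg


cfg = apply_technology(make_config(), {"nanopore": True, "pacbio": False, "aligner": "bwa"})
print(cfg.general.aligner_choice, cfg.minimap2.options, repr(cfg.input_readtag), cfg.fastqc.do)
```

Keeping the presets in one mapping means the two branches cannot drift apart, which is the kind of duplication the diff adds twice by hand.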
