Add short-read QC & trimming workflow #976

bebatut · 2025-10-07T11:11:04Z

FOR CONTRIBUTOR:

- I have read the Adding workflows guidelines.
- License permits unrestricted use (educational + commercial).

Please also take note of the reviewer guidelines below to facilitate a smooth review process.

FOR REVIEWERS:

- `.dockstore.yml`: file is present and aligned with creator metadata in the workflow. ORCID identifiers are strongly encouraged in creator metadata. The `.dockstore.yml` file is required to run tests.
- Workflow is sufficiently generic to be used with lab data: it does not hardcode sample names or reference data, and it can be run without reading an accompanying tutorial.
- In workflow: the annotation field contains a short description of what the workflow does. It should start with "This workflow does/runs/performs … xyz … to generate/analyze/etc …".
- In workflow: workflow inputs and outputs have human-readable names (spaces are fine, no underscores, dashes only where spelling dictates it) and no abbreviations unless they are generally understood. Altering input or output labels requires adjusting these labels in the `workflow-tests.yml` file as well.
- In workflow: the name field should be human-readable (spaces are fine, no underscores, dashes only where spelling dictates it), with no abbreviations unless generally understood.
- Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. The folder becomes a repository in the iwc-workflows organization and is included in the TRS ID.
- README explains what the workflow does, which inputs are valid, and what outputs users can expect. If a tutorial or other resources exist, they can be linked. If a similar workflow exists in IWC, the README should explain the differences from the existing workflow and when one might prefer one workflow over the other.
- Changelog contains appropriate entries.
- Large files (> 100 KB) are uploaded to Zenodo and their location URLs are used in the test file.
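Some of the naming rules in this checklist are mechanical enough to pre-check before review. A minimal Python sketch, assuming a parsed Galaxy `.ga` file whose `steps` is a dict keyed by step index with optional `label` fields; `folder_name_issues` and `label_issues` are hypothetical helpers, not part of IWC or Planemo. They only flag underscores and uppercase folder names; whether a dash is dictated by spelling, or an abbreviation is generally understood, still needs a human reviewer.

```python
import json


def folder_name_issues(folder: str) -> list[str]:
    """Check a workflow folder name: prefer all lowercase, prefer '-' over '_'."""
    issues = []
    if folder != folder.lower():
        issues.append("folder name is not all lowercase")
    if "_" in folder:
        issues.append("folder name uses '_' instead of '-'")
    return issues


def label_issues(ga: dict) -> list[str]:
    """Flag a workflow name or step labels that contain underscores.

    `ga` is a parsed Galaxy .ga file; steps are assumed to live in a dict
    keyed by step index, and steps without a label (None) are skipped.
    """
    issues = []
    name = ga.get("name") or ""
    if "_" in name:
        issues.append(f"workflow name {name!r} contains '_'")
    for step_id, step in ga.get("steps", {}).items():
        label = step.get("label")
        if label and "_" in label:
            issues.append(f"step {step_id} label {label!r} contains '_'")
    return issues


if __name__ == "__main__":
    # Toy workflow, not the real .ga file from this PR.
    ga = json.loads(
        '{"name": "Short-reads quality control and trimming",'
        ' "steps": {"0": {"label": "Raw reads"},'
        ' "1": {"label": "min_read_length"},'
        ' "5": {"label": null}}}'
    )
    print(folder_name_issues("short-reads-qc-trimming"))  # []
    print(label_issues(ga))  # ["step 1 label 'min_read_length' contains '_'"]
```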

wm75 · 2025-10-08T11:09:58Z

...flows/read_preprocessing/short-reads-qc-trimming/short-reads-quality-control-and-trimming.ga

```diff
+            "label": "Cutting mean quality",
+            "name": "Input parameter",
+            "outputs": [],
+            "position": {
+                "left": 0,
+                "top": 680
+            },
```


Does that make sense at all? I think without enabling either 5' or 3' end trimming, this won't do anything?

bebatut · 2025-10-08

Good point. @paulzierep, what do you think?

paulzierep · 2025-10-08

After a quick lookup, `--cut_by_quality3` seems to be the most common case for Illumina. Maybe we should make that the default?
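The question above turns on how fastp's quality cutting works. According to fastp's documentation, `--cut_mean_quality` (`-M`, default 20) and `--cut_window_size` (`-W`, default 4) are shared by the `--cut_front` (5'), `--cut_tail` (3') and `--cut_right` modes, so the threshold only matters once one of those modes is switched on. The Python sketch below is a simplified approximation of 3' (`cut_tail`) trimming, which we assume is what `--cut_by_quality3` refers to in current fastp; the function and its `enabled` flag are hypothetical, not fastp or Galaxy code, and fastp's exact edge handling may differ.

```python
from statistics import mean


def cut_tail(quals: list[int], window: int = 4, mean_quality: int = 20,
             enabled: bool = False) -> int:
    """Return how many bases survive a simplified 3' sliding-window trim.

    This approximates fastp's documented cut_tail behaviour. When the cut
    mode is disabled, mean_quality is never read, so setting it alone has
    no effect on the read.
    """
    keep = len(quals)
    if not enabled:
        return keep
    # Slide a window from the 3' end towards the 5' end, dropping one base
    # per step while the window's mean quality is below the threshold.
    while keep >= window and mean(quals[keep - window:keep]) < mean_quality:
        keep -= 1
    return keep


if __name__ == "__main__":
    read = [30] * 10 + [10, 5, 2, 2]  # toy Phred scores, low-quality 3' end
    print(cut_tail(read, mean_quality=20))                # 14: threshold ignored
    print(cut_tail(read, mean_quality=20, enabled=True))  # 11
```

With the mode enabled, the window stops sliding at `[30, 30, 30, 10]` (mean 25), so the Q10 base is kept: window-based cutting tolerates isolated low-quality bases rather than trimming every base below the threshold.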

github-actions · 2025-10-08T13:50:59Z

Test Results (powered by Planemo)

Test Summary

| Test State | Count |
| --- | --- |
| Total | 0 |
| Passed | 0 |
| Error | 0 |
| Failure | 0 |
| Skipped | 0 |

github-actions · 2025-10-08T14:56:57Z

Test Results (powered by Planemo)

Test Summary

| Test State | Count |
| --- | --- |
| Total | 1 |
| Passed | 1 |
| Error | 0 |
| Failure | 0 |
| Skipped | 0 |

Passed Tests

✅ short-reads-quality-control-and-trimming.ga_0

Workflow invocation details

Invocation Messages

Steps

Step 1: Raw reads:
- step_state: scheduled
Step 2: Adapter to remove on forward reads:
- step_state: scheduled
Step 3: Adapter to remove on reverse reads:
- step_state: scheduled
Step 4: Qualified quality score:
- step_state: scheduled
Step 5: Minimal read length:
- step_state: scheduled

Step 6: fastp:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

ln -sf '/tmp/tmpsdyv8z95/files/4/d/0/dataset_4d062e66-b884-4661-a630-5f79d9751d98.dat' 'pair.fastqsanger.gz' && ln -sf '/tmp/tmpsdyv8z95/files/5/d/d/dataset_5ddfe19f-6225-44be-850c-83976c60cb05.dat' 'pair_R2.fastqsanger.gz' &&   fastp  --thread ${GALAXY_SLOTS:-1} --report_title 'fastp report for pair.fastqsanger.gz'  -i 'pair.fastqsanger.gz'   -I 'pair_R2.fastqsanger.gz' -o first.fastqsanger.gz -O second.fastqsanger.gz                       -q 15      -l 15                       && mv first.fastqsanger.gz '/tmp/tmpsdyv8z95/job_working_directory/000/3/outputs/dataset_0d56bb07-e767-40a2-ac0c-518a758a51c5.dat' && mv second.fastqsanger.gz '/tmp/tmpsdyv8z95/job_working_directory/000/3/outputs/dataset_e74ce4f4-69db-48b5-81b3-a67027661de6.dat'

Exit Code:

```
0
```

Standard Error:

Read1 before filtering:
total reads: 150000
total bases: 30200104
Q20 bases: 29363746(97.2306%)
Q30 bases: 27736272(91.8416%)
Q40 bases: 0(0%)

Read2 before filtering:
total reads: 150000
total bases: 30326007
Q20 bases: 28396209(93.6365%)
Q30 bases: 25963530(85.6147%)
Q40 bases: 0(0%)

Read1 after filtering:
total reads: 147840
total bases: 29669900
Q20 bases: 28939702(97.5389%)
Q30 bases: 27398791(92.3454%)
Q40 bases: 0(0%)

Read2 after filtering:
total reads: 147840
total bases: 29561048
Q20 bases: 28110619(95.0934%)
Q30 bases: 25827422(87.3698%)
Q40 bases: 0(0%)

Filtering result:
reads passed filter: 295680
reads failed due to low quality: 4320
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 6496
bases trimmed due to adapters: 220832

Duplication rate: 0.291333%

Insert size peak (evaluated by paired-end reads): 147

JSON report: fastp.json
HTML report: fastp.html

fastp --thread 1 --report_title fastp report for pair.fastqsanger.gz -i pair.fastqsanger.gz -I pair_R2.fastqsanger.gz -o first.fastqsanger.gz -O second.fastqsanger.gz -q 15 -l 15 
fastp v1.0.1, time used: 5 seconds
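The filtering counts in this output are internally consistent: 295680 passed + 4320 failed for low quality = 300000 reads, i.e. the 2 × 150000 reads before filtering, and 2 × 147840 = 295680 reads kept, a pass rate of 98.56%. A minimal Python sketch that pulls those counts out of fastp's stderr, assuming the `Filtering result:` line format shown above (the JSON report fastp also writes is likely the more robust source); `filtering_counts` and `pass_rate` are hypothetical names.

```python
import re

# Matches lines such as "reads passed filter: 295680" or
# "reads failed due to low quality: 4320" in fastp's stderr summary.
FILTER_LINE = re.compile(r"^(reads passed filter|reads failed due to [^:]+):\s*(\d+)$")


def filtering_counts(stderr: str) -> dict[str, int]:
    """Collect the pass/fail read counts from fastp's stderr text."""
    counts = {}
    for line in stderr.splitlines():
        match = FILTER_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def pass_rate(counts: dict[str, int]) -> float:
    """Fraction of reads that passed, out of passed plus all failure reasons."""
    passed = counts["reads passed filter"]
    failed = sum(n for key, n in counts.items() if key.startswith("reads failed"))
    return passed / (passed + failed)


if __name__ == "__main__":
    log = """Filtering result:
reads passed filter: 295680
reads failed due to low quality: 4320
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 6496
"""
    counts = filtering_counts(log)
    print(counts["reads failed due to low quality"])  # 4320
    print(f"{pass_rate(counts):.2%}")                  # 98.56%
```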

Traceback:

Job Parameters:

| Job parameter | Parameter value |
| --- | --- |
| `__input_ext` | `"input"` |
| `__workflow_invocation_uuid__` | `"b51d6ae4a45611f0a35e7c1e52448ad2"` |
| `chromInfo` | `"/tmp/tmpsdyv8z95/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"` |
| `dbkey` | `"?"` |
| `filter_options` | `{"length_filtering_options": {"disable_length_filtering": false, "length_limit": null, "length_required": "15"}, "low_complexity_filter": {"complexity_threshold": null, "enable_low_complexity_filter": false}, "quality_filtering_options": {"disable_quality_filtering": false, "n_base_limit": null, "qualified_quality_phred": "15", "unqualified_percent_limit": null}}` |
| `output_options` | `{"report_html": true, "report_json": true}` |
| `overrepresented_sequence_analysis` | `{"overrepresentation_analysis": false, "overrepresentation_sampling": null}` |
| `read_mod_options` | `{"base_correction_options": {"correction": false}, "cutting_by_quality_options": {"cut_front_select": {"__current_case__": 1, "cut_front": ""}, "cut_right_select": {"__current_case__": 1, "cut_right": ""}, "cut_tail_select": {"__current_case__": 1, "cut_tail": ""}}, "polyg_tail_trimming": {"__current_case__": 1, "poly_g_min_len": null, "trimming_select": ""}, "polyx_tail_trimming": {"__current_case__": 1, "polyx_trimming_select": ""}, "umi_processing": {"umi": false, "umi_len": null, "umi_loc": null, "umi_prefix": null}}` |
| `single_paired` | `{"__current_case__": 1, "adapter_trimming_options": {"adapter_sequence1": null, "adapter_sequence2": null, "detect_adapter_for_pe": false, "disable_adapter_trimming": false}, "global_trimming_options": {"trim_front1": null, "trim_front2": null, "trim_tail1": null, "trim_tail2": null}, "merge_reads": {"__current_case__": 1, "merge": ""}, "paired_input": {"values": [{"id": 1, "src": "dce"}]}, "single_paired_selector": "paired_collection"}` |
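These job parameters line up with the fastp command line logged for this job: `qualified_quality_phred: "15"` appears as `-q 15`, `length_required: "15"` as `-l 15`, the null `adapter_sequence1`/`adapter_sequence2` add no adapter flag, and every `cut_*` selector in `cutting_by_quality_options` is empty, so no quality-cutting flag is emitted. The Python sketch below is a hypothetical helper, not the Galaxy fastp wrapper: it rebuilds that mapping for paired-end input, omits `--thread` and `--report_title`, and uses fastp's long options `--adapter_sequence`, `--adapter_sequence_r2`, `--cut_tail` and `--cut_mean_quality`.

```python
from typing import List, Optional


def fastp_args(read1: str, read2: str, out1: str, out2: str,
               qualified_quality: Optional[int] = None,
               length_required: Optional[int] = None,
               adapter_r1: Optional[str] = None,
               adapter_r2: Optional[str] = None,
               cut_tail: bool = False,
               cut_mean_quality: Optional[int] = None) -> List[str]:
    """Build a paired-end fastp argument list from workflow-style inputs."""
    args = ["fastp", "-i", read1, "-I", read2, "-o", out1, "-O", out2]
    # Adapters left empty (null) in the job parameters add no flag.
    if adapter_r1:
        args += ["--adapter_sequence", adapter_r1]
    if adapter_r2:
        args += ["--adapter_sequence_r2", adapter_r2]
    if qualified_quality is not None:
        args += ["-q", str(qualified_quality)]  # --qualified_quality_phred
    if length_required is not None:
        args += ["-l", str(length_required)]    # --length_required
    # A mean-quality threshold is only emitted together with a cut mode;
    # on its own it would not change the trimming.
    if cut_tail:
        args.append("--cut_tail")
        if cut_mean_quality is not None:
            args += ["--cut_mean_quality", str(cut_mean_quality)]
    return args


if __name__ == "__main__":
    cmd = fastp_args("pair.fastqsanger.gz", "pair_R2.fastqsanger.gz",
                     "first.fastqsanger.gz", "second.fastqsanger.gz",
                     qualified_quality=15, length_required=15)
    print(" ".join(cmd))
```

Run as a script, this prints the core of the logged command: `fastp -i pair.fastqsanger.gz -I pair_R2.fastqsanger.gz -o first.fastqsanger.gz -O second.fastqsanger.gz -q 15 -l 15`.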

Step 7: MultiQC:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

die() { echo "$@" 1>&2 ; exit 1; } &&  mkdir multiqc_WDir &&   mkdir multiqc_WDir/fastp_0 &&     ln -s '/tmp/tmpsdyv8z95/files/7/c/2/dataset_7c2bf871-8c25-42ee-83a6-f0dd748fb1f4.dat' 'multiqc_WDir/fastp_0/pairfastp.json' && grep -q "report_title" 'multiqc_WDir/fastp_0/pairfastp.json' || die "'report_title' or 'report_title' not found in the file" &&   multiqc multiqc_WDir --filename 'report'       && mkdir -p ./plots && ls -l ./report_data/ && cp ./report_data/*plot*.txt ./plots/ | true

Exit Code:

```
0
```

Standard Error:

/// MultiQC 🔍 v1.24.1

     version_check | MultiQC Version v1.31 now available!
       file_search | Search path: /tmp/tmpsdyv8z95/job_working_directory/000/4/working/multiqc_WDir

             fastp | Found 1 reports

     write_results | Data        : report_data
     write_results | Report      : report.html
           multiqc | MultiQC complete

Standard Output:

total 1672
-rw-r--r-- 1 1001 1001    8719 Oct  8 14:55 fastp-insert-size-plot.txt
-rw-r--r-- 1 1001 1001    6069 Oct  8 14:55 fastp-seq-content-gc-plot_Read_1_After_filtering.txt
-rw-r--r-- 1 1001 1001    6106 Oct  8 14:55 fastp-seq-content-gc-plot_Read_1_Before_filtering.txt
-rw-r--r-- 1 1001 1001    6092 Oct  8 14:55 fastp-seq-content-gc-plot_Read_2_After_filtering.txt
-rw-r--r-- 1 1001 1001    6082 Oct  8 14:55 fastp-seq-content-gc-plot_Read_2_Before_filtering.txt
-rw-r--r-- 1 1001 1001    4309 Oct  8 14:55 fastp-seq-content-n-plot_Read_1_After_filtering.txt
-rw-r--r-- 1 1001 1001    4484 Oct  8 14:55 fastp-seq-content-n-plot_Read_1_Before_filtering.txt
-rw-r--r-- 1 1001 1001    4309 Oct  8 14:55 fastp-seq-content-n-plot_Read_2_After_filtering.txt
-rw-r--r-- 1 1001 1001    4379 Oct  8 14:55 fastp-seq-content-n-plot_Read_2_Before_filtering.txt
-rw-r--r-- 1 1001 1001    5484 Oct  8 14:55 fastp-seq-quality-plot_Read_1_After_filtering.txt
-rw-r--r-- 1 1001 1001    5475 Oct  8 14:55 fastp-seq-quality-plot_Read_1_Before_filtering.txt
-rw-r--r-- 1 1001 1001    5483 Oct  8 14:55 fastp-seq-quality-plot_Read_2_After_filtering.txt
-rw-r--r-- 1 1001 1001    5472 Oct  8 14:55 fastp-seq-quality-plot_Read_2_Before_filtering.txt
-rw-r--r-- 1 1001 1001      50 Oct  8 14:55 fastp_filtered_reads_plot.txt
-rw-r--r-- 1 1001 1001     121 Oct  8 14:55 multiqc_citations.txt
-rw-r--r-- 1 1001 1001 1391307 Oct  8 14:55 multiqc_data.json
-rw-r--r-- 1 1001 1001  187214 Oct  8 14:55 multiqc_fastp.txt
-rw-r--r-- 1 1001 1001     460 Oct  8 14:55 multiqc_general_stats.txt
-rw-r--r-- 1 1001 1001      25 Oct  8 14:55 multiqc_software_versions.txt
-rw-r--r-- 1 1001 1001     147 Oct  8 14:55 multiqc_sources.txt

Traceback:

Job Parameters:

| Job parameter | Parameter value |
| --- | --- |
| `__input_ext` | `"input"` |
| `__workflow_invocation_uuid__` | `"b51d6ae4a45611f0a35e7c1e52448ad2"` |
| `chromInfo` | `"/tmp/tmpsdyv8z95/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"` |
| `comment` | `""` |
| `dbkey` | `"?"` |
| `export` | `false` |
| `flat` | `false` |
| `results` | `[{"__index__": 0, "software_cond": {"__current_case__": 7, "input": {"values": [{"id": 4, "src": "hdca"}]}, "software": "fastp"}}]` |
| `title` | `""` |

Other invocation details
- history_id
  - 3d4901aa038e9284
- history_state
  - ok
- invocation_id
  - 3d4901aa038e9284
- invocation_state
  - scheduled
- workflow_id
  - 3d4901aa038e9284

Add QC & trimming workflow

5909724

bebatut force-pushed the qc-trimming branch from 433486a to 5909724 October 7, 2025 11:37

wm75 reviewed Oct 8, 2025


bebatut changed the title ~~Add raw read QC & trimming workflow~~ Add short-read QC & trimming workflow Oct 8, 2025

bebatut force-pushed the qc-trimming branch from e3df4f2 to 50baf3b October 8, 2025 13:42

Rename workflow to add short-reads

9bf78cf

bebatut force-pushed the qc-trimming branch from 50baf3b to 9bf78cf October 8, 2025 13:45

Update to latest fastp and remove Cutting mean quality parameter

fba6e37

bebatut force-pushed the qc-trimming branch from 783f1c3 to fba6e37 October 8, 2025 15:09

3 participants
