Skip to content

Fix --save_output_as_bam flag#2154

Open
FriederikeHanssen wants to merge 11 commits intodevfrom
save-output-bam
Open

Fix --save_output_as_bam flag#2154
FriederikeHanssen wants to merge 11 commits intodevfrom
save-output-bam

Conversation

@FriederikeHanssen
Copy link
Copy Markdown
Contributor

@FriederikeHanssen FriederikeHanssen commented Mar 9, 2026

Summary

Fixes the --save_output_as_bam flag which was broken in multiple ways. The core approach: instead of always producing CRAM and converting back to BAM, make processes output the correct format directly.

APPLYBQSR (recalibrate stage)

  • Unify BAM/CRAM branches in bam_applybqsr — both run unconditionally (one is always empty), mixed into single alignment emit
  • Remove CRAM_TO_BAM_RECAL conversion step (no longer needed — APPLYBQSR outputs BAM directly when flag is set via ext.suffix)
  • Add BAM merge publishDir config for recalibrate stage

Markduplicates stage

  • Make GATK4_MARKDUPLICATES / GATK4SPARK_MARKDUPLICATES ext.prefix conditional on save_output_as_bam — produces .md.bam or .md.cram directly
  • Emit unified alignment channel from bam_markduplicates subworkflows (same pattern as applybqsr)
  • Remove CRAM_TO_BAM conversion step at markduplicates stage entirely
  • Fix BAM_TO_CRAM_MAPPING ext.when to skip conversion when save_output_as_bam is set with skip_markduplicates
  • Wrap markdup CRAM_TO_BAM in explicit if block instead of deprecated ext.when

CSV restart files

  • Fix CSV create subworkflows to derive file type from actual filenames (file.name/index.name) instead of fragile .minus(".cram") hack that broke with BAM inputs
  • Remove now-unused save_output_as_bam parameter from CHANNEL_MARKDUPLICATES_CREATE_CSV and CHANNEL_BASERECALIBRATOR_CREATE_CSV

Cleanup

  • Remove dead BAM_TO_CRAM and CRAM_TO_BAM config blocks
  • Remove stale CRAM_TO_BAM_RECAL reference from docs
  • Add test scenario exercising full variant calling path with --save_output_as_bam

Known limitation

  • BAM_SENTIEON_DEDUP still always outputs CRAM — --save_output_as_bam does not yet produce BAMs for the sentieon dedup path (no crash, just no BAM output)

Closes #2136, #2064, #2149, #2148

Test plan

  • nf-test test tests/save_output_as_bam.nf.test --profile debug,test,docker — both scenarios pass
  • nf-test test tests/default.nf.test --profile debug,test,docker — default test passes (no BAM artifacts without flag)
  • --save_output_as_bam: BAM files in preprocessing/markduplicates/ and preprocessing/recalibrated/, variant calling runs
  • Without flag: only CRAM files, no BAM conversion artifacts
  • With intervals (WGS): no duplicate emission errors

🤖 Generated with Claude Code

…g, unnecessary conversions

Closes #2136, #2064, #2149, #2148

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nf-core-bot
Copy link
Copy Markdown
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

FriederikeHanssen and others added 4 commits March 12, 2026 11:25
Resolve conflicts:
- CHANGELOG.md: keep entries from both sides
- fastq_preprocess_gatk/main.nf: keep our fix removing old CRAM_TO_BAM_RECAL block

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The process was removed in #2154, so the sample log output
in usage.md should no longer list it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of always producing CRAM and converting back to BAM:
- Make GATK4_MARKDUPLICATES/GATK4SPARK_MARKDUPLICATES ext.prefix
  conditional on save_output_as_bam (same pattern as APPLYBQSR)
- Emit unified `alignment` channel from bam_markduplicates subworkflows
- Remove CRAM_TO_BAM conversion step at markduplicates stage
- Fix BAM_TO_CRAM_MAPPING ext.when to skip conversion when
  save_output_as_bam is set with skip_markduplicates
- Fix CSV create subworkflows to derive file type from actual
  filenames instead of using fragile .minus(".cram") hack
- Add BAM publishDir patterns for markduplicates configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove dead BAM_TO_CRAM config block (no process uses this alias)
- Remove unused save_output_as_bam parameter from
  CHANNEL_MARKDUPLICATES_CREATE_CSV and
  CHANNEL_BASERECALIBRATOR_CREATE_CSV (type is now derived
  from filename, making the parameter redundant)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 12, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit e242509

+| ✅ 224 tests passed       |+
#| ❔  13 tests were ignored |#
!| ❗   7 tests had warnings |!
Details

❗ Test warnings:

  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • schema_lint - Input mimetype is missing or empty
  • schema_description - No description provided in schema for parameter: markduplicates_pixel_distance
  • schema_description - No description provided in schema for parameter: gatk_pcr_indel_model

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-03-20 15:49:05

GATK4_MARKDUPLICATES does not auto-index BAM output (only indexes
when converting to CRAM). Add explicit INDEX_MARKDUPLICATES step
for the BAM path, matching the pattern already used in the spark
variant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FriederikeHanssen and others added 2 commits March 12, 2026 16:33
The skip QC/recal/md test now correctly skips BAM_TO_CRAM_MAPPING
when --save_output_as_bam is set, reducing process count from 10 to 9.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- alignment_from_everything: Remove CRAM_TO_BAM, add INDEX_MARKDUPLICATES,
  update file listings and stable_content md5s for BAM output
- alignment_to_fastq: Same structural changes, update multiqc aggregate md5s
- save_output_as_bam: Fix warning field to match CI behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FriederikeHanssen and others added 2 commits March 20, 2026 16:01
- alignment_from_everything/alignment_to_fastq: Fix .bam.metrics md5s
  (values were extracted from wrong side of CI diff)
- save_output_as_bam: Add missing variant calling snapshot, fix warnings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GATK4_MARKDUPLICATES .bam.metrics output includes timestamps,
making md5 values change between CI runs. Add to .nftignore
(same as .cram.metrics) and remove from snapshot stable_content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@maxulysse maxulysse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good but order of the PR in changelog

@gorgitko
Copy link
Copy Markdown

Hi, what's the time estimate for merging & releasing this fix? Thanks!

@FriederikeHanssen
Copy link
Copy Markdown
Contributor Author

I asked the issue authors above to test. @amizeranschi ran into something that i haven't had time to investigate: #2064 it would be good too exclude first if the changes in this PR are causing the stuck issue and/or validate that all the other scenarios are functioning now, if you have time to run any of them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants