Closed
Description
Currently, the Denver pipeline builds the BWA and samtools indexes for its references at runtime on every pipeline execution. This task adds pre-indexing to improve pipeline startup time and avoid redundant computation.
Current Behavior
- BWA_INDEX runs for each serotype reference FASTA (lines 77-79 in workflows/denver.nf)
- SAMTOOLS_FAIDX runs for each serotype reference FASTA (lines 79-83 in subworkflows/local/denv_serotype_analysis/main.nf)
- Indexes are recreated on every pipeline run
Proposed Solution
- Create a pre-indexing script (bin/preindex_references.sh) that:
  - Iterates through all FASTA files in the assets directory
  - Runs BWA index for each reference
  - Runs samtools faidx for each reference
  - Stores indexes appropriately:
    - Single-file indexes (*.fai): stored directly alongside the FASTA
    - Multi-file indexes (BWA): stored in per-reference subdirectories (e.g., DENV1/DENV1.fasta + BWA index files)
- Update pipeline configuration:
  - Add optional params for pre-built indexes
    - use_prebuilt_bwa_index: boolean flag (default: false)
    - use_prebuilt_fai: boolean flag (default: false)
  - When enabled, skip indexing modules and load pre-built indexes from assets
- Update workflows:
  - Modify denver.nf to conditionally skip BWA_INDEX
  - Modify denv_serotype_analysis/main.nf to conditionally skip SAMTOOLS_FAIDX
  - Load pre-built indexes from the assets directory when flags are enabled
- Update nextflow_schema.json with new parameters
Acceptance Criteria
- Pre-indexing script created in bin/
- Script successfully indexes all 6 serotype references (DENV1-4 + 2 sylvatic)
- Configuration parameters added and validated
- Pipeline successfully runs with pre-built indexes enabled
- Pipeline still works with runtime indexing (default behavior)
- Documentation updated (README, parameter descriptions)
LoE Estimate
2-3 hours
- Script implementation: 45 minutes
- Pipeline modifications: 60 minutes
- Configuration/schema updates: 30 minutes
- Testing both modes: 45 minutes
Technical Details
BWA Index Output
BWA creates multiple index files for a reference:
- reference.fasta.amb
- reference.fasta.ann
- reference.fasta.bwt
- reference.fasta.pac
- reference.fasta.sa
Storage strategy: Create a per-reference directory (e.g., DENV1/) containing the FASTA and all of its index files
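Because a BWA index is only usable when all five files are present, it may be worth checking a per-reference directory before enabling use_prebuilt_bwa_index. The helper below is a minimal sketch of such a check; the function name `bwa_index_complete` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): succeeds only if all five
# BWA index files listed above exist next to the given FASTA path.
bwa_index_complete() {
    idx_fasta=$1
    for idx_ext in amb ann bwt pac sa; do
        if [ ! -f "$idx_fasta.$idx_ext" ]; then
            # Report the first missing file on stderr and fail.
            echo "missing BWA index file: $idx_fasta.$idx_ext" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `bwa_index_complete assets/DENV1/DENV1.fasta` would return non-zero if any of the five files is absent. The `assets/` prefix here is only illustrative.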
Samtools faidx Output
Samtools creates a single index file:
- reference.fasta.fai
Storage strategy: Store directly alongside FASTA in assets directory
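The two storage strategies can be combined into a rough sketch of what bin/preindex_references.sh might look like. This is an illustration under stated assumptions, not the final implementation: it assumes references are named `<assets_dir>/<NAME>.fasta`, that `bwa` and `samtools` are on PATH, and that the FASTA is copied (rather than moved or symlinked) into its per-reference subdirectory. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of bin/preindex_references.sh (function names are hypothetical).
# Assumes references are named <assets_dir>/<NAME>.fasta and that bwa and
# samtools are available on PATH.

# Index a single reference FASTA.
preindex_reference() {
    ref_fasta=$1
    ref_dir=$(dirname "$ref_fasta")
    ref_name=$(basename "$ref_fasta" .fasta)
    bwa_dir="$ref_dir/$ref_name"

    # Single-file index: samtools writes <fasta>.fai alongside the FASTA.
    samtools faidx "$ref_fasta" || return 1

    # Multi-file index: copy the FASTA into its own subdirectory
    # (an assumption) and build the BWA index there.
    mkdir -p "$bwa_dir" || return 1
    cp "$ref_fasta" "$bwa_dir/$ref_name.fasta" || return 1
    bwa index "$bwa_dir/$ref_name.fasta" || return 1
}

# Index every *.fasta file found directly in the assets directory.
preindex_all() {
    assets_dir=$1
    found=0
    for fasta_path in "$assets_dir"/*.fasta; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$fasta_path" ] || continue
        found=1
        echo "Indexing $fasta_path" >&2
        preindex_reference "$fasta_path" || return 1
    done
    if [ "$found" -ne 1 ]; then
        echo "No .fasta files found in $assets_dir" >&2
        return 1
    fi
}
```

Calling `preindex_all` with the assets directory as its only argument would index each reference in turn and stop at the first failure. Adding a final line such as `preindex_all "$1"` would turn the sketch into a standalone script.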