-
Notifications
You must be signed in to change notification settings - Fork 1
Add VarScan2 somatic variant caller rule #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| samp,sample,sample_lane,SQ,RU,EX,LANE,r1_path,r2_path,biological_sex,iddna_uid,concordance_control_path,is_positive_control,is_negative_control,sample_type,merge_single,external_sample_id,instrument,lib_prep,bwa_kmer,tum_nrm_sampleid_match | ||
| RIH0_ANA0-HG002_DBC0_0,RIH0_ANA0-HG002_DBC0_0,RIH0_ANA0-HG002_DBC0_0,DBC0,RIH0,ANA0-HG002,0,.test_data/data/RIH0_ANA0-HG002_DBC0_0.R1.fastq.gz,.test_data/data/RIH0_ANA0-HG002_DBC0_0.R2.fastq.gz,male,na,/fsx/data/genomic_data/organism_annotations/H_sapiens/hg38/controls/giab/snv/v4.2.1/HG2/,true,false,blood,merge,HG002,NOVASEQ,PCR-FREE,19,RIH0_ANA0-HG002 | ||
| RIH0_ANA0-HG002_DBC0_0,RIH0_ANA0-HG002_DBC0_0,RIH0_ANA0-HG002_DBC0_0,DBC0,RIH0,ANA0-HG002,0,.test_data/data/RIH0_ANA0-HG002_DBC0_0.R1.fastq.gz,.test_data/data/RIH0_ANA0-HG002_DBC0_0.R2.fastq.gz,male,na,/fsx/data/genomic_data/organism_annotations/H_sapiens/hg38/controls/giab/snv/v4.2.1/HG2/,true,false,normal,merge,HG002,NOVASEQ,PCR-FREE,19,RIH0_ANA0-HG002 | ||
| RIH0_ANA0-HG002_DBC1_0,RIH0_ANA0-HG002_DBC1_0,RIH0_ANA0-HG002_DBC1_0,DBC1,RIH0,ANA0-HG002,0,.test_data/data/RIH0_ANA0-HG002_DBC0_0.R1.fastq.gz,.test_data/data/RIH0_ANA0-HG002_DBC0_0.R2.fastq.gz,male,na,/fsx/data/genomic_data/organism_annotations/H_sapiens/hg38/controls/giab/snv/v4.2.1/HG2/,false,false,tumor,merge,HG002,NOVASEQ,PCR-FREE,19,RIH0_ANA0-HG002 |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,223 @@ | ||||||
| import sys | ||||||
| import os | ||||||
|
|
||||||
| ##### VarScan2 | ||||||
| # --------------------------- | ||||||
|
|
||||||
|
|
||||||
| def get_vscn_chrm(wildcards): | ||||||
| pchr = "" # prefix handled already | ||||||
| ret_str = "" | ||||||
| sl = wildcards.vscnchrm.replace('chr', '').split("-") | ||||||
| sl2 = wildcards.vscnchrm.replace('chr', '').split("~") | ||||||
|
|
||||||
| if len(sl2) == 2: | ||||||
| ret_str = pchr + wildcards.vscnchrm | ||||||
| elif len(sl) == 1: | ||||||
| ret_str = pchr + sl[0] | ||||||
| elif len(sl) == 2: | ||||||
| start = int(sl[0]) | ||||||
| end = int(sl[1]) | ||||||
| while start <= end: | ||||||
| ret_str = str(ret_str) + " " + pchr + str(start) | ||||||
| start = start + 1 | ||||||
| else: | ||||||
| raise Exception( | ||||||
| "varscan2 chunks can only be one contiguous range per chunk: e.g., 1-4 with the non numerical chromosomes assigned 23=X, 24=Y, 25=MT" | ||||||
| ) | ||||||
|
|
||||||
| return ret_mod_chrm(ret_str) | ||||||
|
|
||||||
|
Comment on lines
+8
to
+30
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] Convert mitochondrial contig name for hg38 correctly The chromosome helper Useful? React with 👍 / 👎. |
||||||
|
|
||||||
| rule varscan2: | ||||||
| wildcard_constraints: | ||||||
| sample=TUMORS_REGEX | ||||||
| input: | ||||||
| tumor_cram=get_somcall_tumor_cram, | ||||||
| tumor_crai=get_somcall_tumor_crai, | ||||||
| normal_cram=get_somcall_normal_cram, | ||||||
| normal_crai=get_somcall_normal_crai, | ||||||
| ref_fa=lambda wc: config["supporting_files"]["files"]["huref"]["fasta"]["name"], | ||||||
| ref_fai=lambda wc: config["supporting_files"]["files"]["huref"]["fasta"]["name"] + ".fai", | ||||||
| d=MDIR + "{sample}/align/{alnr}/snv/vscn/vcfs/{vscnchrm}/{sample}.ready", | ||||||
| output: | ||||||
| vcf=MDIR + "{sample}/align/{alnr}/snv/vscn/vcfs/{vscnchrm}/{sample}.{alnr}.vscn.{vscnchrm}.snv.vcf", | ||||||
| log: | ||||||
| MDIR + "{sample}/align/{alnr}/snv/vscn/log/{sample}.{alnr}.vscn.{vscnchrm}.snv.log", | ||||||
| threads: config['varscan2']['threads'] | ||||||
| container: | ||||||
| config['varscan2']['container'] | ||||||
| priority: 45 | ||||||
| resources: | ||||||
| vcpu=config['varscan2']['threads'], | ||||||
| threads=config['varscan2']['threads'], | ||||||
| partition=config['varscan2']['partition'], | ||||||
| mem_mb=config['varscan2']['mem_mb'], | ||||||
| benchmark: | ||||||
| repeat( | ||||||
| MDIR + "{sample}/benchmarks/{sample}.{alnr}.vscn.{vscnchrm}.bench.tsv", | ||||||
| 0 if 'bench_repeat' not in config['varscan2'] else config['varscan2']['bench_repeat'], | ||||||
| ) | ||||||
| params: | ||||||
| vchrm=get_vscn_chrm, | ||||||
| cluster_sample=ret_sample, | ||||||
| huref=config["supporting_files"]["files"]["huref"]["fasta"]["name"], | ||||||
| cpre="" if "b37" == config['genome_build'] else "chr", | ||||||
| mito_code="MT" if "b37" == config['genome_build'] else "M", | ||||||
| shell: | ||||||
| r""" | ||||||
| set -euo pipefail | ||||||
| ulimit -n 65536 || true | ||||||
|
|
||||||
| vchr=$(echo {params.cpre}{params.vchrm} | sed 's/~/\\:/g' | sed 's/23\\:/X\\:/' | sed 's/24\\:/Y\\:/' | sed 's/25\\:/{params.mito_code}\\:/') | ||||||
| vchr=${vchr%:} | ||||||
| IFS=':' read -r vcontig vstart vend <<< "$vchr" | ||||||
| if [ -z "${vend:-}" ]; then | ||||||
| vstart=0 | ||||||
| vend=$(awk -v c="$vcontig" '$1==c{print $2; exit}' {params.huref}.fai) | ||||||
| fi | ||||||
| region="$vcontig:$vstart-$vend" | ||||||
|
|
||||||
| mkdir -p "$(dirname {output.vcf})" | ||||||
| outbase="$(dirname {output.vcf})/{wildcards.sample}.{wildcards.alnr}.vscn.{wildcards.vscnchrm}" | ||||||
|
|
||||||
| samtools mpileup -f {params.huref} -q 1 -B -d 1000000 -r $region {input.normal_cram} {input.tumor_cram} \ | ||||||
| | varscan somatic --mpileup 1 --output-vcf 1 /dev/stdin $outbase >> {log} 2>&1 | ||||||
|
|
||||||
| bcftools concat -a -O v ${outbase}.snp.vcf ${outbase}.indel.vcf > {output.vcf} 2>> {log} | ||||||
| rm -f ${outbase}.snp.vcf ${outbase}.indel.vcf | ||||||
| """ | ||||||
|
|
||||||
|
|
||||||
| rule varscan2_sort_index_chunk_vcf: | ||||||
| wildcard_constraints: | ||||||
| sample=TUMORS_REGEX | ||||||
| input: | ||||||
| vcf=MDIR + "{sample}/align/{alnr}/snv/vscn/vcfs/{vscnchrm}/{sample}.{alnr}.vscn.{vscnchrm}.snv.vcf", | ||||||
| priority: 46 | ||||||
| output: | ||||||
| vcfsort=MDIR + "{sample}/align/{alnr}/snv/vscn/vcfs/{vscnchrm}/{sample}.{alnr}.vscn.{vscnchrm}.snv.sort.vcf", | ||||||
| vcfgz=MDIR + "{sample}/align/{alnr}/snv/vscn/vcfs/{vscnchrm}/{sample}.{alnr}.vscn.{vscnchrm}.snv.sort.vcf.gz", | ||||||
| vcftbi=MDIR + "{sample}/align/{alnr}/snv/vscn/vcfs/{vscnchrm}/{sample}.{alnr}.vscn.{vscnchrm}.snv.sort.vcf.gz.tbi", | ||||||
| conda: | ||||||
| "../envs/vanilla_v0.1.yaml" | ||||||
| log: | ||||||
| MDIR + "{sample}/align/{alnr}/snv/vscn/vcfs/{vscnchrm}/log/{sample}.{alnr}.vscn.{vscnchrm}.snv.sort.vcf.gz.log", | ||||||
| resources: | ||||||
| vcpu=4, | ||||||
| threads=4, | ||||||
| partition=config['varscan2'].get('partition_other', config['varscan2']['partition']), | ||||||
| params: | ||||||
| cluster_sample=ret_sample, | ||||||
| threads: 4 | ||||||
| shell: | ||||||
| """ | ||||||
| bedtools sort -header -i {input.vcf} > {output.vcfsort} 2>> {log}; | ||||||
| bgzip {output.vcfsort} >> {log} 2>&1; | ||||||
| touch {output.vcfsort}; | ||||||
|
||||||
| touch {output.vcfsort}; |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable latency_wait is used in shell commands but is not defined in the rule parameters. This will cause the shell command to fail.
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable latency_wait is used in shell commands but is not defined in the rule parameters. This will cause the shell command to fail.
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using dirname on multiple space-separated paths in {output} will not work correctly. The shell will interpret this as dirname path1 path2 path3 which is invalid. Use a loop or process each output file individually.
| mkdir -p $(dirname {output} ); | |
| for f in {output}; do mkdir -p "$(dirname "$f")"; done ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function
ret_mod_chrmis called but not defined in this file or imported. This will cause a NameError when the function is executed.