68 commits
All commits authored by nikellepetrillo.

- `c7701d4` update dynamic truth paths (Mar 19, 2025)
- `153c27f` update bucket names (Mar 19, 2025)
- `9913d47` update bucket names (Mar 19, 2025)
- `8d2b80c` update bucket names (Mar 19, 2025)
- `2d4021c` update bucket names (Mar 19, 2025)
- `ed60c07` update bucket names (Mar 19, 2025)
- `316e448` add warp-pipeline-dev as billing project (Mar 19, 2025)
- `244fece` try to fix results path (Mar 19, 2025)
- `aaec3d4` try to fix results path (Mar 19, 2025)
- `1c801dd` try to fix results path (Mar 19, 2025)
- `2abe71f` fix truth path (Mar 20, 2025)
- `c6d2b26` fix truth path (Mar 20, 2025)
- `7a3de2d` fix truth path (Mar 20, 2025)
- `10663a2` fix truth path (Mar 20, 2025)
- `94647ad` add --billing-project=warp-pipeline-dev (Mar 20, 2025)
- `e031748` --billing-project=terra-f8e3de20 (Mar 20, 2025)
- `daba981` cleaning up extra debugging code (Mar 20, 2025)
- `92e3742` cleaning up extra debugging code (Mar 20, 2025)
- `b388e0e` Merge branch 'develop' into np_update_truth_paths (Mar 20, 2025)
- `6707aee` Merge branch 'develop' into np_update_truth_paths (Mar 20, 2025)
- `7a4e1a0` update slideseq inputs to private bucket to test out dynamic truth fu… (Mar 21, 2025)
- `100b0e4` Merge remote-tracking branch 'origin/np_update_truth_paths' into np_u… (Mar 21, 2025)
- `2eef91b` revert (Mar 21, 2025)
- `755636f` update optimus sci inputs (Mar 21, 2025)
- `551e1e9` update plumbing multi snss2 (Apr 9, 2025)
- `8481de5` update slideseq sci and plumbing (Apr 9, 2025)
- `5346ce3` update snm3c (Apr 14, 2025)
- `e37e1fd` Merge branch 'develop' into np_snm3c_slideseq_ss2sn_update_inputs (Apr 14, 2025)
- `8c35fd0` update snm3c (Apr 14, 2025)
- `334dfba` updates (Apr 15, 2025)
- `0f93c2d` Merge branch 'develop' into np_snm3c_slideseq_ss2sn_update_inputs (Apr 15, 2025)
- `03a519d` Merge branch 'develop' into np_update_truth_paths (Apr 15, 2025)
- `5516cb1` revert (Apr 15, 2025)
- `71e54f8` Merge pull request #1565 from broadinstitute/np_snm3c_slideseq_ss2sn_… (Apr 15, 2025)
- `755170e` update snm3c (Apr 16, 2025)
- `0fe8557` update snm3c (Apr 16, 2025)
- `5faaaac` update snm3c (Apr 16, 2025)
- `9a6175e` update snm3c (Apr 16, 2025)
- `c8fa692` Merge branch 'develop' into np_update_truth_paths (Apr 16, 2025)
- `96809d1` fix paired tag (Apr 17, 2025)
- `03c109b` Merge remote-tracking branch 'origin/np_update_truth_paths' into np_u… (Apr 17, 2025)
- `3352429` fix paired tag (Apr 17, 2025)
- `29cacfa` fix atac (Apr 17, 2025)
- `66a7f65` update sci snm3c (Apr 17, 2025)
- `ec20e2f` update sci snm3c (Apr 17, 2025)
- `f3dc83f` update atac (Apr 18, 2025)
- `56ab9d6` update imputation and imp. beagle (Apr 18, 2025)
- `1ece649` Merge pull request #1577 from broadinstitute/np_imputation_update_tru… (Apr 22, 2025)
- `5f4c5f3` update paired tag (Apr 22, 2025)
- `52b7bc6` Merge branch 'develop' into np_update_truth_paths (Apr 22, 2025)
- `c4fb6e9` update imputation (Apr 22, 2025)
- `7c41901` Merge remote-tracking branch 'origin/np_update_truth_paths' into np_u… (Apr 22, 2025)
- `24a1584` update imputation (Apr 22, 2025)
- `837fc79` Np update wgs np update truth paths (#1588) (May 27, 2025)
- `487fe81` Migrate ExomegermlineSS inputs (#1595) (Jun 4, 2025)
- `ab5c90f` update sample name maps (Jul 2, 2025)
- `5015700` revert bge (Jul 3, 2025)
- `282e757` revert gather_vcfs_high_memory.json (Jul 4, 2025)
- `7c57915` revert gather_vcfs_high_memory.json (Jul 7, 2025)
- `6b1d0b3` fix bge sci (Jul 9, 2025)
- `490b171` update ug pipelines (Jul 9, 2025)
- `3fbb03b` Merge pull request #1627 from broadinstitute/np_jg_np_update_truth_paths (Jul 10, 2025)
- `39d262d` add gs://gatk-best-practices to public bucket identifiers (Jul 11, 2025)
- `c8a89ef` forgot some instances of broad-gotc (Jul 14, 2025)
- `1066cdd` fix VerifyNA12878.wdl (Jul 14, 2025)
- `4ceda45` Merge pull request #1628 from broadinstitute/ultima_np_update_input_p… (Jul 15, 2025)
- `8aa15f2` Merge pull request #1631 from broadinstitute/np_jg_input_fix (Jul 15, 2025)
- `4ac288d` np update Reprocessing input paths (#1636) (Jul 25, 2025)
16 changes: 14 additions & 2 deletions .dockstore.yml
@@ -68,6 +68,10 @@ workflows:
subclass: WDL
primaryDescriptorPath: /pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.wdl

- name: rnaseq_aou
subclass: WDL
primaryDescriptorPath: /all_of_us/rna_seq/rnaseq_aou.wdl

- name: RNAWithUMIsPipeline
subclass: WDL
primaryDescriptorPath: /pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl
@@ -90,7 +94,7 @@ workflows:

- name: SlideTags
subclass: WDL
primaryDescriptorPath: /beta-pipelines/skylab/slidetags/SlideTags.wdl
primaryDescriptorPath: /pipelines/skylab/slidetags/SlideTags.wdl

- name: snm3C-seq
subclass: WDL
@@ -144,6 +148,10 @@ workflows:
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestPairedTag.wdl

- name: TestPeakCalling
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestPeakCalling.wdl

- name: TestReblockGVCF
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestReblockGVCF.wdl
@@ -156,6 +164,10 @@ workflows:
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestSlideSeq.wdl

- name: TestSlideTags
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/TestSlideTags.wdl

- name: Testsnm3C
subclass: WDL
primaryDescriptorPath: /verification/test-wdls/Testsnm3C.wdl
@@ -200,7 +212,7 @@ workflows:
subclass: WDL
primaryDescriptorPath: /pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.wdl

- name: vds_to_vcf.wdl
- name: vds_to_vcf
subclass: WDL
primaryDescriptorPath: /all_of_us/ancestry/vds_to_vcf.wdl

65 changes: 65 additions & 0 deletions .github/workflows/test_peakcalling.yml
@@ -0,0 +1,65 @@
name: Test PeakCalling

# Controls when the workflow will run
on:
pull_request:
branches: [ "develop", "staging", "master" ]
# Only run if files in these paths changed:
####################################
# SET PIPELINE SPECIFIC PATHS HERE #
####################################
paths:
# anything in the pipelines folder
- 'pipelines/skylab/peak_calling/**'
# tasks from the pipeline WDL and their dependencies
- 'tasks/broad/Utilities.wdl'
# verification WDL and its dependencies
- 'verification/VerifyPeakCalling.wdl'
- 'verification/VerifyTasks.wdl'
# test WDL and its dependencies
- 'verification/test-wdls/TestPeakCalling.wdl'
- 'tasks/broad/TerraCopyFilesFromCloudToCloud.wdl'
# this file, the subworkflow file, and the firecloud_api script
- '.github/workflows/test_peakcalling.yml'
- '.github/workflows/warp_test_workflow.yml'
- 'scripts/firecloud_api/firecloud_api.py'


# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
inputs:
useCallCache:
description: 'Use call cache (default: true)'
required: false
default: "true"
updateTruth:
description: 'Update truth files (default: false)'
required: false
default: "false"
testType:
description: 'Specify the type of test (Plumbing or Scientific)'
required: false
type: choice
options:
- Plumbing
- Scientific
truthBranch:
description: 'Specify the branch for truth files (default: master)'
required: false
default: "master"


jobs:
TestPeakCalling:
uses: ./.github/workflows/warp_test_workflow.yml
with:
pipeline_name: TestPeakCalling
dockstore_pipeline_name: PeakCalling
pipeline_dir: pipelines/skylab/peak_calling
use_call_cache: ${{ github.event.inputs.useCallCache || 'true' }}
update_truth: ${{ github.event.inputs.updateTruth || 'false' }}
test_type: ${{ github.event.inputs.testType }}
truth_branch: ${{ github.event.inputs.truthBranch || 'master' }}
secrets:
PDT_TESTER_SA_B64: ${{ secrets.PDT_TESTER_SA_B64 }}
DOCKSTORE_TOKEN: ${{ secrets.DOCKSTORE_TOKEN }}
68 changes: 68 additions & 0 deletions .github/workflows/test_slidetags.yml
@@ -0,0 +1,68 @@
name: Test Slide Tags

# Controls when the workflow will run
on:
pull_request:
branches: [ "develop", "staging", "master" ]
# Only run if files in these paths changed:
####################################
# SET PIPELINE SPECIFIC PATHS HERE #
####################################
paths:
# anything in the pipelines folder
- 'pipelines/skylab/slidetags/**'
# tasks from the pipeline WDL and their dependencies
- 'tasks/skylab/StarAlign.wdl'
- 'tasks/skylab/Metrics.wdl'
- 'tasks/skylab/H5adUtils.wdl'
- 'tasks/skylab/CheckInputs.wdl'
- 'tasks/skylab/MergeSortBam.wdl'
- 'tasks/broad/Utilities.wdl'
# verification WDL and its dependencies
- 'verification/VerifySlideTags.wdl'
- 'verification/VerifyTasks.wdl'
# test WDL and its dependencies
- 'verification/test-wdls/TestSlideTags.wdl'
- 'tasks/broad/TerraCopyFilesFromCloudToCloud.wdl'
# this file, the subworkflow file, and the firecloud_api script
- '.github/workflows/test_slidetags.yml'
- '.github/workflows/warp_test_workflow.yml'
- 'scripts/firecloud_api/firecloud_api.py'

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
inputs:
useCallCache:
description: 'Use call cache (default: true)'
required: false
default: "true"
updateTruth:
description: 'Update truth files (default: false)'
required: false
default: "false"
testType:
description: 'Specify the type of test (Plumbing or Scientific)'
required: false
type: choice
options:
- Plumbing
- Scientific
truthBranch:
description: 'Specify the branch for truth files (default: master)'
required: false
default: "master"

jobs:
TestSlideTags:
uses: ./.github/workflows/warp_test_workflow.yml
with:
pipeline_name: TestSlideTags
dockstore_pipeline_name: SlideTags
pipeline_dir: pipelines/skylab/slidetags
use_call_cache: ${{ github.event.inputs.useCallCache || 'true' }}
update_truth: ${{ github.event.inputs.updateTruth || 'false' }}
test_type: ${{ github.event.inputs.testType }}
truth_branch: ${{ github.event.inputs.truthBranch || 'master' }}
secrets:
PDT_TESTER_SA_B64: ${{ secrets.PDT_TESTER_SA_B64 }}
DOCKSTORE_TOKEN: ${{ secrets.DOCKSTORE_TOKEN }}
10 changes: 5 additions & 5 deletions .github/workflows/warp_test_workflow.yml
@@ -211,6 +211,7 @@ jobs:
done

echo "Error: The Dockstore Commit Hash does not match the GitHub Commit Hash after 15 minutes of retries!"

exit 1
env:
GITHUB_COMMIT_HASH: ${{ env.GITHUB_COMMIT_HASH }}
@@ -249,19 +250,18 @@ jobs:
TEST_TYPE="${{ env.testType }}"
INPUTS_DIR="${{ inputs.pipeline_dir }}/test_inputs/$TEST_TYPE"
echo "Running tests with test type: $TEST_TYPE"

TRUTH_PATH="gs://broad-gotc-test-storage/${{ inputs.dockstore_pipeline_name }}/truth/$(echo "$TEST_TYPE" | tr '[:upper:]' '[:lower:]')/$TRUTH_BRANCH"
echo "Truth path: $TRUTH_PATH"

RESULTS_PATH="gs://pd-test-results/${{ inputs.dockstore_pipeline_name }}/results/$CURRENT_TIME"


# Submit all jobs first and store their submission IDs
for input_file in "$INPUTS_DIR"/*.json; do
test_input_file=$(python3 scripts/firecloud_api/UpdateTestInputs.py --truth_path "$TRUTH_PATH" \
test_input_file=$(python3 scripts/firecloud_api/UpdateTestInputs.py \
--results_path "$RESULTS_PATH" \
--inputs_json "$input_file" \
--update_truth "$UPDATE_TRUTH_BOOL" \
--branch_name "$BRANCH_NAME" )
--branch_name "$TRUTH_BRANCH" \
--dockstore_pipeline_name "${{ inputs.dockstore_pipeline_name }}" )
echo "Uploading the test input file: $test_input_file"

# Create the submission_data.json file for this input_file
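The hunk above moves truth-path construction out of the shell step and into `UpdateTestInputs.py` (the script now receives `--dockstore_pipeline_name` and `--branch_name` instead of a prebuilt `--truth_path`). As a rough illustration of the path logic the shell step used, here is a minimal sketch; the function name and the script's actual internals are assumptions, only the bucket layout comes from the diff:

```python
# Hypothetical helper mirroring the removed shell logic:
#   gs://broad-gotc-test-storage/<pipeline>/truth/<test type, lowercased>/<branch>
# The real UpdateTestInputs.py is not shown in this diff.
def build_truth_path(dockstore_pipeline_name: str, test_type: str, truth_branch: str) -> str:
    # `tr '[:upper:]' '[:lower:]'` in the shell corresponds to str.lower() here
    return (
        "gs://broad-gotc-test-storage/"
        f"{dockstore_pipeline_name}/truth/{test_type.lower()}/{truth_branch}"
    )

print(build_truth_path("PeakCalling", "Plumbing", "master"))
# gs://broad-gotc-test-storage/PeakCalling/truth/plumbing/master
```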
4 changes: 2 additions & 2 deletions README.md
@@ -1,7 +1,7 @@

### WDL Analysis Research Pipelines
### Warp Analysis Research Pipelines

The WDL Analysis Research Pipelines (WARP) repository is a collection of cloud-optimized pipelines for processing biological data from the Broad Institute Data Sciences Platform and collaborators.
The Warp Analysis Research Pipelines (WARP) repository is a collection of cloud-optimized pipelines for processing biological data from the Broad Institute Data Sciences Platform and collaborators.

WARP provides robust, standardized data analysis for the Broad Institute Genomics Platform and large consortia like the Human Cell Atlas and the BRAIN Initiative. WARP pipelines are rigorously scientifically validated, high scale, reproducible and open source, released under the [BSD 3-Clause license](https://github.com/broadinstitute/warp/blob/master/LICENSE).

2 changes: 1 addition & 1 deletion all_of_us/ancestry/README.md
@@ -26,7 +26,7 @@ Key characteristics:
#### Step 3. Generate Index Files**
- Tabix index files (`.tbi`) are created alongside each VCF output.

### Step 4. Create Lists of Outputs (task: `create_fofn`)
#### Step 4. Create Lists of Outputs (task: `create_fofn`)
- Two flat text files (`.fofn1.txt`, `.fofn2.txt`) are generated listing all full VCFs and index files.

#### Step 5. Output Final Files
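Step 4 in the README above describes a `create_fofn` task that lists all VCFs and their indexes in two flat text files. A minimal sketch of that pattern, with illustrative file names (the task's real inputs and naming are not shown here):

```python
# Sketch of a create_fofn-style step: write one path per line,
# the conventional "file of file names" (FOFN) layout.
def write_fofn(paths, fofn_name):
    with open(fofn_name, "w") as f:
        for p in sorted(paths):
            f.write(p + "\n")

# Illustrative outputs: full VCFs plus their tabix indexes (.tbi)
vcfs = ["chr1.vcf.gz", "chr2.vcf.gz"]
indexes = [v + ".tbi" for v in vcfs]
write_fofn(vcfs, "outputs.fofn1.txt")
write_fofn(indexes, "outputs.fofn2.txt")
```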
6 changes: 6 additions & 0 deletions all_of_us/ancestry/run_ancestry.changelog.md
@@ -1,3 +1,9 @@
# aou-9.0.0
2025-05-23 (Date of Last Commit)

* Updated the gnomad metadata tsv from gnomad.genomes.v3.1.hgdp_1kg_subset.sample_meta.tsv.gz to gs://gcp-public-data--gnomad/release/3.1/secondary_analyses/hgdp_1kg_v2/metadata_and_qc/gnomad_meta_updated.tsv
* Updated which column the workflow uses for population labels; it now applies project_meta.project_pop

# aou-8.0.0
2025-04-16 (Date of Last Commit)

8 changes: 4 additions & 4 deletions all_of_us/ancestry/run_ancestry.wdl
@@ -34,9 +34,9 @@ workflow run_ancestry {
Int num_pcs=16
}

File hgdp_metadata_file = select_first([hgdp_metadata_file_in, "gs://gcp-public-data--gnomad/release/3.1/vcf/genomes/gnomad.genomes.v3.1.hgdp_1kg_subset.sample_meta.tsv.gz"])
File hgdp_metadata_file = select_first([hgdp_metadata_file_in, "gs://gcp-public-data--gnomad/release/3.1/secondary_analyses/hgdp_1kg_v2/metadata_and_qc/gnomad_meta_updated.tsv"])
Float other_cutoff = select_first([other_cutoff_in, 0.75])
String pipeline_version = "aou_8.0.0"
String pipeline_version = "aou_9.0.0"

# Train the model on the intersection sites (full version that includes the samples)
call create_hw_pca_training {
@@ -119,8 +119,8 @@ task create_hw_pca_training {
eigenvalues_training, scores_training, loadings_training = get_PCA_scores("~{full_bgz}")

# Apply any custom processing to the population labels from the training data
pop_label_pd = metadata_pd[['s', 'population_inference.pop']]
pop_label_pd['pop_label'] = metadata_pd['population_inference.pop'].apply(collapse_fin_to_eur)
pop_label_pd = metadata_pd[['s', 'project_meta.project_pop']]
pop_label_pd['pop_label'] = metadata_pd['project_meta.project_pop'].apply(collapse_fin_to_eur)

# Join the labels to the training PCA feature set.
pop_label_ht = hl.Table.from_pandas(pop_label_pd).key_by('s')
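The hunk above switches the population-label column from `population_inference.pop` to `project_meta.project_pop` and keeps the `collapse_fin_to_eur` post-processing. A self-contained sketch of that label-handling step follows; the body of `collapse_fin_to_eur` and the toy metadata values are assumptions (only the function name and column names appear in the workflow):

```python
import pandas as pd

# Assumed behavior: fold the Finnish label into the broader European label.
def collapse_fin_to_eur(pop):
    return "eur" if pop == "fin" else pop

# Toy stand-in for the gnomAD HGDP/1kg metadata table
metadata_pd = pd.DataFrame(
    {"s": ["s1", "s2"], "project_meta.project_pop": ["fin", "afr"]}
)

# Mirrors the diff: select sample id + population column, then collapse labels
pop_label_pd = metadata_pd[["s", "project_meta.project_pop"]].copy()
pop_label_pd["pop_label"] = pop_label_pd["project_meta.project_pop"].apply(
    collapse_fin_to_eur
)
print(pop_label_pd["pop_label"].tolist())  # ['eur', 'afr']
```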