Running PGS calc on all PGS IDs #414

alicegodden · 2025-04-04T10:26:30Z


Hello,

I have a few VCF files that I'd like to run against all possible PGS IDs, and I'm running into some errors. I want to run against all of them to avoid bias, as we can't be sure what our variants may be associated with.

The key error I'm stuck on is:
```
ERROR ~ Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)'

Caused by:
Process PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1) terminated with an error exit status (135)
```

when running:
```shell
# --pgs_id was given every PGS ID, comma-separated (list truncated here)
nextflow run pgscatalog/pgsc_calc \
    -profile conda \
    --input samplesheet_SU.csv \
    --target_build GRCh38 \
    --pgs_id PGS000001,PGS000002,PGS000003,PGS000004,... \
    -c pgs.config \
    -work-dir "$WORK_DIR" \
    --outdir "$RESULTS_DIR/$pgs_id"
```

I'd appreciate some guidance on this. Many thanks,
Alice
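Some context on the exit status in the error above: bash reports a process killed by signal N as exit status 128 + N, so 135 points to signal 135 - 128 = 7. The minimal bash sketch below decodes such a status. It uses bash's `kill -l`, which maps a signal number to its name. The helper name `describe_exit_status` is hypothetical and is not part of Nextflow or pgsc_calc.

```shell
#!/usr/bin/env bash
# Sketch: interpret an exit status the way bash reports it.
# describe_exit_status is a hypothetical helper, not part of Nextflow or pgsc_calc.

describe_exit_status() {
  local status="$1"
  if (( status > 128 )); then
    # bash reports a process killed by signal N as exit status 128 + N
    local signum=$(( status - 128 )) name
    # kill -l N prints the signal name without the SIG prefix (e.g. 9 -> KILL)
    name=$(kill -l "$signum" 2>/dev/null) || name="unknown"
    echo "killed by signal ${signum} (${name})"
  else
    echo "exited with status ${status}"
  fi
}
```

On x86-64 Linux, `describe_exit_status 135` should print `killed by signal 7 (BUS)`, which is SIGBUS. A bus error can occur when memory-mapped files or shared memory cannot be backed. It is therefore plausibly, though not certainly, another symptom of the resource pressure described in the answer.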

Answered by nebfield


nebfield · 2025-04-04T11:19:21Z

Maintainer

You'd need a lot of compute resources to calculate every PGS in parallel in one workflow run. I think the PGS Catalog now contains several billion genetic variants, so calculating all PGS is a big job!

I'd recommend instead breaking the PGS IDs up into batches of ~100 scores.
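Below is a minimal bash sketch of this batching. It assumes the IDs to score are listed one per line in a plain-text file. The names `make_batches`, `run_batches` and `pgs_ids.txt` are hypothetical. The `nextflow` options simply mirror the command in the question, and each batch writes to its own output directory.

```shell
#!/usr/bin/env bash
# Sketch: split a one-ID-per-line file into comma-separated batches and run
# pgsc_calc once per batch. make_batches, run_batches and pgs_ids.txt are
# hypothetical names; the nextflow options mirror the command in the question.

# make_batches FILE [BATCH_SIZE] -> prints one comma-separated batch per line
make_batches() {
  local id_file="$1" batch_size="${2:-100}"
  awk -v n="$batch_size" '
    NF { ids[++count] = $1 }          # collect IDs, skipping blank lines
    END {
      for (i = 1; i <= count; i++) {
        # newline after every n-th ID and after the last one, comma otherwise
        sep = (i % n == 0 || i == count) ? "\n" : ","
        printf "%s%s", ids[i], sep
      }
    }' "$id_file"
}

# run_batches FILE [BATCH_SIZE] -> one sequential workflow run per batch
run_batches() {
  local batch batch_num=0
  while IFS= read -r batch; do
    batch_num=$(( batch_num + 1 ))
    # </dev/null keeps nextflow from reading (and draining) the loop's stdin
    nextflow run pgscatalog/pgsc_calc \
      -profile conda \
      --input samplesheet_SU.csv \
      --target_build GRCh38 \
      --pgs_id "$batch" \
      -c pgs.config \
      -work-dir "$WORK_DIR" \
      --outdir "$RESULTS_DIR/batch_${batch_num}" </dev/null
  done < <(make_batches "$1" "${2:-100}")
}
```

For example, `make_batches pgs_ids.txt 100 > batches.txt` writes the batches to a file, and `run_batches pgs_ids.txt 100` runs them one after another.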

Batch size depends on your available resources, but 100 is a sensible place to start. You could speed this up by submitting each batch as a separate job to an HPC cluster. The documentation for HPCs/big jobs might be helpful.
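One way to submit each batch as its own HPC job is a job array. The sketch below assumes a SLURM cluster and a hypothetical `batches.txt` in the submission directory holding one comma-separated batch per line. Each array task picks the line matching its `SLURM_ARRAY_TASK_ID`. The resource requests are placeholders to adjust for your cluster, not values from the pgsc_calc documentation.

```shell
#!/usr/bin/env bash
# Sketch of a SLURM job-array script: one array task per batch of PGS IDs.
# Assumptions (hypothetical): batches.txt holds one comma-separated batch per
# line, and the resource requests below are placeholders for your cluster.
#SBATCH --job-name=pgsc_calc_batch
#SBATCH --cpus-per-task=2
#SBATCH --mem=16G
#SBATCH --time=24:00:00

set -euo pipefail

# Print line number $2 of file $1; array task IDs are assumed to start at 1.
batch_for_task() {
  sed -n "${2}p" "$1"
}

# Launch the workflow only when this script runs as a SLURM array task.
if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
  task="$SLURM_ARRAY_TASK_ID"
  submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
  batch=$(batch_for_task "$submit_dir/batches.txt" "$task")

  # Give each task its own launch directory so concurrent runs don't share
  # the .nextflow/ metadata directory or the .nextflow.log file.
  mkdir -p "$submit_dir/task_${task}"
  cd "$submit_dir/task_${task}"

  nextflow run pgscatalog/pgsc_calc \
    -profile conda \
    --input "$submit_dir/samplesheet_SU.csv" \
    --target_build GRCh38 \
    --pgs_id "$batch" \
    -c "$submit_dir/pgs.config" \
    -work-dir "$WORK_DIR" \
    --outdir "$RESULTS_DIR/batch_${task}"
fi
```

Submit it with the array size taken from the file, for example `sbatch --array=1-$(wc -l < batches.txt) run_batch.sh`, where `run_batch.sh` is whatever you name the script. Options given on the `sbatch` command line take precedence over `#SBATCH` lines. Appending `%10` to the range (e.g. `--array=1-20%10`) limits how many tasks run at once.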

We're working on improving scalability in the next few releases.

Also, it's much quicker to use one VCF/plink2 file set containing multiple samples.
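If each sample currently sits in its own VCF, `bcftools merge` is one common way to build a single multi-sample file. This sketch makes three assumptions. First, bcftools is installed. Second, the inputs are bgzipped, indexed, and on the same genome build. Third, the samplesheet columns (`sampleset,path_prefix,chrom,format`) follow the pgsc_calc samplesheet layout, so check them against the current documentation before use. The helper names and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: merge single-sample VCFs into one multi-sample VCF and write a
# one-row samplesheet for it. Helper and file names are hypothetical.
set -euo pipefail

# merge_vcfs OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz ...
# Inputs must be bgzipped and indexed (e.g. with bcftools index).
merge_vcfs() {
  local out="$1"; shift
  # By default, sites absent from an input become missing genotypes
  bcftools merge -Oz -o "$out" "$@"
  bcftools index "$out"
}

# write_samplesheet SHEET SAMPLESET PATH_PREFIX
# Assumed layout: path_prefix omits the .vcf.gz extension, and an empty
# chrom column means the file holds all chromosomes.
write_samplesheet() {
  local sheet="$1" sampleset="$2" prefix="$3"
  printf 'sampleset,path_prefix,chrom,format\n' > "$sheet"
  printf '%s,%s,,vcf\n' "$sampleset" "$prefix" >> "$sheet"
}
```

For example, you could run `merge_vcfs merged.vcf.gz sample1.vcf.gz sample2.vcf.gz` and then `write_samplesheet samplesheet_SU.csv SU "$PWD/merged"`. That would describe one sampleset backed by one file. Because sites missing from an input are written as missing genotypes by default, check how that affects scoring for your samples.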


alicegodden Apr 4, 2025
Author

Thank you for your response, I really appreciate it. I will try running in batches.
