You'd need a lot of compute resources to calculate every PGS in parallel in one workflow run. I think the PGS Catalog now contains several billion genetic variants, so calculating all PGS is a big job!

I'd recommend instead breaking the PGS IDs up into batches of ~100 scores.

Batch size depends on your available resources, but 100 is a sensible place to start. You could speed this up by submitting each batch as a separate job to an HPC. The documentation for HPCs/big jobs might be helpful.
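The batching itself is simple to script. Here's a minimal sketch in Python that splits a list of accession IDs into chunks of 100, one chunk per workflow run or HPC job submission. The ID list, batch size, and the idea of passing each chunk as a comma-separated list are illustrative assumptions, not pipeline specifics:

```python
def batch(ids, size=100):
    """Yield successive chunks of `ids` with at most `size` elements each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# Hypothetical accession list (a real run would use your own IDs):
pgs_ids = [f"PGS{n:06d}" for n in range(1, 251)]  # PGS000001..PGS000250

for i, chunk in enumerate(batch(pgs_ids), start=1):
    # Each chunk could become one job submission, e.g. written to a file
    # or joined with commas and passed to a single workflow run.
    print(f"batch {i}: {len(chunk)} scores, starting at {chunk[0]}")
```

Each batch is independent, so they can run in parallel as separate scheduler jobs and the results combined afterwards.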

We're working on improving scalability in the next few releases.

Also, it's much quicker to use a single VCF/plink2 file set containing multiple samples than to process many single-sample files separately.

Answer selected by nebfield