Running PGS calc on all pgs ids #414
Hello, I have a few VCF files I'd like to run against all possible PGS IDs, and I'm running into some errors. I want to run against all of them to prevent bias, as we can't be sure what our variants may be associated with. The key error I'm stuck on is: Caused by: when running: I'd appreciate some guidance on this, many thanks.
Replies: 1 comment 1 reply
You'd need a lot of compute resources to calculate every PGS in parallel in one workflow run. I think the PGS Catalog now contains several billion genetic variants, so calculating all PGS is a big job!
Instead, I'd recommend breaking the PGS IDs up into batches of ~100 scores.
Batch size depends on your available resources, but 100 is a sensible place to start. You could speed this up by submitting each batch as a separate job to an HPC cluster. The documentation for HPCs/big jobs might be helpful.
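As a rough illustration of the batching idea, here is a minimal Python sketch. It assumes you already have the score IDs you want as a list of strings. `batch_pgs_ids` is a hypothetical helper written for this example (it is not part of the workflow), and the IDs generated below are placeholders rather than a real selection of scores.

```python
from typing import Iterable, List


def batch_pgs_ids(pgs_ids: Iterable[str], batch_size: int = 100) -> List[List[str]]:
    """Split PGS IDs into consecutive batches of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(pgs_ids)
    # Step through the list in strides of batch_size; the last batch may be smaller
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


if __name__ == "__main__":
    # Placeholder IDs for illustration only; substitute the real IDs you want to score
    example_ids = [f"PGS{n:06d}" for n in range(1, 251)]
    for number, batch in enumerate(batch_pgs_ids(example_ids), start=1):
        # Each batch would then be submitted as its own workflow run / HPC job
        print(f"batch {number}: {len(batch)} scores ({batch[0]} to {batch[-1]})")
```

Each batch can then go into its own job. How the IDs are handed to the workflow (for example as a comma-separated string) depends on its parameters, so check the documentation for the exact option before wiring this into a submission script.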
We're working on improving scalability in the next few releases.
Also, it's much quicker to use one VCF/plink2 file set containing multiple samples.