Description
Hello,
I am a student working in the field of proteomics in Korea.
I am trying to apply the ProteinMPNN → threading → Rosetta FastRelax cycle (for cyclic peptides) described in the following 2023 Nature Communications paper: https://www.nature.com/articles/s41467-023-38328-5.
While using this pipeline, I have a couple of questions:
- In ProteinMPNN and related models, are individual design tasks (e.g., sequence generation for each input backbone/file) statistically independent? Since each run is driven by a random seed, I would like to know whether it is reasonable, from a statistical point of view, to regard each design task as an independent sample.
- If that assumption holds, I would like to speed up the process on an HPC cluster. Suppose I have ~10,000 backbones to process. If I split them into four separate jobs (2,500 backbones each) and run the four jobs in parallel, will the combined results be reproducible and effectively equivalent to running all 10,000 backbones in a single job (given the same overall configuration and random-seed strategy)?
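To make the "same random-seed strategy" in the second question concrete, here is a minimal sketch under an assumed strategy: each backbone's seed is derived only from a global seed and the backbone's ID, so its result cannot depend on which job it lands in or on its position within that job. `design_sequence` is a hypothetical stand-in for one sampling call and is not ProteinMPNN's API. `per_backbone_seed`, `run_job`, and `split` are likewise illustrative names.

```python
import hashlib
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def per_backbone_seed(global_seed: int, backbone_id: str) -> int:
    """Derive a deterministic seed from the global seed and the backbone ID only."""
    digest = hashlib.sha256(f"{global_seed}:{backbone_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def design_sequence(backbone_id: str, length: int, global_seed: int) -> str:
    """Hypothetical stand-in for one sampling call on one backbone (not ProteinMPNN)."""
    # A private RNG per backbone: no state is shared with other backbones.
    rng = random.Random(per_backbone_seed(global_seed, backbone_id))
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def run_job(backbone_ids, length, global_seed):
    """One 'job': design a sequence for every backbone it was given."""
    return {bid: design_sequence(bid, length, global_seed) for bid in backbone_ids}


def split(items, n_jobs):
    """Distribute items round-robin across n_jobs chunks."""
    return [items[i::n_jobs] for i in range(n_jobs)]


if __name__ == "__main__":
    backbones = [f"backbone_{i:05d}" for i in range(10_000)]

    # Single job over all 10,000 backbones.
    single = run_job(backbones, length=12, global_seed=37)

    # Four parallel-style jobs of 2,500 backbones each, merged afterwards.
    combined = {}
    for chunk in split(backbones, 4):
        combined.update(run_job(chunk, length=12, global_seed=37))

    print(single == combined)  # True: every backbone gets the same sequence
```

The design choice being illustrated: if instead one RNG were seeded once per job and consumed backbone by backbone, each backbone's output would depend on how many draws came before it, so a four-way split would generally not reproduce the single-job results even with the same global seed.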
I am deeply grateful for your kind efforts and important contributions to this research area, and I look forward to your response.
Thank you very much.