Best Practices for Large-Scale ProteinMPNN → FastRelax Runs: Are Per-Backbone Designs Truly Independent? #141

Hello,

I am a student in Korea working in the field of proteomics.

I am trying to apply the ProteinMPNN → threading → Rosetta FastRelax cycle (for cyclic peptides) described in the following 2023 Nature Communications paper: https://www.nature.com/articles/s41467-023-38328-5.

While using this pipeline, I had a couple of questions:

1. In ProteinMPNN and related models, are the individual tasks (e.g., sequence generation per input backbone/file) treated as statistically independent? Since each run is driven by a random seed, I would like to know whether it is reasonable to regard each design task as an independent sample from a statistical point of view.

2. If the above assumption holds, I would like to speed up the process and improve efficiency on an HPC cluster. Suppose I have ~10,000 backbones to process. If I split them into four separate jobs (2,500 backbones each) and run these four jobs in parallel, will the combined results be reproducible and effectively equivalent to running all 10,000 backbones in a single job (given the same overall configuration and random seed strategy)?
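To make question 2 concrete, here is a minimal Python sketch of one possible "random seed strategy". It is a hypothetical scheme, not ProteinMPNN's own seeding logic: `fake_design` stands in for a single sampling call, and the backbone IDs and `BASE_SEED` are made up. The idea is that if each backbone's seed is derived only from a base seed and the backbone's own ID, the results cannot depend on how the backbones are split across jobs. A single RNG stream shared by all backbones in a job, by contrast, gives different results once the list is chunked.

```python
import hashlib
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def per_backbone_seed(base_seed: int, backbone_id: str) -> int:
    # Hash (base seed, backbone ID) so the seed depends only on the backbone,
    # not on its position in a job or on which job it is assigned to.
    digest = hashlib.sha256(f"{base_seed}:{backbone_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big")  # 32-bit seed

def fake_design(rng: random.Random) -> str:
    # Hypothetical placeholder for one sampling call: a random 10-mer.
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(10))

def run_job(backbone_ids, base_seed):
    # Each backbone gets its own RNG, seeded from its ID.
    return {b: fake_design(random.Random(per_backbone_seed(base_seed, b)))
            for b in backbone_ids}

def run_job_shared_rng(backbone_ids, base_seed):
    # One RNG stream for the whole job: a backbone's result depends on how
    # many backbones were processed before it in the same job.
    rng = random.Random(base_seed)
    return {b: fake_design(rng) for b in backbone_ids}

backbones = [f"backbone_{i:05d}" for i in range(10_000)]  # hypothetical IDs
BASE_SEED = 37
chunks = [backbones[k * 2500:(k + 1) * 2500] for k in range(4)]

# Per-backbone seeding: one 10,000-backbone job vs. four 2,500-backbone jobs
single = run_job(backbones, BASE_SEED)
split = {}
for chunk in chunks:
    split.update(run_job(chunk, BASE_SEED))

# Shared-stream seeding, same comparison
single_shared = run_job_shared_rng(backbones, BASE_SEED)
split_shared = {}
for chunk in chunks:
    split_shared.update(run_job_shared_rng(chunk, BASE_SEED))

print(single == split)                # True
print(single_shared == split_shared)  # False
```

With per-backbone seeds, the merged output of the four jobs matches the single job exactly. With a shared stream, each chunk restarts the stream, so, for example, `backbone_02500` receives a different sequence depending on the split.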

I am deeply grateful for your kind efforts and important contributions to this research area, and I look forward to your response.

Thank you very much.
