Best Practices for Large-Scale ProteinMPNN → FastRelax Runs: Are Per-Backbone Designs Truly Independent? #141

@kimlab-cnu

Hello,

I am a student in Korea working in the field of proteomics.

I am trying to apply the ProteinMPNN → threading → Rosetta FastRelax cycle (for cyclic peptides) described in the following 2023 Nature Communications paper: https://www.nature.com/articles/s41467-023-38328-5.

While using this pipeline, I had a couple of questions:

  1. In ProteinMPNN and related models, are the individual tasks (e.g., sequence generation per input backbone/file) statistically independent of one another? Since each run is driven by a random seed, I would like to know whether it is reasonable to treat each design task as an independent sample.

  2. If the above assumption holds, I would like to speed up the process and improve efficiency on an HPC cluster. Suppose I have ~10,000 backbones to process. If I split them into four separate jobs (2,500 backbones each) and run these four jobs in parallel, will the combined results be reproducible and effectively equivalent to running all 10,000 backbones in a single job (given the same overall configuration and random seed strategy)?
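
To make "random seed strategy" concrete, here is a minimal sketch of what I have in mind. It assumes each backbone gets its own seed derived from a base seed and the file name, so the seed does not depend on which job the backbone lands in. The helper names (`backbone_seed`, `split_into_jobs`, `plan`), the base seed value, and the file names are hypothetical and are not part of ProteinMPNN or Rosetta.

```python
import hashlib


def backbone_seed(base_seed: int, backbone_name: str) -> int:
    """Derive a 32-bit seed from a base seed and a backbone file name.

    Hypothetical helper: the seed depends only on (base_seed, name),
    never on the job or the position of the file within a job.
    """
    digest = hashlib.sha256(f"{base_seed}:{backbone_name}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def split_into_jobs(names, n_jobs):
    """Split the sorted file names into n_jobs contiguous, near-equal chunks."""
    names = sorted(names)
    size, rem = divmod(len(names), n_jobs)
    jobs, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < rem else 0)
        jobs.append(names[start:end])
        start = end
    return jobs


def plan(names, n_jobs, base_seed=37):
    """Map every backbone to its seed, given a particular job split."""
    return {
        name: backbone_seed(base_seed, name)
        for job in split_into_jobs(names, n_jobs)
        for name in job
    }


# Placeholder file names standing in for the ~10,000 backbones.
names = [f"backbone_{i:05d}.pdb" for i in range(10_000)]

single_job = plan(names, n_jobs=1)
four_jobs = plan(names, n_jobs=4)

# The seed assignment is identical whether the work is split or not.
assert single_job == four_jobs
```

This only guarantees that each backbone receives the same seed in both setups. Whether the tools then produce identical outputs for a given seed, independent of what else ran in the same job, is what I am asking.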

I am deeply grateful for your kind efforts and important contributions to this research area, and I look forward to your response.

Thank you very much.
