PyRosettaCluster is a Python-based framework for reproducible, high-throughput job distribution of user-defined PyRosetta protocols efficiently parallelized on the user's local computer, high-performance computing (HPC) cluster, or elastic cloud computing infrastructure with available compute resources.
- Creating Environments for PyRosettaCluster
- Running PyRosettaCluster Simulations
- Recreating Environments for PyRosettaCluster
- Reproducing PyRosettaCluster Simulations
The PyRosettaCluster framework supports running reproducible PyRosetta simulations from reproducible virtual environments created with Conda, Mamba, uv, and Pixi environment managers. Please install PyRosetta (with cxx11thread.serialization support) and the following packages to get started (and see the envs directory for template environment configuration files)!
attrs,billiard,blosc,cloudpickle,cryptography,dask,dask-jobqueue,distributed,gitpython,numpy,pandas,python-xz,scipy,traitlets
Official Full List of Packages
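A quick way to confirm that the packages listed above are installed in the active environment is to query their distribution metadata. The following is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of PyRosetta, and the check covers only the listed packages, not the `pyrosetta` package itself.

```python
from importlib import metadata

# Distribution names copied from the package list above.
REQUIRED_PACKAGES = [
    "attrs", "billiard", "blosc", "cloudpickle", "cryptography", "dask",
    "dask-jobqueue", "distributed", "gitpython", "numpy", "pandas",
    "python-xz", "scipy", "traitlets",
]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
```

Running the script with the environment's Python interpreter prints any distributions that still need to be installed.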
Important
It is recommended to install the required packages individually when using the uv environment manager.
Explanation: If using the uv environment manager, it is highly recommended to avoid using the PyPI pyrosetta-distributed metapackage to install the required PyRosettaCluster framework packages (which installs subpackages using pip install ..., so the subpackages will not get registered as installed in the exported uv environment file). Instead, please install the required pyrosetta.distributed framework packages individually to register them properly in the uv environment. See the envs/uv directory for template environment configuration files.
Important
It is recommended to install PyRosetta via the quarterly builds (instead of the PyPI pyrosetta-installer package) when using the uv environment manager.
Explanation: If using the uv environment manager, the PyRosetta quarterly builds enable long-term reproducibility of virtual environments containing the pyrosetta package; just include the --find-links (or -f) flag to indicate the links search path to the PyRosetta cxx11thread.serialization wheels:
- For U.S. west mirror: https://west.rosettacommons.org/pyrosetta/quarterly/release.cxx11thread.serialization/
- For U.S. east mirror: https://graylab.jhu.edu/download/PyRosetta4/archive/release-quarterly/release.cxx11thread.serialization/
Example #1:
uv pip install pyrosetta -f https://west.rosettacommons.org/pyrosetta/quarterly/release.cxx11thread.serialization/
Example #2:
To pin a specific version:
uv add pyrosetta==2026.3+releasequarterly.5e498f1409 -f https://west.rosettacommons.org/pyrosetta/quarterly/release.cxx11thread.serialization/
Example #3:
Alternatively, add the following to your uv project's pyproject.toml file:
[tool.uv]
find-links = ["https://west.rosettacommons.org/pyrosetta/quarterly/release.cxx11thread.serialization/"]
Then run uv add pyrosetta or uv pip install pyrosetta to get the most recent pyrosetta package quarterly release.
Example #4:
Alternatively, add the following to your system's uv.toml configuration file (however, this only works with uv pip install pyrosetta syntax, and not uv add pyrosetta syntax):
[pip]
find-links = ["https://west.rosettacommons.org/pyrosetta/quarterly/release.cxx11thread.serialization/"]
Then run uv pip install pyrosetta to get the most recent pyrosetta package quarterly release.
Further details: Installing PyRosetta via the PyPI pyrosetta-installer package into a uv project is not recommended because it installs the pyrosetta package using pip install ... (instead of uv pip install ...), so pyrosetta will not get registered as installed in the exported environment file. Furthermore, when recreating a uv environment (see below) using the PyPI pyrosetta-installer package, the syntax does not allow specifying an exact PyRosetta version (automatically defaulting to the latest published PyRosetta version), and therefore installing the correct PyRosetta version in the recreated uv environment depends fortuitously on a prompt uv environment recreation after the original uv environment creation (typically within the same week or so). For this reason, Pixi, Conda, and Mamba are the preferred environment managers when needing the most recent PyRosetta weekly builds, and when using uv the PyRosetta quarterly builds are highly recommended for long-term reproducibility of virtual environments used for reproducible PyRosettaCluster simulations.
Please see the official PyRosettaCluster documentation for all of the configuration options.
Caution
The PyRosettaCluster framework uses the cloudpickle module, which can lead to arbitrary code execution resulting in a critical security vulnerability. Only run (1) with user-defined PyRosetta protocols from trusted sources, and (2) behind a trusted private network segment (unless running on a local workstation).
Explanation: A primary feature of the PyRosettaCluster framework is that arbitrary user-defined PyRosetta protocols are pickled, sent over a network, and unpickled, which allows the user to run customized macromolecular modeling and design workflows. Unless running solely on a local network, it is highly recommended to operate PyRosettaCluster behind a trusted private network segment (i.e., behind a firewall) or to set up Dask's TLS security framework between network endpoints for authenticated and encrypted transmission of data.
Solution: In the PyRosettaCluster framework, there are two easy ways to set up Dask's TLS communication:
(1) Use the PyRosettaCluster(security=True) option to invoke the cryptography package to automatically generate a distributed.Security.temporary object on-the-fly through the dask or dask-jobqueue package.
(2) Pre-generate a distributed.Security object using OpenSSL (instead of the cryptography package) and pass it to the security keyword argument of the PyRosettaCluster framework. See the pyrosetta.distributed.cluster.generate_dask_tls_security function docstring for more information.
Example:
security = pyrosetta.distributed.cluster.generate_dask_tls_security()
PyRosettaCluster(security=security)
Important
The PyRosettaCluster framework is a tool for reproducible computational macromolecular modeling and design. It is up to the user to define their PyRosetta protocols with reproducibility in mind – meaning user-defined PyRosetta protocols ought to be deterministic:
(1) Set seeds for any impure functions (i.e., non-deterministic functions) implemented.
- Pseudo-random seeds can be (i) hard-coded into PyRosetta protocols, (ii) distributed as values of keys defined in a user-defined task dictionary, or (iii) set dynamically based on each PyRosetta protocol's automatically assigned random seed, which is accessible through the protocol's dictionary of keyword arguments via the value of the PyRosettaCluster_seed key (e.g., kwargs["PyRosettaCluster_seed"]).
(2) If impure functions cannot be made pure through controlling the underlying randomness, please do not rely on them to update the Pose in meaningful ways.
- For example, a randomly named score key might be acceptable, but randomly selecting the number of Pose conformational updates is not.
(3) If implementing third-party software applications inside of PyRosetta protocols that do not support determinism, please ask the developers of these applications to support determinism in their software.
In general, the determinism can be (and ought to be) strictly controlled when developing PyRosetta protocols. Note that the PyRosettaCluster framework can still be used as a simple job distributor even if PyRosetta protocol determinism is impossible for a specific application.
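Option (iii) above can be sketched as follows. This is an illustration only: the helper names `rng_from_kwargs` and `choose_n_updates` are hypothetical, and it assumes the value stored under `kwargs["PyRosettaCluster_seed"]` can be converted with `int()`.

```python
import random


def rng_from_kwargs(kwargs):
    """Build a private random generator from the protocol's assigned seed."""
    # Assumption: the seed value (int or numeric string) converts with int().
    seed = int(kwargs["PyRosettaCluster_seed"])
    # A local generator avoids depending on the global `random` module state.
    return random.Random(seed)


def choose_n_updates(kwargs, low=1, high=10):
    """Pick how many Pose updates to run, reproducibly for a given seed."""
    return rng_from_kwargs(kwargs).randint(low, high)
```

Inside a user-defined protocol with a signature such as `def my_protocol(packed_pose, **kwargs)`, calling `choose_n_updates(kwargs)` would then return the same count every time the protocol is re-run with the same seed, so the otherwise impure choice no longer breaks reproducibility.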
Important
If using the uv environment manager, please remember to set the UV_PROJECT environment variable to the uv project root directory, or run the PyRosettaCluster simulation from the uv project root directory, in order for the PyRosettaCluster framework to automatically detect and cache the uv project's pyproject.toml file for environment configuration reproducibility.
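If the simulation script may be launched from a subdirectory, one option is to locate the project root and set `UV_PROJECT` from Python before instantiating PyRosettaCluster. The sketch below rests on two assumptions that the text above does not confirm: that the nearest parent directory containing a `pyproject.toml` file is the uv project root, and that setting the variable in-process is equivalent to exporting it in the shell. The helper names are hypothetical.

```python
import os
from pathlib import Path


def find_uv_project_root(start="."):
    """Return the nearest directory at or above `start` holding pyproject.toml, else None."""
    start_path = Path(start).resolve()
    for directory in (start_path, *start_path.parents):
        if (directory / "pyproject.toml").is_file():
            return directory
    return None


def set_uv_project(start="."):
    """Set the UV_PROJECT environment variable to the detected project root."""
    root = find_uv_project_root(start)
    if root is None:
        raise FileNotFoundError(f"No pyproject.toml found at or above {start!r}")
    os.environ["UV_PROJECT"] = str(root)
    return root
```

Exporting `UV_PROJECT` in the shell, or simply running from the project root, remains the approach described above; this helper is only a convenience.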
Tip
If using the uv environment manager with PyRosetta version 2026.3+releasequarterly.5e498f1409 and custom/experimental libraries built from source distributions (sdists) instead of wheels, it may be helpful to cache the environment's uv.lock file contents for accounting or debugging purposes. Note that subsequent quarterly releases of pyrosetta (and weekly releases >2026.05 if using the PyPI pyrosetta-installer package for PyRosetta installation) automatically cache the uv.lock file. The uv.lock file may be either committed to the Git repository before running the PyRosettaCluster simulation, or stored in the system_info keyword argument for persistent storage in the output decoys (and scorefile(s) if simulation_records_in_scorefile=True). Although very uncommon, if a uv project contains packages built from sdists, it may be necessary to reproduce the environment using uv sync from the uv.lock file rather than uv pip sync from the automatically exported and cached requirements.txt file. For example:
import sys
from pathlib import Path

from pyrosetta.distributed.cluster import PyRosettaCluster

PyRosettaCluster(
    system_info={
        "sys.platform": sys.platform,
        "uv.lock": Path("uv.lock").read_text(),  # Optional, for accounting purposes
    },
)
The virtual environment configuration used for the original simulation is cached in the full simulation record of each output decoy file and, optionally, in each entry of each output scorefile. Please refer to the following table to select one environment file extraction method based on the file type being used to recreate the original virtual environment:
| File type extension | Output file type | Extraction method #1 (without PyRosetta) | Extraction method #2 (requires PyRosetta) |
|---|---|---|---|
| .pdb | Decoy | Read file → Copy → Paste into new file | Run dump_env_file.py helper |
| .pdb.bz2 | Decoy | Unzip with bzip2 → Read file → Copy → Paste into new file | Run dump_env_file.py helper |
| .pkl_pose, .pkl_pose.bz2, .b64_pose, .b64_pose.bz2 | Decoy | | Run dump_env_file.py helper (requires an identical PyRosetta build signature to that used to save the original file) |
| .json | Scorefile with full simulation records | Read file → Copy → Paste into new file | Run dump_env_file.py helper |
| Pickled pandas.DataFrame (.gz, .xz, .tar, etc.) | Scorefile with full simulation records | | Run dump_env_file.py helper |
| .init, .init.bz2 | PyRosetta initialization file | | Run dump_env_file.py helper (requires an identical PyRosetta build signature to that used to save the original file) |
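For the .pdb and .pdb.bz2 rows of the table above, the "Unzip with bzip2 → Read file" steps can also be done with the Python standard library instead of the bzip2 command-line tool. The following is a minimal sketch; the helper name `read_decoy_text` is hypothetical, and locating the simulation record within the returned text is left to the reader.

```python
import bz2
from pathlib import Path


def read_decoy_text(path):
    """Return the text of a .pdb file, transparently decompressing .pdb.bz2."""
    path = Path(path)
    if path.suffix == ".bz2":
        # "rt" opens the compressed file in text mode.
        with bz2.open(path, "rt") as handle:
            return handle.read()
    return path.read_text()
```

The returned string can then be searched or copied from in the same way as an uncompressed decoy file opened in a text editor.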
Note
Extraction method #1: If copy/pasting into a new file, the environment file string is located in the value of the record["instance"]["environment"] nested key of the full simulation record. Please paste it into one of the following file names (as expected in the next step) in a new folder, depending on the environment manager you're using to recreate the environment:
| Environment manager | New file name |
|---|---|
| Pixi | pixi.lock |
| uv | uv.lock |
| Conda | environment.yml |
| Mamba | environment.yml |
If using Pixi/uv environment managers, please also extract the manifest file string (Pixi) or project file string (uv) located in the value of the record["metadata"]["toml"] nested key of the full simulation record. The value of the record["metadata"]["toml_format"] nested key also specifies the TOML file format. Please paste it into one of the following file names (as expected in the next step) in the same new folder, depending on the environment manager you're using to recreate the environment:
| Environment manager | New file name |
|---|---|
| Pixi | pixi.toml / pyproject.toml |
| uv | pyproject.toml |
Also copy the value of the record["instance"]["sha1"] nested key holding the Git commit SHA-1 hash required to reproduce the PyRosettaCluster simulation!
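The copy/paste steps of extraction method #1 can be scripted once the full simulation record is available as a Python dictionary (for example, after parsing it with `json.loads`). The sketch below uses only the nested keys and file names named above; the function name `write_env_files`, its parameters, and the assumption that the record is already parsed are illustrative and not part of PyRosetta.

```python
from pathlib import Path

# Environment file names taken from the table above.
ENV_FILE_NAMES = {
    "pixi": "pixi.lock",
    "uv": "uv.lock",
    "conda": "environment.yml",
    "mamba": "environment.yml",
}


def write_env_files(record, manager, out_dir, toml_name=None):
    """Write the cached environment (and TOML, if applicable) into `out_dir`.

    Returns the Git commit SHA-1 stored in the record.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / ENV_FILE_NAMES[manager]).write_text(record["instance"]["environment"])
    if manager == "uv":
        (out / "pyproject.toml").write_text(record["metadata"]["toml"])
    elif manager == "pixi":
        # Choose pixi.toml or pyproject.toml after inspecting
        # record["metadata"]["toml_format"]; this sketch does not guess.
        if toml_name is None:
            raise ValueError("Pass toml_name='pixi.toml' or 'pyproject.toml' for Pixi")
        (out / toml_name).write_text(record["metadata"]["toml"])
    return record["instance"]["sha1"]
```

For example, `write_env_files(record, "uv", "recreated_env")` would write `uv.lock` and `pyproject.toml` into a new `recreated_env` folder and return the commit hash to keep for the reproduction step.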
Note
Extraction method #2: If running dump_env_file.py, the pyrosetta package (with version >=2025.47) and the PyPI pyrosetta-distributed package (for the pyrosetta.distributed framework dependencies) must be installed in an existing virtual environment, and that virtual environment's Python interpreter must be used to run the script.
Note: If extracting from a .pkl_pose, .pkl_pose.bz2, .b64_pose, .b64_pose.bz2, .init or .init.bz2 file, the PyRosetta build signature must be identical to that used to write the original output decoy file or PyRosetta initialization file, otherwise an exception or segmentation fault may occur (due to the requirement of matching compatibility layers to unpickle the pickled Pose objects). Furthermore, any associated chemical and topology dependencies must be loaded in order to properly reconstruct the Pose object in memory.
Also copy the printed Git commit SHA-1 required to reproduce the PyRosettaCluster simulation!
Tip
Extraction method #2: See python dump_env_file.py --help for details.
Run recreate_env.py to recreate the virtual environment.
Caution
This script runs a subprocess with one of the following commands:
- conda env create ...: when using the Conda environment manager
- mamba env create ...: when using the Mamba environment manager
- uv sync ...: when using the uv environment manager
- pixi install ...: when using the Pixi environment manager
Installing certain packages may not be secure, so please only run with an input environment file you trust!
Learn more about PyPI security and Conda security.
Important
If using Pixi/uv environment managers, please use the system Python interpreter, since the script creates a new Pixi/uv environment and cannot be run from an existing virtual environment. If using Conda/Mamba, any Python interpreter may be used.
Note
If using the uv environment manager, the PyRosetta installation step may be subsequently required if using the PyPI pyrosetta-installer package installation method (which is not recommended; see above). Note that installing the identical PyRosetta version of the original uv environment in the recreated uv environment depends fortuitously on a prompt uv environment recreation after the original uv environment creation (typically within the same week). See the PyPI pyrosetta-installer for details.
Tip
See python recreate_env.py --help for details.
To execute the same user-defined PyRosetta protocols, clone the original Git repository and check out the original Git commit SHA-1 used by the original PyRosettaCluster simulation. You will need to know the owner and repository name (and if not, don't worry, there are ways to search GitHub by commit SHA-1):
git clone --no-checkout https://github.com/<owner>/<repo>.git
cd <repo>
git fetch origin <SHA-1>
git checkout <SHA-1>
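The same Git steps can be driven from Python when the reproduction is scripted. This is a minimal sketch, assuming a full 40-character SHA-1 and a GitHub-hosted repository as in the commands above; the function names `checkout_commands` and `run_checkout` are hypothetical.

```python
import re
import subprocess


def checkout_commands(owner, repo, sha1):
    """Build the git commands that clone `repo` and check out commit `sha1`."""
    # Assumption: the recorded hash is a full 40-character hexadecimal SHA-1.
    if not re.fullmatch(r"[0-9a-fA-F]{40}", sha1):
        raise ValueError(f"Not a full SHA-1 hash: {sha1!r}")
    url = f"https://github.com/{owner}/{repo}.git"
    return [
        ["git", "clone", "--no-checkout", url],
        # `git -C <dir>` runs the command inside the freshly cloned directory.
        ["git", "-C", repo, "fetch", "origin", sha1],
        ["git", "-C", repo, "checkout", sha1],
    ]


def run_checkout(owner, repo, sha1):
    """Run the commands in order, stopping at the first failure."""
    for command in checkout_commands(owner, repo, sha1):
        subprocess.run(command, check=True)
```

Building the argument lists separately from running them makes it easy to print and review the commands before anything touches the network.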
Then, use the Python interpreter of the recreated environment to run your PyRosettaCluster simulation reproduction script. Here's a template script to get started (and see the reproduce function docstring for more information)!
from pyrosetta.distributed.cluster import reproduce

# Import (or copy/paste) the original user-provided PyRosetta protocols
from my_protocols import protocol_1, protocol_2  # Change depending on the original GitHub repository structure


def main():
    reproduce(
        # Input either a PyRosettaCluster output decoy file or output PyRosetta initialization file
        input_file=...,
        # Or input a PyRosettaCluster scorefile and decoy name
        scorefile=...,
        decoy_name=...,
        # Optional configurations:
        protocols=[protocol_1, protocol_2],  # Can be `None` for auto-detection
        clients=...,
        input_packed_pose=...,
        instance_kwargs={
            "output_path": ...,
            "scratch_dir": ...,
            "project_name": ...,
            "simulation_name": ...,
        },
        clients_indices=...,
        resources=...,
        skip_corrections=...,
        init_from_file_kwargs=...,
    )


if __name__ == "__main__":
    main()
✅ Save your PyRosettaCluster simulation reproduction script, and run it with the recreated environment's Python interpreter (with the local repository HEAD at that same Git commit SHA-1 for proper PyRosettaCluster SHA-1 validation). The PyRosetta build string and the environment file string will also be validated against the original full simulation record at this step.
🎉 Congrats! You have now recreated a virtual environment and used it to successfully reproduce an output decoy from a distributed PyRosettaCluster simulation.
