An MCMC framework for protein design that integrates AlphaFold2 structural predictions with ESM2 evolutionary priors, enabling principled exploration of sequence-structure space beyond gradient-based optimization.
## Installation

```bash
git clone https://github.com/flagshippioneering/relaxedsequencesampling.git
cd relaxedsequencesampling
uv sync
```

## Quick Start

Start an MLflow tracking server (runs log their metrics there):

```bash
mlflow server --host 0.0.0.0 --port 5000
```

Then launch a design run:

```bash
# Using RSS (recommended)
python train.py --config config_exp/rss.yaml

# Using default gradient descent
python train.py --config config_exp/rso.yaml

# Or specify parameters directly
python train.py \
    --design_type rss \
    --pdb_filename 1brs.pdb \
    --chain A \
    --binder_chain D \
    --iters 1000 \
    --beta_t 1.0 \
    --esm_weight 0.2
```

## Gradient Descent (RSO)

- Method: Standard gradient descent on soft sequence logits
- Optimizer: SGD or Adam
- Use Case: Fast, deterministic optimization when exploration is less critical
- Key Parameters:
  - `eta_init`: Learning rate (default: 0.01)
  - `optimizer`: `"sgd"` or `"adam"`
  - `norm_seq_grad`: Normalize sequence gradients
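For intuition, the descent on soft sequence logits can be sketched against a toy differentiable loss standing in for the AF2 objective. The loss, shapes, and gradient normalization below are illustrative assumptions, not the repository's implementation:

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def energy(logits, target):
    """Toy stand-in for the AF2 loss: squared error between the soft
    sequence softmax(logits) and a fixed target profile."""
    p = softmax(logits)
    return float(((p - target) ** 2).sum())

def energy_grad(logits, target):
    """Exact gradient of the toy energy w.r.t. the soft sequence logits."""
    p = softmax(logits)
    dp = 2.0 * (p - target)  # dE/dp
    # Chain rule through the row-wise softmax: J^T dp = p * (dp - sum(p * dp))
    return p * (dp - (p * dp).sum(axis=-1, keepdims=True))

L, A = 8, 20  # binder length, amino-acid alphabet size
rng = np.random.default_rng(0)
target = softmax(rng.normal(size=(L, A)))
logits = rng.normal(size=(L, A))

eta = 0.1  # plays the role of eta_init
history = [energy(logits, target)]
for _ in range(300):
    g = energy_grad(logits, target)
    g = g / (np.linalg.norm(g) + 1e-8)  # norm_seq_grad-style normalization
    logits -= eta * g                   # plain SGD step on the logits
    history.append(energy(logits, target))
```

Swapping SGD for Adam (the `optimizer` option) only changes the update rule; the loss and soft-logit parameterization stay the same.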
## Relaxed Sequence Sampling (RSS)

- Method: MCMC with the Metropolis-Adjusted Langevin Algorithm (MALA) plus masked PLM jumps
- Exploration: Broader coverage of sequence space via stochastic sampling
- Use Case: When you need diverse, high-quality sequences with proper uncertainty quantification
- Three-Phase Schedule:
  1. Pre-relax (optional): Deterministic descent to find a low-energy region
  2. Warm-up (optional): SGLD without MH correction for rapid mixing
  3. Main MALA: Full MCMC with detailed balance for sampling
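The main MALA phase can be illustrated on a toy quadratic energy standing in for the combined AF2 + ESM2 loss; the energy, step size, and dimensions here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def energy(x):
    return 0.5 * float(np.sum(x * x))  # toy stand-in for the design loss

def grad_energy(x):
    return x

def mala_step(x, eta, beta):
    """One MALA step targeting pi(x) ~ exp(-beta * energy(x))."""
    x_prop = (x - eta * beta * grad_energy(x)
              + np.sqrt(2.0 * eta) * rng.normal(size=x.shape))

    def log_q(a, b):
        # Log density of proposing a when the chain sits at b
        mu = b - eta * beta * grad_energy(b)
        return -float(np.sum((a - mu) ** 2)) / (4.0 * eta)

    # Metropolis-Hastings correction enforcing detailed balance (use_mh)
    log_alpha = (beta * (energy(x) - energy(x_prop))
                 + log_q(x, x_prop) - log_q(x_prop, x))
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True
    return x, False

x = 3.0 * rng.normal(size=16)
accepted, samples = 0, []
for t in range(2000):
    x, ok = mala_step(x, eta=0.05, beta=1.0)  # beta plays the role of beta_t
    accepted += ok
    if t >= 500:                              # discard burn-in
        samples.append(x.copy())

var = float(np.var(np.stack(samples)))  # ~1/beta for this Gaussian target
```

Accepting every proposal (no MH correction) turns this into the SGLD warm-up phase; shrinking `eta` corresponds to the `eta_t_main` regime.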
- Key Parameters:
  - `beta_t` (default: 1.0): Inverse temperature for the target distribution. Higher = more focused on low energy; lower = more exploration
  - `eta_init` (default: 0.01): Initial step size for Langevin walks
  - `eta_t_main` (default: 0.0001): Step size for the main MALA phase
  - `use_mh` (default: True): Use Metropolis-Hastings correction for detailed balance
  - `stateless` (default: True): Stateless AF2 evaluation for proper MALA acceptance
  - `esm_weight` (default: 0.2): Weight for the ESM2 language model loss (λ)
  - `esm_model_name` (default: "esm2_t30_150M_UR50D"): ESM2 model variant
  - `esm_loss_type` (default: "cross_entropy"): Loss type for ESM2 scoring
  - `p_jump` (default: 0.3): Probability of a jump vs. a walk at each step
  - `kappa` (default: 0.3): Mask probability scaling (gradient-informed masking)
  - `tau` (default: 2.0): Temperature for ESM2 token sampling
  - `gamma` (default: 1.0): Swap-bias update strength
  - `mask_budget_frac` (default: 0.2): Expected fraction of the sequence to mask
  - `use_prerelax` (default: True): Enable deterministic pre-relaxation
  - `prerelax_iters` (default: 100): Number of pre-relax iterations
  - `use_warmup` (default: True): Enable the SGLD warm-up phase
  - `warmup_iters` (default: 300): Number of warm-up iterations
  - `warmup_beta` (default: 1.0): Beta for the warm-up phase
  - `warmup_eta` (default: 0.003): Step size for warm-up
  - `use_eta_rm` (default: True): Adaptive step size via Robbins-Monro
  - `clip_grad` (default: True): Clip gradients to prevent instability
  - `center_logits` (default: True): Center proposals in the softmax-invariant subspace
  - `iters` (default: 1000): Total MCMC iterations
  - `num_samples` (default: 10): Number of PDB samples to save
  - `sampling_freq` (default: 50): Save a sample every N iterations
  - `log_every_mlflow` (default: 25): Log metrics every N iterations
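A rough sketch of a gradient-informed masked jump, with stand-in gradient magnitudes and stand-in ESM2 logits. The exponential weighting scheme and every array below are illustrative assumptions, not the repository's masking rule:

```python
import numpy as np

rng = np.random.default_rng(2)

def mask_probs(grad_norms, kappa, budget_frac):
    """Per-position mask probabilities skewed toward high-|gradient| positions,
    rescaled so the expected masked fraction is ~budget_frac (mask_budget_frac)."""
    w = np.exp(kappa * (grad_norms - grad_norms.max()))  # kappa sets the skew
    w = w / w.sum()
    return np.clip(budget_frac * len(grad_norms) * w, 0.0, 1.0)

L, A = 50, 20
grad_norms = rng.gamma(2.0, size=L)        # stand-in per-position |grad|
current = rng.integers(0, A, size=L)       # current discrete sequence

p_mask = mask_probs(grad_norms, kappa=0.3, budget_frac=0.2)
mask = rng.uniform(size=L) < p_mask        # positions to resample

# Sample replacement tokens from tempered stand-in PLM logits (temperature tau)
tau = 2.0
plm_logits = rng.normal(size=(L, A))
probs = np.exp(plm_logits / tau)
probs /= probs.sum(axis=-1, keepdims=True)
new_tokens = np.array([rng.choice(A, p=probs[i]) for i in range(L)])

proposal = np.where(mask, new_tokens, current)  # jump proposal
```

A full sampler would still score such proposals against the target so that jumps preserve the chain's stationary distribution; the swap-bias (`gamma`) bookkeeping is omitted here.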
## Output

Results are saved to `--output_dir` (default: `./results`):

```
results/
└── {pdb_name}_{mlflow_run_id}/
    ├── binder_{pdb_name}_{chain}_{timestamp}.pdb
    ├── sequence_with_targets.json
    ├── sampled_sequences.json
    └── sampled_pdbs/
        ├── binder_{pdb_name}_{chain}_{iter}_{timestamp}.pdb
        └── ...
```
## Example Configurations

See `config_exp/` for example configurations:

- `rss.yaml`: RSS with recommended parameters
- `rso.yaml`: Default gradient descent baseline
## MLflow

Configure the MLflow server:

```bash
python train.py \
    --mlflow_tracking_host 127.0.0.1 \
    --mlflow_tracking_port 5000 \
    --enable_ml_flow True
```

View results at http://127.0.0.1:5000.
## License

This repository extends ColabDesign with RSS capabilities. This code is contributed by Flagship Pioneering under a CC BY-SA 4.0 license; see the License file.