This repository is the fixed snapshot used to reproduce the figures and analysis from our publication in the Journal of Chemical Theory and Computation (JCTC).
Paper (JCTC):
- Guiding Peptide Conformational Kinetics via Collective-Variable Control of Free-Energy Barriers
https://pubs.acs.org/doi/10.1021/acs.jctc.6c00418
Preprint:
We approach peptide kinetic engineering using HLDA-based collective variables within the CV-FEST framework, constructed only from short simulations confined to folded and unfolded states.
This provides a data-efficient way to model and control free-energy surfaces and barrier heights, enabling prediction of mutation-dependent kinetics and guiding rational peptide design from local fluctuations alone.
- Create and activate the environment:

  ```bash
  conda env create -f environment.yml
  conda activate protein-fes
  pip install -e .
  ```

- Unpack archived data:

  ```bash
  ./scripts/unpack_data.sh
  ```

- Run notebooks for paper figures:
  - Open notebooks in `src/paper_plots/`
  - Execute the required notebooks to regenerate plots
Data is stored as split archives in `data_archives/`.

- `data_core.zip` contains shared analysis assets, including the preprocessed MFPT files `data/mfpt_threshold_summaries_ref.pkl` and `data/mfpt_samples_pace25000_ref.pkl`
- `hlda_trajectories_*.zip` contains per-mutant trajectory cache data under `data/hlda_trajectories/`
Short description of the MFPT files used in the paper workflow:

- `mfpt_samples_pace25000_ref.pkl`: dictionary keyed by mutant, then threshold, containing per-run MFPT samples from the `PACE=25000` setup (typically about 200 runs per mutant/threshold; a few entries have slightly fewer due to missing/failed runs).
- `mfpt_threshold_summaries_ref.pkl`: dictionary keyed by MFPT threshold (`lim`), each value a per-mutant summary DataFrame used by the notebooks (columns include, for example, `mfpt`, `lambda`, `tF`, `tU`, `residue_idx`, `property_grp`, `Tm`, `dTm`, `nF`, `nU`). This summary includes HLDA-derived quantities (for example `lambda`, `tF`, `tU`) via `hlda_lambda_grid.pkl`, which is computed from `data/hlda_trajectories/` by `src/common/hlda_utils.py`.
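Based on the nesting described above (mutant → threshold → per-run MFPT samples), accessing the samples file looks roughly like the sketch below. The mutant names, threshold values, and MFPT numbers here are invented for illustration, not taken from the repository data:

```python
import statistics

# Illustrative stand-in for the structure of mfpt_samples_pace25000_ref.pkl:
# mutant -> MFPT threshold -> list of per-run MFPT samples (all values made up).
mfpt_samples = {
    "A12G": {0.1: [4.2e5, 3.9e5, 4.5e5], 0.2: [7.1e5, 6.8e5]},
    "WT":   {0.1: [2.0e5, 2.3e5, 1.9e5], 0.2: [3.6e5, 3.4e5]},
}

# Mean MFPT per mutant at a fixed threshold.
threshold = 0.1
mean_mfpt = {
    mutant: statistics.mean(runs[threshold])
    for mutant, runs in mfpt_samples.items()
}
print(mean_mfpt)
```

The real file holds roughly 200 samples per mutant/threshold instead of the handful shown here, but the traversal pattern is the same.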
If you need to rebuild archives from an unpacked `data/` tree:

```bash
./scripts/pack_data.sh
```

You can reproduce MFPT-based results in two ways:
- Generate MFPT samples from FPT simulations using `src/fpt_plumed/templates` (for example through `src/fpt_single_run.sh`).
- Use the preprocessed MFPT files from `data_core.zip` (recommended for paper reproduction): `data/mfpt_threshold_summaries_ref.pkl` and `data/mfpt_samples_pace25000_ref.pkl`
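For the second route, the preprocessed files are plain pickles and can be loaded with the standard library. A minimal sketch; the existence guard (returning `None` when the archives have not been unpacked yet) is our addition, not repository code:

```python
import pickle
from pathlib import Path

def load_pickle(path: Path):
    """Return the unpickled object, or None if data/ has not been unpacked yet."""
    if not path.exists():
        return None
    with path.open("rb") as f:
        return pickle.load(f)

# File names from the repository layout described above.
mfpt_samples = load_pickle(Path("data/mfpt_samples_pace25000_ref.pkl"))
mfpt_summaries = load_pickle(Path("data/mfpt_threshold_summaries_ref.pkl"))
```

Both objects are dictionaries once loaded: the samples keyed by mutant, the summaries keyed by MFPT threshold.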
HLDA grid generation is implemented in `src/common/hlda_utils.py`.

- `compute_lambda_grid(...)` loads folded/unfolded COLVAR data for each mutant from `data/hlda_trajectories/`, sweeps `(tF, tU)` RMSD thresholds, prunes highly correlated descriptors (Spearman), and computes HLDA weights/eigenvalue (`lambda`) per grid point.
- `load_lambda_grid(...)` is the notebook-facing entrypoint: it loads cached results from `data/hlda_lambda_grid.pkl` if present; otherwise it computes and caches them.
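The Spearman-based descriptor pruning step can be sketched as below. This is a simplified stand-alone version, not the repository's implementation; the correlation threshold, column names, and greedy keep-first strategy are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def prune_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Greedily drop descriptor columns whose |Spearman rho| with an
    already-kept column exceeds the threshold."""
    rho = np.abs(spearmanr(df.values).correlation)  # pairwise |rho| matrix
    keep = []
    for j in range(len(df.columns)):
        if all(rho[j, k] <= threshold for k in keep):
            keep.append(j)
    return df.iloc[:, keep]

# Toy descriptors: d2 is a monotone transform of d1 (Spearman rho = 1),
# so it is pruned; d3 is independent noise and survives.
rng = np.random.default_rng(0)
d1 = rng.normal(size=200)
df = pd.DataFrame({"d1": d1, "d2": np.exp(d1), "d3": rng.normal(size=200)})
pruned = prune_correlated(df)
print(list(pruned.columns))
```

Spearman correlation is rank-based, so perfectly monotone (even nonlinear) relationships like `d2 = exp(d1)` are detected as fully redundant.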
Minimal usage pattern (the same flow used by the paper notebooks):

```python
from pathlib import Path

from common.hlda_utils import load_lambda_grid

data_dir = Path("data")
lambda_grid = load_lambda_grid(
    cache_path=data_dir / "hlda_lambda_grid.pkl",
    base_dir=data_dir / "hlda_trajectories",
    force=False,
)
```

Set `force=True` to recompute the HLDA grid from raw trajectory-derived data.
- `src/paper_plots/`: notebooks that generate paper plots
- `src/fpt_plumed/`: PLUMED templates for FPT workflows
- `scripts/unpack_data.sh`: restore `data/` from `*.zip` archives
- `scripts/pack_data.sh`: rebuild split data archives
- `data_archives/`: committed paper snapshot data archives
See `CITATION.cff` for software and paper citation metadata.
- Code: MIT (`LICENSE`)
- Paper/manuscript materials: CC BY-NC-ND 4.0 (`LICENSE-paper`)