pdb2reaction: automated reaction-path modeling directly from PDB structures

Overview

pdb2reaction is a Python CLI toolkit for turning PDB structures into enzymatic reaction pathways with machine-learning interatomic potentials (MLIPs).

In many workflows, a single command like the one below can generate a useful first-pass enzymatic reaction path:

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3'

You can also run MEP search → TS optimization → IRC → thermochemistry → single-point DFT calculations in one command by adding --tsopt True --thermo True --dft True:

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --tsopt True --thermo True --dft True

Given (i) two or more full protein–ligand PDB files (R → … → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt True, pdb2reaction automatically:

  • extracts an active-site pocket around user‑defined substrates to build a cluster model,
  • explores minimum‑energy paths (MEPs) with path optimization methods such as the Growing String Method (GSM) and Direct Max Flux (DMF),
  • optionally optimizes transition states, runs vibrational analysis, IRC calculations, and single‑point DFT calculations,

using Meta's UMA machine-learning interatomic potential (MLIP).

Expectation setting for TS search

  • Treat single-command outputs as a strong initial guess, not as a validated final transition state.
  • Enzyme systems often require iterative refinement (pocket definition, scan targets, constraints, and endpoint quality).
  • Always validate TS candidates with frequency analysis and IRC before mechanistic interpretation.

All of this is exposed through a command-line interface (CLI) designed so that a multi-step enzymatic reaction mechanism can be generated with minimal manual intervention. The same workflow also works for small-molecule systems. If you run workflows on full structures (i.e., omit --center/-c and --ligand-charge), you can use .xyz or .gjf inputs as well.

On HPC clusters or multi‑GPU workstations, pdb2reaction can process large cluster models (and optionally full protein–ligand complexes) by parallelizing UMA inference across nodes. Set workers and workers_per_node to enable parallel inference; see docs/uma_pysis.md for configuration details.

Important (prerequisites):

  • Input PDB files must already contain hydrogen atoms.
  • When you provide multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ); otherwise an error is raised.
  • Boolean CLI options are passed explicitly as True/False (e.g., --tsopt True).
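
The first two prerequisites can be checked before launching a run. The following is a minimal sketch in plain Python, not part of pdb2reaction: the helper names (`read_atoms`, `check_inputs`, `pdb_line`) are hypothetical, and it assumes standard fixed-column PDB records with the element symbol in columns 77–78. How pdb2reaction itself performs these checks is not shown here.

```python
from typing import List, Tuple

# Identity of an atom, independent of its coordinates:
# (atom name, residue name, chain ID, residue number)
AtomKey = Tuple[str, str, str, str]


def read_atoms(pdb_text: str) -> List[Tuple[AtomKey, str]]:
    """Return (identity key, element) for every ATOM/HETATM record, in file order."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            key = (line[12:16].strip(), line[17:20].strip(),
                   line[21:22], line[22:26].strip())
            element = line[76:78].strip().upper()  # columns 77-78
            atoms.append((key, element))
    return atoms


def check_inputs(pdb_texts: List[str]) -> List[str]:
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    parsed = [read_atoms(text) for text in pdb_texts]
    for i, atoms in enumerate(parsed):
        if not any(element == "H" for _, element in atoms):
            problems.append(f"structure {i}: no hydrogen atoms found")
    reference = [key for key, _ in parsed[0]]
    for i, atoms in enumerate(parsed[1:], start=1):
        if [key for key, _ in atoms] != reference:
            problems.append(f"structure {i}: atom list or order differs from structure 0")
    return problems


def pdb_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column ATOM record (demo helper for the toy data below)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


# Toy two-atom structures: same atoms, different coordinates.
reactant = "\n".join([pdb_line(1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0, "C"),
                      pdb_line(2, "HA", "ALA", "A", 1, 1.0, 0.0, 0.0, "H")])
product = "\n".join([pdb_line(1, "CA", "ALA", "A", 1, 0.2, 0.0, 0.0, "C"),
                     pdb_line(2, "HA", "ALA", "A", 1, 1.3, 0.0, 0.0, "H")])
# Same atoms, but written in a different order.
swapped = "\n".join(reversed(reactant.splitlines()))

print(check_inputs([reactant, product]))
print(check_inputs([reactant, swapped]))
```

In real use, the strings would come from reading `R.pdb` and `P.pdb` from disk; the toy structures only keep the sketch self-contained.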

Documentation

For detailed documentation, please refer to the files in docs/.

This software is still under development. Please use it at your own risk.


Quick Installation

pdb2reaction is intended for Linux environments with a CUDA‑capable GPU.

Minimal setup (CUDA 12.9, torch 2.8.0)

pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install git+https://github.com/t-0hmura/pdb2reaction.git
plotly_get_chrome -y

Log in to Hugging Face Hub to download UMA models:

huggingface-cli login

For the DMF method

If you want to use Direct Max Flux (DMF) for MEP search, install cyipopt first:

conda create -n pdb2reaction python=3.11 -y
conda activate pdb2reaction
conda install -c conda-forge cyipopt -y
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install git+https://github.com/t-0hmura/pdb2reaction.git
plotly_get_chrome -y

For detailed installation instructions, see Getting Started.


Quick Examples

Multi-structure MEP search

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3'

Full workflow with TS optimization, thermochemistry, and DFT

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' \
    --tsopt True --thermo True --dft True

Single-structure scan mode

pdb2reaction -i R.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' \
    --scan-lists '[("TYR,285,CA","MMT,309,C10",2.20)]'
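
The `--scan-lists` value above reads like a Python list literal of tuples. The sketch below only illustrates that shape; it is not pdb2reaction's parser. It assumes each atom is given as "residue name,residue number,atom name" and that the third element is a target distance, which is an interpretation of the example rather than something stated here (see docs/scan.md for the authoritative description). The function name `parse_scan_lists` is hypothetical.

```python
import ast
from typing import List, Tuple

AtomSpec = Tuple[str, ...]  # assumed: (residue name, residue number, atom name)


def parse_scan_lists(text: str) -> List[Tuple[AtomSpec, AtomSpec, float]]:
    """Split a scan-list literal into (atom A, atom B, numeric target) entries."""
    entries = ast.literal_eval(text)  # safe evaluation of a Python literal
    parsed = []
    for atom_a, atom_b, target in entries:
        parsed.append((tuple(atom_a.split(",")),
                       tuple(atom_b.split(",")),
                       float(target)))
    return parsed


# The exact string used in the command above.
scan_lists = '[("TYR,285,CA","MMT,309,C10",2.20)]'
for atom_a, atom_b, target in parse_scan_lists(scan_lists):
    print(atom_a, atom_b, target)
```

Writing the list in Python first and printing it with `repr` is one way to avoid shell-quoting mistakes when a scan involves several atom pairs.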

TS optimization only

pdb2reaction -i TS_candidate.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' \
    --tsopt True
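
When the commands above are launched from a script, it can help to assemble the argument list programmatically so that Boolean options are always written out as True/False. The following is a minimal sketch that uses only the flags shown in this README; `build_command` is a hypothetical helper, not a pdb2reaction API, and nothing is executed here.

```python
import shlex
from typing import List, Sequence


def build_command(inputs: Sequence[str], center: str, ligand_charge: str,
                  **bool_flags: bool) -> List[str]:
    """Assemble an argument list; Boolean flags are spelled out as True/False."""
    argv = ["pdb2reaction", "-i", *inputs,
            "-c", center, "--ligand-charge", ligand_charge]
    for name, value in bool_flags.items():
        # Keyword names use underscores; CLI flags use hyphens.
        argv += [f"--{name.replace('_', '-')}", "True" if value else "False"]
    return argv


argv = build_command(["R.pdb", "P.pdb"], "SAM,GPP", "SAM:1,GPP:-3",
                     tsopt=True, thermo=True, dft=True)
print(shlex.join(argv))  # shell-safe rendering of the same command

# To actually run it (requires pdb2reaction to be installed):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids a second layer of shell quoting for values such as 'SAM:1,GPP:-3'.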

CLI Subcommands

Subcommand | Role | Documentation
--- | --- | ---
all | End-to-end workflow: extraction → MEP search → TS optimization → IRC → freq → DFT | docs/all.md
extract | Extract active-site pocket (cluster model) from protein–ligand complex | docs/extract.md
opt | Single-structure geometry optimization (L-BFGS or RFO) | docs/opt.md
tsopt | Transition state optimization (Dimer or RS-I-RFO) | docs/tsopt.md
path-opt | MEP optimization via GSM or DMF | docs/path_opt.md
path-search | Recursive MEP search with automatic refinement | docs/path_search.md
scan | 1D bond-length driven scan with restraints | docs/scan.md
scan2d | 2D distance grid scan | docs/scan2d.md
scan3d | 3D distance grid scan | docs/scan3d.md
irc | IRC calculation with EulerPC | docs/irc.md
freq | Vibrational frequency analysis and thermochemistry | docs/freq.md
dft | Single-point DFT using GPU4PySCF (with CPU PySCF fallback) | docs/dft.md
trj2fig | Plot ΔE or E from an XYZ trajectory | docs/trj2fig.md
energy-diagram | Draw state energy diagram directly from numeric values | docs/energy-diagram.md
add-elem-info | Add or repair PDB element columns (77–78) | docs/add_elem_info.md

Important: Subcommands (except all) assume cluster models generated by extract. In these models, the atom closest to each Link‑H cap is automatically frozen. If you construct a cluster model yourself, set the Link‑H residue name to LKH and the atom name to HL, or specify the atoms to freeze via geom.freeze_atoms in the YAML file passed with --args-yaml.
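
For a hand-built cluster model, the atoms nearest to the Link‑H caps can be listed with a short script before deciding what to freeze. This is a sketch under stated assumptions, not pdb2reaction's own logic: it assumes fixed-column PDB records, caps named as described above (residue LKH, atom HL), and it reports 0-based positions in file order. Which index convention freeze_atoms expects is not stated here, so check the documentation before using the numbers. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, str, Tuple[float, float, float]]  # (atom name, residue name, xyz)


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read atom name, residue name, and coordinates from ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[12:16].strip(), line[17:20].strip(), xyz))
    return atoms


def atoms_nearest_link_h(pdb_text: str) -> List[int]:
    """Return 0-based indices of the non-cap atom closest to each LKH/HL cap."""
    atoms = parse_atoms(pdb_text)
    caps = [i for i, (name, res, _) in enumerate(atoms)
            if res == "LKH" and name == "HL"]
    others = [i for i in range(len(atoms)) if i not in caps]
    if not others:
        return []
    nearest = set()
    for cap in caps:
        nearest.add(min(others,
                        key=lambda j: math.dist(atoms[cap][2], atoms[j][2])))
    return sorted(nearest)


def pdb_line(record, serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column record (demo helper for the toy model below)."""
    return (f"{record:<6s}{serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


# Toy model: two carbons and one Link-H cap attached to the first carbon.
model = "\n".join([
    pdb_line("ATOM", 1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0, "C"),
    pdb_line("ATOM", 2, "CB", "ALA", "A", 1, 1.5, 0.0, 0.0, "C"),
    pdb_line("HETATM", 3, "HL", "LKH", "A", 2, -1.09, 0.0, 0.0, "H"),
])
print(atoms_nearest_link_h(model))
```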

Tip: In all, tsopt, freq, and irc, setting --hessian-calc-mode Analytical is strongly recommended when you have enough VRAM.


Getting Help

pdb2reaction --help
pdb2reaction <subcommand> --help

For detailed workflows, argument schemas, and example YAML files, consult the documentation files in docs/. For UMA calculator options, see docs/uma_pysis.md.

If you encounter any issues, please open an issue at https://github.com/t-0hmura/pdb2reaction/issues.


Citation

A preprint describing pdb2reaction is in preparation. Please check back for citation details once it is available.


References

[1] Wood, B. M., Dzamba, M., Fu, X., Gao, M., Shuaibi, M., Barroso-Luque, L., Abdelmaqsoud, K., Gharakhanyan, V., Kitchin, J. R., Levine, D. S., Michel, K., Sriram, A., Cohen, T., Das, A., Rizvi, A., Sahoo, S. J., Ulissi, Z. W., & Zitnick, C. L. (2025). UMA: A Family of Universal Models for Atoms. http://arxiv.org/abs/2506.23971

[2] Steinmetzer, J., Kupfer, S., & Gräfe, S. (2021). pysisyphus: Exploring potential energy surfaces in ground and excited states. International Journal of Quantum Chemistry, 121(3). https://doi.org/10.1002/qua.26390


License

pdb2reaction is distributed under the GNU General Public License version 3 (GPL-3.0).
