BioinfoMachineLearning
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 9 additions & 0 deletions
diff --git a/Dockerfile b/Dockerfile
Lines changed: 54 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 18 additions & 16 deletions
diff --git a/configs/scripts/build_inference_script.yaml b/configs/scripts/build_inference_script.yaml
Lines changed: 11 additions & 15 deletions
diff --git a/configs/scripts/build_interaction_analysis_script.yaml b/configs/scripts/build_interaction_analysis_script.yaml
Lines changed: 21 additions & 0 deletions
diff --git a/docs/source/_static/PoseBench.png b/docs/source/_static/PoseBench.png
-1.14 MB
diff --git a/docs/source/acknowledgements.rst b/docs/source/acknowledgements.rst
Lines changed: 2 additions & 2 deletions
diff --git a/docs/source/available_methods.rst b/docs/source/available_methods.rst
Lines changed: 2 additions & 2 deletions
diff --git a/docs/source/bonus.rst b/docs/source/bonus.rst
Lines changed: 2 additions & 2 deletions
@@ -202,4 +202,5 @@ configs/local/default.yaml
 /forks/TULIP/outputs/
 /forks/Vina/ADFR/
 scripts/*inference*/
+scripts/*interactions*/
 scoring/
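A quick way to sanity-check the new ignore rule is to approximate gitignore's single-component `*` matching with Python's `fnmatch`. This is only a rough sketch: real gitignore matching has additional rules, and every directory name here other than `interactions` and `inference` is a hypothetical example.

```python
from fnmatch import fnmatch

# Directory-name part of the new rule `scripts/*interactions*/`.
PATTERN = "*interactions*"


def is_ignored(dirname: str) -> bool:
    """Approximate gitignore matching for one directory name under scripts/."""
    return fnmatch(dirname, PATTERN)


# `interactions` is the leaf of the default `output_script_dir` in the new config;
# `casp15_interactions_v2` is a hypothetical name used only for illustration.
candidates = ["interactions", "casp15_interactions_v2", "inference"]
ignored = [name for name in candidates if is_ignored(name)]
print(ignored)
```

Under this approximation, `inference` is not caught by the new rule; it is covered by the existing `scripts/*inference*/` entry instead.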
@@ -1,3 +1,12 @@
+### 1.1.0 - 03/20/2026
+
+**Changes**:
+
+- Fixed ligand scoring bug affecting the Astex Diverse, DockGen-E, and PoseBusters Benchmark datasets' primary-ligand results. Thanks a ton, @95028!
+- Regenerated results after addressing the scoring bug. The `notebooks` directory has been updated in the process.
+- Updated Zenodo links (for `forks` and `notebooks`).
+- An updated arXiv manuscript (v7) should be online soon.
+
 ### 1.0.0 - 11/04/2025
 
 **Changes**:
 
@@ -0,0 +1,54 @@
+ARG PYTORCH_TAG=2.3.0-cuda11.8-cudnn8-devel
+FROM pytorch/pytorch:${PYTORCH_TAG}
+
+# Add system dependencies
+RUN apt-get update \
+    # Update image
+    && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
+        software-properties-common \
+        curl \
+        gnupg \
+    && echo "deb http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu jammy main" > /etc/apt/sources.list.d/ubuntu-toolchain-r-test.list \
+    && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 1E9377A2BA9EF27F \
+    && apt-get update \
+    # Install essential dependencies
+    && apt-get install --no-install-recommends -y \
+        build-essential \
+        git \
+        wget \
+        libxrender1 \
+        libxtst6 \
+        libxext6 \
+        libxi6 \
+        kalign \
+        gcc-11 \
+        g++-11 \
+    # Install Git LFS
+    && curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash \
+    && apt-get install --no-install-recommends -y git-lfs \
+    && git lfs install \
+    # Configure gcc/g++ versions
+    && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100 \
+    && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 100 \
+    # Clean up dependencies
+    && rm -rf /var/lib/apt/lists/* \
+    && apt-get autoremove -y \
+    && apt-get clean
+
+# Install Conda dependencies
+RUN conda install -y -c conda-forge python=3.10 gcc=11.4.0 gxx=11.4.0 libstdcxx=14.1.0 libstdcxx-ng=14.1.0 libgcc=14.1.0 libgcc-ng=14.1.0 compilers=1.5.2 openbabel=3.1.1 && \
+    conda clean -afy
+
+# Set work directory
+WORKDIR /app/posebench
+
+# Clone and install the package + requirements
+ARG GIT_TAG=main
+RUN git clone https://github.com/BioinfoMachineLearning/posebench . --branch ${GIT_TAG} \
+    && conda env update -f environments/posebench_environment.yaml \
+    && conda install -y -c bioconda reduce \
+    && pip install -e . \
+    && pip install numpy==1.26.4 --no-dependencies \
+    && pip install prody==2.4.1 --no-dependencies \
+    && pip install git+https://github.com/amorehead/posecheck.git@posebench \
+    && conda clean -afy
@@ -3,7 +3,7 @@
 # PoseBench
 
 [![Paper](http://img.shields.io/badge/arXiv-2405.14108-B31B1B.svg)](https://rdcu.be/eW5oj)
-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17536252.svg)](https://doi.org/10.5281/zenodo.17536252)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19138652.svg)](https://doi.org/10.5281/zenodo.19138652)
 [![PyPI version](https://badge.fury.io/py/posebench.svg)](https://badge.fury.io/py/posebench)
 [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
 [![Docs](https://assets.readthedocs.org/static/projects/badges/passing-flat.svg)](https://bioinfomachinelearning.github.io/PoseBench/)
@@ -22,6 +22,8 @@ Comprehensive benchmarking of protein-ligand structure prediction methods
 
 [Documentation](https://bioinfomachinelearning.github.io/PoseBench/)
 
+> **⚠️ Notice:** We have discovered a bug in version `1.0.0` affecting the ligand scoring results of each primary-ligand dataset (Astex Diverse, DockGen-E, and PoseBusters Benchmark). We have addressed this bug in version `1.1.0`, which reports reduced performance (by ~15% on average) for each method currently in the benchmark. Please rerun your analyses if you have developed code on top of `PoseBench`. Thank you for your understanding!
+
 ## Contents
 
 - [Installation](#installation)
@@ -209,10 +211,10 @@ of how to extend `PoseBench`, as outlined below.
 
 ```bash
 # fetch, extract, and clean-up preprocessed Astex Diverse, PoseBusters Benchmark, DockGen, and CASP15 data (~3 GB) #
-wget https://zenodo.org/records/17536252/files/astex_diverse_set.tar.gz
-wget https://zenodo.org/records/17536252/files/posebusters_benchmark_set.tar.gz
-wget https://zenodo.org/records/17536252/files/dockgen_set.tar.gz
-wget https://zenodo.org/records/17536252/files/casp15_set.tar.gz
+wget https://zenodo.org/records/19138652/files/astex_diverse_set.tar.gz
+wget https://zenodo.org/records/19138652/files/posebusters_benchmark_set.tar.gz
+wget https://zenodo.org/records/19138652/files/dockgen_set.tar.gz
+wget https://zenodo.org/records/19138652/files/casp15_set.tar.gz
 tar -xzf astex_diverse_set.tar.gz
 tar -xzf posebusters_benchmark_set.tar.gz
 tar -xzf dockgen_set.tar.gz
@@ -228,39 +230,39 @@ rm casp15_set.tar.gz
 ```bash
 # fetch, extract, and clean-up benchmark method predictions to reproduce paper results (~19 GB) #
 # AutoDock Vina predictions and results
-wget https://zenodo.org/records/17536252/files/vina_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/vina_benchmark_method_predictions.tar.gz
 tar -xzf vina_benchmark_method_predictions.tar.gz
 rm vina_benchmark_method_predictions.tar.gz
 # DiffDock predictions and results
-wget https://zenodo.org/records/17536252/files/diffdock_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/diffdock_benchmark_method_predictions.tar.gz
 tar -xzf diffdock_benchmark_method_predictions.tar.gz
 rm diffdock_benchmark_method_predictions.tar.gz
 # DynamicBind predictions and results
-wget https://zenodo.org/records/17536252/files/dynamicbind_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/dynamicbind_benchmark_method_predictions.tar.gz
 tar -xzf dynamicbind_benchmark_method_predictions.tar.gz
 rm dynamicbind_benchmark_method_predictions.tar.gz
 # NeuralPLexer predictions and results
-wget https://zenodo.org/records/17536252/files/neuralplexer_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/neuralplexer_benchmark_method_predictions.tar.gz
 tar -xzf neuralplexer_benchmark_method_predictions.tar.gz
 rm neuralplexer_benchmark_method_predictions.tar.gz
 # RoseTTAFold-All-Atom predictions and results
-wget https://zenodo.org/records/17536252/files/rfaa_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/rfaa_benchmark_method_predictions.tar.gz
 tar -xzf rfaa_benchmark_method_predictions.tar.gz
 rm rfaa_benchmark_method_predictions.tar.gz
 # Chai-1 predictions and results
-wget https://zenodo.org/records/17536252/files/chai_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/chai_benchmark_method_predictions.tar.gz
 tar -xzf chai_benchmark_method_predictions.tar.gz
 rm chai_benchmark_method_predictions.tar.gz
 # Boltz-1 predictions and results
-wget https://zenodo.org/records/17536252/files/boltz_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/boltz_benchmark_method_predictions.tar.gz
 tar -xzf boltz_benchmark_method_predictions.tar.gz
 rm boltz_benchmark_method_predictions.tar.gz
 # AlphaFold 3 predictions and results
-wget https://zenodo.org/records/17536252/files/af3_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/af3_benchmark_method_predictions.tar.gz
 tar -xzf af3_benchmark_method_predictions.tar.gz
 rm af3_benchmark_method_predictions.tar.gz
 # CASP15 predictions and results for all methods
-wget https://zenodo.org/records/17536252/files/casp15_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/casp15_benchmark_method_predictions.tar.gz
 tar -xzf casp15_benchmark_method_predictions.tar.gz
 rm casp15_benchmark_method_predictions.tar.gz
 ```
@@ -270,7 +272,7 @@ rm casp15_benchmark_method_predictions.tar.gz
 ```bash
 # fetch, extract, and clean-up benchmark method interactions to reproduce paper results (~12 GB) #
 # cached ProLIF interactions for notebook plots
-wget https://zenodo.org/records/17536252/files/posebench_notebooks.tar.gz
+wget https://zenodo.org/records/19138652/files/posebench_notebooks.tar.gz
 tar -xzf posebench_notebooks.tar.gz
 rm posebench_notebooks.tar.gz
 ```
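Since this change touches the Zenodo record ID in more than a dozen places, one option is to derive the download URLs from a single constant. The sketch below is illustrative only: `zenodo_file_url` and `ARCHIVES` are hypothetical names, not part of PoseBench; the URL pattern and file names are taken from the commands above.

```python
# Record ID introduced in this change (previously 17536252).
ZENODO_RECORD_ID = "19138652"


def zenodo_file_url(filename: str, record_id: str = ZENODO_RECORD_ID) -> str:
    """Build a download URL following the pattern used in the README commands."""
    return f"https://zenodo.org/records/{record_id}/files/{filename}"


# A subset of the archives referenced in the README.
ARCHIVES = [
    "astex_diverse_set.tar.gz",
    "posebusters_benchmark_set.tar.gz",
    "dockgen_set.tar.gz",
    "casp15_set.tar.gz",
    "posebench_notebooks.tar.gz",
]

urls = [zenodo_file_url(name) for name in ARCHIVES]
for url in urls:
    print(url)
```

With this layout, a future record bump would only require changing `ZENODO_RECORD_ID`.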
@@ -357,7 +359,7 @@ python3 posebench/data/components/protein_apo_to_holo_alignment.py dataset=casp1
 conda deactivate
 ```
 
-**NOTE:** The preprocessed Astex Diverse, PoseBusters Benchmark, DockGen, and CASP15 data available via [Zenodo](https://doi.org/10.5281/zenodo.17536252) provide pre-holo-aligned protein structures predicted by AlphaFold 3 (and alternatively MIT-licensed ESMFold) for these respective datasets. Accordingly, users must ensure their usage of such predicted protein structures from AlphaFold 3 aligns with AlphaFold 3's [Terms of Use](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md).
+**NOTE:** The preprocessed Astex Diverse, PoseBusters Benchmark, DockGen, and CASP15 data available via [Zenodo](https://doi.org/10.5281/zenodo.19138652) provide pre-holo-aligned protein structures predicted by AlphaFold 3 (and alternatively MIT-licensed ESMFold) for these respective datasets. Accordingly, users must ensure their usage of such predicted protein structures from AlphaFold 3 aligns with AlphaFold 3's [Terms of Use](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md).
 
 </details>
 
 
@@ -1,37 +1,33 @@
 # run arguments:
-method: diffdock # the method for which to score predictions - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `vina`, `ensemble`)
-vina_binding_site_method: p2rank # the method to use for Vina binding site prediction - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `p2rank`)
+method: diffdock # the method for which to score predictions - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `vina`, `ensemble`)
+vina_binding_site_method: p2rank # the method to use for Vina binding site prediction - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `p2rank`)
 ensemble_ranking_method: consensus # the method to use for ensemble ranking - NOTE: must be one of (`consensus`, `ff`)
 dataset: astex_diverse # the dataset to use - NOTE: must be one of (`posebusters_benchmark`, `astex_diverse`, `dockgen`, `casp15`)
 repeat_index: 1 # the repeat index which was used for inference
 cuda_device_index: 0 # the CUDA device index to use for inference (for all methods except AutoDock-Vina)
 output_script_dir: ${oc.env:PROJECT_ROOT}/scripts/inference # the directory in which to save the output script
-pocket_only_baseline: null # whether to perform a pocket-only baseline for the PoseBusters Benchmark set - NOTE: not applicable only to `tulip`
+pocket_only_baseline: false # whether to perform a pocket-only baseline for the PoseBusters Benchmark set - NOTE: applicable to all methods except `tulip`
 v1_baseline: false # whether to perform the V1 baseline for DiffDock
-no_ilcl: null # whether to use model weights trained with an inter-ligand clash loss (ILCL) for the CASP15 set - NOTE: only applicable to `neuralplexer`
-relax_protein: null # whether to relax the protein structure before scoring - NOTE: currently in an experimental state
+no_ilcl: false # whether to use model weights trained without an inter-ligand clash loss (ILCL) for the CASP15 set - NOTE: only applicable to `neuralplexer`
+relax_protein: false # whether to relax the protein structure before scoring - NOTE: currently in an experimental state
 export_hpc_headers: true # whether to insert high-performance computing (by default, SLURM) headers into the output script
 verbose: false # whether to print verbose (e.g., invalid configuration) output
 # sweep arguments:
 sweep: false # whether to build all combinations of method-dataset run scripts
 methods_to_sweep: [
     "diffdock",
-    "fabind",
     "dynamicbind",
     "neuralplexer",
-    "flowdock",
     "rfaa",
+    # "chai-lab_ss",
     "chai-lab",
+    # "boltz_ss",
     "boltz",
+    # "alphafold3_ss",
+    "alphafold3",
     "vina",
-    "ensemble",
   ] # the methods to sweep
-vina_binding_site_methods_to_sweep: ["diffdock", "p2rank"] # the Vina binding site prediction methods to sweep
+vina_binding_site_methods_to_sweep: ["p2rank"] # the Vina binding site prediction methods to sweep
 ensemble_ranking_methods_to_sweep: ["consensus"] # the ensemble ranking methods to sweep - NOTE: must be one of (`consensus`, `ff`)
-datasets_to_sweep: [
-    "posebusters_benchmark",
-    "astex_diverse",
-    "dockgen",
-    "casp15",
-  ] # the datasets to sweep
+datasets_to_sweep: ["posebusters_benchmark", "astex_diverse", "dockgen", "casp15"] # the datasets to sweep
 num_sweep_repeats: 3 # the number of repeats to run for each method-dataset sweep (if the method is a generative method)
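To get a feel for how many run scripts the updated sweep settings imply, here is a minimal sketch that enumerates method-dataset-repeat combinations. It is not the project's script builder: `build_sweep` and its `non_generative` parameter are hypothetical, and treating `vina` as non-generative is an assumption made purely for illustration.

```python
from itertools import product

# Values copied from the updated sweep arguments.
METHODS = [
    "diffdock", "dynamicbind", "neuralplexer", "rfaa",
    "chai-lab", "boltz", "alphafold3", "vina",
]
DATASETS = ["posebusters_benchmark", "astex_diverse", "dockgen", "casp15"]
NUM_SWEEP_REPEATS = 3


def build_sweep(methods, datasets, num_repeats, non_generative=frozenset()):
    """Return (method, dataset, repeat_index) tuples; repeat indices start at 1.

    Methods listed in `non_generative` (a hypothetical parameter) get one run only.
    """
    runs = []
    for method, dataset in product(methods, datasets):
        repeats = 1 if method in non_generative else num_repeats
        runs.extend((method, dataset, r) for r in range(1, repeats + 1))
    return runs


all_runs = build_sweep(METHODS, DATASETS, NUM_SWEEP_REPEATS)
print(len(all_runs))  # 8 methods * 4 datasets * 3 repeats = 96

# Assumption for illustration: `vina` is not repeated.
vina_once = build_sweep(METHODS, DATASETS, NUM_SWEEP_REPEATS, non_generative={"vina"})
print(len(vina_once))  # 7 * 4 * 3 + 1 * 4 = 88
```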
@@ -0,0 +1,21 @@
+# run arguments:
+method: diffdock # the method for which to preprocess interactions as H5 files
+dataset: astex_diverse # the dataset to use - NOTE: must be one of (`astex_diverse`, `casp15`, `dockgen`, `posebusters_benchmark`)
+repeat_index: 1 # the repeat index to preprocess - NOTE: currently only repeat_index=1 is supported
+output_script_dir: ${oc.env:PROJECT_ROOT}/scripts/interactions # the directory in which to save the output script
+# sweep arguments:
+sweep: true # whether to build all combinations of method-dataset preprocessing scripts
+methods_to_sweep: [
+    "vina_p2rank",
+    "diffdock",
+    "dynamicbind",
+    "neuralplexer",
+    "rfaa",
+    "chai-lab_ss",
+    "chai-lab",
+    "boltz_ss",
+    "boltz",
+    "alphafold3_ss",
+    "alphafold3",
+  ] # the methods to sweep
+datasets_to_sweep: ["astex_diverse", "dockgen", "posebusters_benchmark", "casp15"] # the datasets to sweep
@@ -2,5 +2,5 @@ Acknowledgements
 ================
 
 .. mdinclude:: ../../README.md
-    :start-line: 1309
-    :end-line: 1332
+    :start-line: 1311
+    :end-line: 1334
@@ -2,8 +2,8 @@ Available inference methods
 ================
 
 .. mdinclude:: ../../README.md
-    :start-line: 367
-    :end-line: 412
+    :start-line: 369
+    :end-line: 414
 
 .. note::
     Have a new method to add? Please let us know by creating a pull request. We would be happy to work with you to integrate new methodology into this benchmark!
@@ -2,8 +2,8 @@ Bonus
 ================
 
 .. mdinclude:: ../../README.md
-    :start-line: 1350
-    :end-line: 1352
+    :start-line: 1352
+    :end-line: 1354
 
 .. image:: ./_static/WorkBench.jpeg
   :alt: My brain after building PoseBench
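The three `mdinclude` range updates above all follow from the README gaining two lines near its top. A small sketch of that arithmetic, assuming the notice was inserted after README line 24 (as the `@@ -22,6 +22,8 @@` hunk header suggests); `shift_range` is a hypothetical helper, not part of the docs tooling:

```python
def shift_range(start: int, end: int, insert_after: int, n_added: int) -> tuple[int, int]:
    """Shift a line range to account for `n_added` lines inserted after `insert_after`."""
    if start > insert_after:
        start += n_added
    if end > insert_after:
        end += n_added
    return start, end


# Old (start-line, end-line) values taken from the removed lines in this diff.
OLD_RANGES = {
    "acknowledgements.rst": (1309, 1332),
    "available_methods.rst": (367, 412),
    "bonus.rst": (1350, 1352),
}

new_ranges = {
    name: shift_range(start, end, insert_after=24, n_added=2)
    for name, (start, end) in OLD_RANGES.items()
}
for name, bounds in new_ranges.items():
    print(name, bounds)
```

Because every included range starts well past the insertion point, each bound simply moves by two, which matches the added lines in the three `.rst` files.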