Commit afa3d52

Merge pull request #22 from BioinfoMachineLearning/dev/20
Address primary-ligand scoring bug
2 parents b3a15ef + 357b40b commit afa3d52

74 files changed

+14585
-11420
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -202,4 +202,5 @@ configs/local/default.yaml
 /forks/TULIP/outputs/
 /forks/Vina/ADFR/
 scripts/*inference*/
+scripts/*interactions*/
 scoring/

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -1,3 +1,12 @@
+### 1.1.0 - 03/20/2026
+
+**Changes**:
+
+- Fixed ligand scoring bug affecting the Astex Diverse, DockGen-E, and PoseBusters Benchmark datasets' primary-ligand results. Thanks a ton, @95028!
+- Regenerated results after addressing the scoring bug. The `notebooks` directory has been updated in the process.
+- Updated Zenodo links (for `forks` and `notebooks`).
+- An updated arXiv manuscript (v7) should be online soon.
+
 ### 1.0.0 - 11/04/2025

 **Changes**:

Dockerfile

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+ARG PYTORCH_TAG=2.3.0-cuda11.8-cudnn8-devel
+FROM pytorch/pytorch:${PYTORCH_TAG}
+
+# Add system dependencies
+RUN apt-get update \
+    # Update image
+    && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
+    software-properties-common \
+    curl \
+    gnupg \
+    && echo "deb http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu jammy main" > /etc/apt/sources.list.d/ubuntu-toolchain-r-test.list \
+    && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 1E9377A2BA9EF27F \
+    && apt-get update \
+    # Install essential dependencies
+    && apt-get install --no-install-recommends -y \
+    build-essential \
+    git \
+    wget \
+    libxrender1 \
+    libxtst6 \
+    libxext6 \
+    libxi6 \
+    kalign \
+    gcc-11 \
+    g++-11 \
+    # Install Git LFS
+    && curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash \
+    && apt-get install --no-install-recommends -y git-lfs \
+    && git lfs install \
+    # Configure gcc/g++ versions
+    && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100 \
+    && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 100 \
+    # Clean up dependencies
+    && rm -rf /var/lib/apt/lists/* \
+    && apt-get autoremove -y \
+    && apt-get clean
+
+# Install Conda dependencies
+RUN conda install -y -c conda-forge python=3.10 gcc=11.4.0 gxx=11.4.0 libstdcxx=14.1.0 libstdcxx-ng=14.1.0 libgcc=14.1.0 libgcc-ng=14.1.0 compilers=1.5.2 openbabel=3.1.1 && \
+    conda clean -afy
+
+# Set work directory
+WORKDIR /app/posebench
+
+# Clone and install the package + requirements
+ARG GIT_TAG=main
+RUN git clone https://github.com/BioinfoMachineLearning/posebench . --branch ${GIT_TAG} \
+    && conda env update -f environments/posebench_environment.yaml \
+    && conda install -c bioconda reduce \
+    && pip install -e . \
+    && pip install numpy==1.26.4 --no-dependencies \
+    && pip install prody==2.4.1 --no-dependencies \
+    && pip install git+https://github.com/amorehead/posecheck.git@posebench \
+    && conda clean -afy
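
The Dockerfile above exposes two build arguments, `PYTORCH_TAG` (the base image tag) and `GIT_TAG` (the branch or tag to clone). As a minimal sketch of how they might be overridden at build time, the following assembles a `docker build` command line. The helper `docker_build_command` and the image name `posebench:latest` are hypothetical and not part of the repository; only the two argument names and their defaults come from the diff.

```python
import shlex


def docker_build_command(
    git_tag: str = "main",
    pytorch_tag: str = "2.3.0-cuda11.8-cudnn8-devel",
    image_name: str = "posebench:latest",  # hypothetical image name
    context: str = ".",
) -> list[str]:
    """Assemble a `docker build` argv that overrides the Dockerfile's ARGs."""
    return [
        "docker", "build",
        "--build-arg", f"PYTORCH_TAG={pytorch_tag}",
        "--build-arg", f"GIT_TAG={git_tag}",
        "-t", image_name,
        context,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command; pass the list to subprocess.run to execute it.
    print(shlex.join(docker_build_command(git_tag="main")))
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.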

README.md

Lines changed: 18 additions & 16 deletions
@@ -3,7 +3,7 @@
 # PoseBench

 [![Paper](http://img.shields.io/badge/arXiv-2405.14108-B31B1B.svg)](https://rdcu.be/eW5oj)
-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17536252.svg)](https://doi.org/10.5281/zenodo.17536252)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19138652.svg)](https://doi.org/10.5281/zenodo.19138652)
 [![PyPI version](https://badge.fury.io/py/posebench.svg)](https://badge.fury.io/py/posebench)
 [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
 [![Docs](https://assets.readthedocs.org/static/projects/badges/passing-flat.svg)](https://bioinfomachinelearning.github.io/PoseBench/)
@@ -22,6 +22,8 @@ Comprehensive benchmarking of protein-ligand structure prediction methods

 [Documentation](https://bioinfomachinelearning.github.io/PoseBench/)

+> **⚠️ Notice:** We have discovered a bug in version `1.0.0` affecting the ligand scoring results of each primary-ligand dataset (Astex Diverse, DockGen-E, and PoseBusters Benchmark). We have addressed this bug in version `1.1.0`, which reports (~15% on average) reduced performance for each method currently in the benchmark. Please rerun your analyses if you have developed code on top of `PoseBench`. Thank you for your understanding!
+
 ## Contents

 - [Installation](#installation)
@@ -209,10 +211,10 @@ of how to extend `PoseBench`, as outlined below.

 ```bash
 # fetch, extract, and clean-up preprocessed Astex Diverse, PoseBusters Benchmark, DockGen, and CASP15 data (~3 GB) #
-wget https://zenodo.org/records/17536252/files/astex_diverse_set.tar.gz
-wget https://zenodo.org/records/17536252/files/posebusters_benchmark_set.tar.gz
-wget https://zenodo.org/records/17536252/files/dockgen_set.tar.gz
-wget https://zenodo.org/records/17536252/files/casp15_set.tar.gz
+wget https://zenodo.org/records/19138652/files/astex_diverse_set.tar.gz
+wget https://zenodo.org/records/19138652/files/posebusters_benchmark_set.tar.gz
+wget https://zenodo.org/records/19138652/files/dockgen_set.tar.gz
+wget https://zenodo.org/records/19138652/files/casp15_set.tar.gz
 tar -xzf astex_diverse_set.tar.gz
 tar -xzf posebusters_benchmark_set.tar.gz
 tar -xzf dockgen_set.tar.gz
@@ -228,39 +230,39 @@ rm casp15_set.tar.gz
 ```bash
 # fetch, extract, and clean-up benchmark method predictions to reproduce paper results (~19 GB) #
 # AutoDock Vina predictions and results
-wget https://zenodo.org/records/17536252/files/vina_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/vina_benchmark_method_predictions.tar.gz
 tar -xzf vina_benchmark_method_predictions.tar.gz
 rm vina_benchmark_method_predictions.tar.gz
 # DiffDock predictions and results
-wget https://zenodo.org/records/17536252/files/diffdock_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/diffdock_benchmark_method_predictions.tar.gz
 tar -xzf diffdock_benchmark_method_predictions.tar.gz
 rm diffdock_benchmark_method_predictions.tar.gz
 # DynamicBind predictions and results
-wget https://zenodo.org/records/17536252/files/dynamicbind_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/dynamicbind_benchmark_method_predictions.tar.gz
 tar -xzf dynamicbind_benchmark_method_predictions.tar.gz
 rm dynamicbind_benchmark_method_predictions.tar.gz
 # NeuralPLexer predictions and results
-wget https://zenodo.org/records/17536252/files/neuralplexer_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/neuralplexer_benchmark_method_predictions.tar.gz
 tar -xzf neuralplexer_benchmark_method_predictions.tar.gz
 rm neuralplexer_benchmark_method_predictions.tar.gz
 # RoseTTAFold-All-Atom predictions and results
-wget https://zenodo.org/records/17536252/files/rfaa_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/rfaa_benchmark_method_predictions.tar.gz
 tar -xzf rfaa_benchmark_method_predictions.tar.gz
 rm rfaa_benchmark_method_predictions.tar.gz
 # Chai-1 predictions and results
-wget https://zenodo.org/records/17536252/files/chai_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/chai_benchmark_method_predictions.tar.gz
 tar -xzf chai_benchmark_method_predictions.tar.gz
 rm chai_benchmark_method_predictions.tar.gz
 # Boltz-1 predictions and results
-wget https://zenodo.org/records/17536252/files/boltz_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/boltz_benchmark_method_predictions.tar.gz
 tar -xzf boltz_benchmark_method_predictions.tar.gz
 rm boltz_benchmark_method_predictions.tar.gz
 # AlphaFold 3 predictions and results
-wget https://zenodo.org/records/17536252/files/af3_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/af3_benchmark_method_predictions.tar.gz
 tar -xzf af3_benchmark_method_predictions.tar.gz
 rm af3_benchmark_method_predictions.tar.gz
 # CASP15 predictions and results for all methods
-wget https://zenodo.org/records/17536252/files/casp15_benchmark_method_predictions.tar.gz
+wget https://zenodo.org/records/19138652/files/casp15_benchmark_method_predictions.tar.gz
 tar -xzf casp15_benchmark_method_predictions.tar.gz
 rm casp15_benchmark_method_predictions.tar.gz
 ```
@@ -270,7 +272,7 @@ rm casp15_benchmark_method_predictions.tar.gz
 ```bash
 # fetch, extract, and clean-up benchmark method interactions to reproduce paper results (~12 GB) #
 # cached ProLIF interactions for notebook plots
-wget https://zenodo.org/records/17536252/files/posebench_notebooks.tar.gz
+wget https://zenodo.org/records/19138652/files/posebench_notebooks.tar.gz
 tar -xzf posebench_notebooks.tar.gz
 rm posebench_notebooks.tar.gz
 ```
@@ -357,7 +359,7 @@ python3 posebench/data/components/protein_apo_to_holo_alignment.py dataset=casp1
 conda deactivate
 ```

-**NOTE:** The preprocessed Astex Diverse, PoseBusters Benchmark, DockGen, and CASP15 data available via [Zenodo](https://doi.org/10.5281/zenodo.17536252) provide pre-holo-aligned protein structures predicted by AlphaFold 3 (and alternatively MIT-licensed ESMFold) for these respective datasets. Accordingly, users must ensure their usage of such predicted protein structures from AlphaFold 3 aligns with AlphaFold 3's [Terms of Use](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md).
+**NOTE:** The preprocessed Astex Diverse, PoseBusters Benchmark, DockGen, and CASP15 data available via [Zenodo](https://doi.org/10.5281/zenodo.19138652) provide pre-holo-aligned protein structures predicted by AlphaFold 3 (and alternatively MIT-licensed ESMFold) for these respective datasets. Accordingly, users must ensure their usage of such predicted protein structures from AlphaFold 3 aligns with AlphaFold 3's [Terms of Use](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md).

 </details>
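
Every download URL touched in the README hunks above differs only in the Zenodo record ID (`17536252` before, `19138652` after). A minimal sketch of keeping that ID in one place is shown below. The helper `zenodo_file_url` and the constant names are hypothetical; the URL pattern and archive names are copied from the diff.

```python
ZENODO_RECORD_ID = "19138652"  # record ID introduced by this commit

# Archive names copied from the README hunks above.
DATASET_ARCHIVES = [
    "astex_diverse_set.tar.gz",
    "posebusters_benchmark_set.tar.gz",
    "dockgen_set.tar.gz",
    "casp15_set.tar.gz",
]


def zenodo_file_url(filename: str, record_id: str = ZENODO_RECORD_ID) -> str:
    """Build the download URL for one file of a Zenodo record."""
    return f"https://zenodo.org/records/{record_id}/files/{filename}"


if __name__ == "__main__":
    # Print one URL per dataset archive, ready to pass to wget or curl.
    for name in DATASET_ARCHIVES:
        print(zenodo_file_url(name))
```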

Lines changed: 11 additions & 15 deletions
@@ -1,37 +1,33 @@
 # run arguments:
-method: diffdock # the method for which to score predictions - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `vina`, `ensemble`)
-vina_binding_site_method: p2rank # the method to use for Vina binding site prediction - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `p2rank`)
+method: diffdock # the method for which to score predictions - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `vina`, `ensemble`)
+vina_binding_site_method: p2rank # the method to use for Vina binding site prediction - NOTE: must be one of (`diffdock`, `fabind`, `dynamicbind`, `neuralplexer`, `flowdock`, `rfaa`, `chai-lab`, `boltz`, `alphafold3`, `p2rank`)
 ensemble_ranking_method: consensus # the method to use for ensemble ranking - NOTE: must be one of (`consensus`, `ff`)
 dataset: astex_diverse # the dataset to use - NOTE: must be one of (`posebusters_benchmark`, `astex_diverse`, `dockgen`, `casp15`)
 repeat_index: 1 # the repeat index which was used for inference
 cuda_device_index: 0 # the CUDA device index to use for inference (for all methods except AutoDock-Vina)
 output_script_dir: ${oc.env:PROJECT_ROOT}/scripts/inference # the directory in which to save the output script
-pocket_only_baseline: null # whether to perform a pocket-only baseline for the PoseBusters Benchmark set - NOTE: not applicable only to `tulip`
+pocket_only_baseline: false # whether to perform a pocket-only baseline for the PoseBusters Benchmark set - NOTE: not applicable only to `tulip`
 v1_baseline: false # whether to perform the V1 baseline for DiffDock
-no_ilcl: null # whether to use model weights trained with an inter-ligand clash loss (ILCL) for the CASP15 set - NOTE: only applicable to `neuralplexer`
-relax_protein: null # whether to relax the protein structure before scoring - NOTE: currently in an experimental state
+no_ilcl: false # whether to use model weights trained with an inter-ligand clash loss (ILCL) for the CASP15 set - NOTE: only applicable to `neuralplexer`
+relax_protein: false # whether to relax the protein structure before scoring - NOTE: currently in an experimental state
 export_hpc_headers: true # whether to insert high-performance computing (by default, SLURM) headers into the output script
 verbose: false # whether to print verbose (e.g., invalid configuration) output
 # sweep arguments:
 sweep: false # whether to build all combinations of method-dataset run scripts
 methods_to_sweep: [
   "diffdock",
-  "fabind",
   "dynamicbind",
   "neuralplexer",
-  "flowdock",
   "rfaa",
+  # "chai-lab_ss",
   "chai-lab",
+  # "boltz_ss",
   "boltz",
+  # "alphafold3_ss",
+  "alphafold3",
   "vina",
-  "ensemble",
 ] # the methods to sweep
-vina_binding_site_methods_to_sweep: ["diffdock", "p2rank"] # the Vina binding site prediction methods to sweep
+vina_binding_site_methods_to_sweep: ["p2rank"] # the Vina binding site prediction methods to sweep
 ensemble_ranking_methods_to_sweep: ["consensus"] # the ensemble ranking methods to sweep - NOTE: must be one of (`consensus`, `ff`)
-datasets_to_sweep: [
-  "posebusters_benchmark",
-  "astex_diverse",
-  "dockgen",
-  "casp15",
-] # the datasets to sweep
+datasets_to_sweep: ["posebusters_benchmark", "astex_diverse", "dockgen", "casp15"] # the datasets to sweep
 num_sweep_repeats: 3 # the number of repeats to run for each method-dataset sweep (if the method is a generative method)
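
The config comments above describe `sweep` as building all method-dataset combinations, with `num_sweep_repeats` applied only to generative methods. The sketch below illustrates that enumeration with `itertools.product`. It is not PoseBench's actual implementation: the function `sweep_runs` is hypothetical, treating `vina` as the only non-generative method is an assumption made for illustration, and the Vina binding-site and ensemble-ranking axes are ignored.

```python
from itertools import product

# Values copied from the updated config in the hunk above.
METHODS = ["diffdock", "dynamicbind", "neuralplexer", "rfaa",
           "chai-lab", "boltz", "alphafold3", "vina"]
DATASETS = ["posebusters_benchmark", "astex_diverse", "dockgen", "casp15"]
NUM_SWEEP_REPEATS = 3

# Assumption for illustration only: which methods count as non-generative.
NON_GENERATIVE = {"vina"}


def sweep_runs(methods, datasets, num_repeats):
    """Yield (method, dataset, repeat_index) for every sweep combination."""
    for method, dataset in product(methods, datasets):
        # Non-generative methods are run once; generative ones are repeated.
        repeats = 1 if method in NON_GENERATIVE else num_repeats
        for repeat_index in range(1, repeats + 1):
            yield method, dataset, repeat_index


if __name__ == "__main__":
    runs = list(sweep_runs(METHODS, DATASETS, NUM_SWEEP_REPEATS))
    print(len(runs), "runs; first:", runs[0])
```
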
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# run arguments:
+method: diffdock # the method for which to preprocess interactions as H5 files
+dataset: astex_diverse # the dataset to use - NOTE: must be one of (`astex_diverse`, `casp15`, `dockgen`, `posebusters_benchmark`)
+repeat_index: 1 # the repeat index to preprocess - NOTE: currently only repeat_index=1 is supported
+output_script_dir: ${oc.env:PROJECT_ROOT}/scripts/interactions # the directory in which to save the output script
+# sweep arguments:
+sweep: true # whether to build all combinations of method-dataset preprocessing scripts
+methods_to_sweep: [
+  "vina_p2rank",
+  "diffdock",
+  "dynamicbind",
+  "neuralplexer",
+  "rfaa",
+  "chai-lab_ss",
+  "chai-lab",
+  "boltz_ss",
+  "boltz",
+  "alphafold3_ss",
+  "alphafold3",
+] # the methods to sweep
+datasets_to_sweep: ["astex_diverse", "dockgen", "posebusters_benchmark", "casp15"] # the datasets to sweep

docs/source/_static/PoseBench.png

-1.14 MB

docs/source/acknowledgements.rst

Lines changed: 2 additions & 2 deletions
@@ -2,5 +2,5 @@ Acknowledgements
 ================

 .. mdinclude:: ../../README.md
-   :start-line: 1309
-   :end-line: 1332
+   :start-line: 1311
+   :end-line: 1334

docs/source/available_methods.rst

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@ Available inference methods
 ================

 .. mdinclude:: ../../README.md
-   :start-line: 367
-   :end-line: 412
+   :start-line: 369
+   :end-line: 414

 .. note::
    Have a new method to add? Please let us know by creating a pull request. We would be happy to work with you to integrate new methodology into this benchmark!

docs/source/bonus.rst

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@ Bonus
 ================

 .. mdinclude:: ../../README.md
-   :start-line: 1350
-   :end-line: 1352
+   :start-line: 1352
+   :end-line: 1354

 .. image:: ./_static/WorkBench.jpeg
    :alt: My brain after building PoseBench
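
The three `mdinclude` hunks above each move their `:start-line:` and `:end-line:` values by two. This matches the README change in this commit, which adds two lines near the top of `README.md` (new lines 25-26), above every included range. A minimal sketch of that bookkeeping follows; `shift_range` is a hypothetical helper, not part of the repository.

```python
def shift_range(start: int, end: int, insert_at: int, num_inserted: int) -> tuple[int, int]:
    """Shift a line range to account for lines inserted at `insert_at`.

    Bounds located at or after the insertion point move down by
    `num_inserted`; bounds before it are unchanged.
    """
    def shift(line: int) -> int:
        return line + num_inserted if line >= insert_at else line

    return shift(start), shift(end)


if __name__ == "__main__":
    # Old ranges from the three .rst files in this commit.
    old_ranges = {
        "acknowledgements.rst": (1309, 1332),
        "available_methods.rst": (367, 412),
        "bonus.rst": (1350, 1352),
    }
    for name, (start, end) in old_ranges.items():
        print(name, shift_range(start, end, insert_at=25, num_inserted=2))
```

Hard-coded line offsets like these have to be updated whenever the README gains or loses lines above the included section, which is why a two-line notice touches three documentation files.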
