Conversation

@mattwthompson (Member) commented Oct 21, 2025

$ cp submissions/2024-11-07-Sage-2.1.0/yds.yaml submissions/2025-10-20-Sage-2.1.0/input.yaml

Submission Checklist

  • Created a new directory in the submissions directory containing the YDS input file and optionally a force field .offxml file
  • Triggered the benchmark workflow with a PR comment of the form /run-optimization-benchmarks path/to/submission/input.yaml or /run-torsion-benchmarks path/to/submission/input.yaml
  • Waited for the workflow to finish and a comment with Job status: success to be posted
  • Reviewed the results committed by the workflow
  • Published the corresponding Zenodo entry and retrieved the DOI
  • Added the Zenodo DOI to the table in the main README
  • Ready to merge!

@mattwthompson (Member, Author) commented

/run-optimization-benchmarks submissions/2025-10-20-Sage-2.1.0/input.yaml

@github-actions commented

A workflow has been dispatched to run the benchmarks for this PR.

  • Run ID: 18672934811
  • Triggering actor: github-actions[bot]
  • Target branch: rerun-sage-2.1.0

@github-actions commented

A workflow dispatched to run optimization benchmarks for this PR has just finished.

@mattwthompson (Member, Author) commented

I'm seeing some differences between this run and submissions/2024-11-07-Sage-2.1.0. They barely show up visually, but they do come through in the statistics.

Summary statistics for rmsd differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           64474.000000           64474.000000  64474.000000
mean                0.281187               0.281247     -0.000060
std                 0.274567               0.275336      0.086656
min                 0.000000               0.000000     -2.855092
25%                 0.118493               0.118378     -0.005638
50%                 0.191289               0.191362      0.000000
75%                 0.337260               0.336936      0.005735
max                 3.406551               4.095735      2.903419
Out of 64474 entries:
23740 entries have (absolute) difference greater than 0.01
5438 entries have (absolute) difference greater than 0.05
2331 entries have (absolute) difference greater than 0.1
364 entries have (absolute) difference greater than 0.5
101 entries have (absolute) difference greater than 1.0
Summary statistics for dde differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           54653.000000           54653.000000  5.465300e+04
mean               -0.745372              -0.745930  5.575600e-04
std                 3.399927               3.400385  4.248813e-01
min              -102.205111            -102.190539 -1.893245e+01
25%                -2.011216              -2.007744 -1.273394e-02
50%                -0.378216              -0.376148  3.304079e-12
75%                 0.800524               0.799714  1.300617e-02
max                96.402958              96.378667  1.524444e+01
Out of 64006 entries:
5504 entries have (absolute) difference greater than 0.1
1354 entries have (absolute) difference greater than 0.5
706 entries have (absolute) difference greater than 1.0
82 entries have (absolute) difference greater than 5.0

Both RMSD results on the same plot:
[image: overlaid RMSD distributions for both runs]

Distribution of RMSD differences:
[image: histogram of RMSD differences]

Both DDE results on the same plot:
[image: overlaid DDE distributions for both runs]

Distribution of DDE differences:
[image: histogram of DDE differences]

Here's the code I used to generate these plots and statistics, which I ran from this branch.
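
A minimal sketch of this kind of comparison, for anyone who doesn't want to dig through the branch. It assumes each run's results load as a CSV of per-record rmsd values keyed by a shared id; the paths and column names are placeholders, not the workflow's actual output layout.

# Minimal sketch, NOT the exact script from the branch: assumes each run
# produced a CSV with per-record "rmsd" values indexed by a shared "id".
# Paths and column names are placeholder assumptions.
import pandas

old = pandas.read_csv("submissions/2024-11-07-Sage-2.1.0/output/rmsd.csv", index_col="id")
new = pandas.read_csv("submissions/2025-10-20-Sage-2.1.0/output/rmsd.csv", index_col="id")

# Join on the shared record id so only entries present in both runs compare
merged = old[["rmsd"]].join(new[["rmsd"]], lsuffix="_old", rsuffix="_new", how="inner")
merged["diff"] = merged["rmsd_old"] - merged["rmsd_new"]

print(merged.describe())

# Count entries exceeding each difference threshold
for threshold in (0.01, 0.05, 0.1, 0.5, 1.0):
    count = (merged["diff"].abs() > threshold).sum()
    print(f"{count} entries have (absolute) difference greater than {threshold}")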

mattwthompson marked this pull request as ready for review on October 21, 2025 at 15:12
@lilyminium (Collaborator) commented

From a quick look the differences here probably look fine, and I think (with the caveat in the next sentence) we should merge this so Chapin has a more up-to-date comparison for the protein FF benchmarks. The only note I have is that you'd ideally want to be using the unconstrained version.

I'll leave some of my working below, since this was a quick-and-dirty skim. A more rigorous check would compare the geometries between the two runs directly, rather than, as here, comparing each run's difference from QM.

I visually checked the molecules with the highest RMSD differences, which are long and floppy. While we expect that flexible molecules can sometimes slide into a different minimum given minor differences in optimization steps, affecting the torsions, I'd expect bonds and angles to remain relatively rigid. Bond ICRMSD differences range up to 0.005 at worst; the majority are very low in magnitude, and the outlier points around 0.2 remain outliers.
[image: bond ICRMSD comparison between runs]
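
For reference, a generic sketch of computing a bond-length ICRMSD between two optimized geometries of the same molecule. The SDF file names are placeholders and matching atom order is assumed; this is illustrative, not the benchmarking code's actual implementation.

# Generic bond-length ICRMSD between two geometries of the same molecule.
# File names are placeholders; atom ordering is assumed to match.
import numpy
from rdkit import Chem

mol_a = Chem.MolFromMolFile("run_a.sdf", removeHs=False)
mol_b = Chem.MolFromMolFile("run_b.sdf", removeHs=False)

# Nx3 coordinate arrays in Angstrom
pos_a = mol_a.GetConformer().GetPositions()
pos_b = mol_b.GetConformer().GetPositions()

def bond_lengths(mol, positions):
    # Length of each bond, iterating over the molecule's bond list
    return numpy.array([
        numpy.linalg.norm(
            positions[bond.GetBeginAtomIdx()] - positions[bond.GetEndAtomIdx()]
        )
        for bond in mol.GetBonds()
    ])

# Root-mean-square deviation over all bond lengths
icrmsd = numpy.sqrt(
    numpy.mean((bond_lengths(mol_a, pos_a) - bond_lengths(mol_b, pos_b)) ** 2)
)
print(icrmsd)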

If I had more than 5 minutes I'd be curious which bond(s) are contributing to the molecules with the highest differences in bond ICRMSD between runs (the molecule with the highest bond RMSD is shown below), and I'd look at the geometries.
[image: molecule with the highest bond ICRMSD difference between runs]

The same goes for angles; differences range up to 1 degree.
[image: angle ICRMSD comparison between runs]

Again, if I had more time I'd wonder what's going on with this seemingly uncomplicated molecule.
[image: the seemingly uncomplicated molecule in question]

About 1.2k conformers had the exact same RMSD in both runs, leading me to think they probably minimized to the same structure. Checking the ddEs for these conformers (~750 of which didn't have NaN energies) showed some differences ranging from -0.4 to 0.4 (kcal/mol?). This likely comes from the reference minimum conformer differing in geometry, although that's hard to guarantee without checking. Still, 452 conformers had ddE differences < 1e-6, and 601 had differences < 1e-3, which seems reasonable.

[image: ddE differences for conformers with identical RMSD across runs]
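
A rough sketch of that identical-RMSD/ddE cross-check, under the same placeholder CSV layout as the earlier sketch (the paths and column names are assumptions, not the workflow's actual output):

# Rough sketch of the identical-RMSD / ddE cross-check; CSV paths and
# column names are placeholder assumptions.
import pandas

def load(run: str, metric: str) -> pandas.DataFrame:
    return pandas.read_csv(f"submissions/{run}/output/{metric}.csv", index_col="id")

rmsd = load("2024-11-07-Sage-2.1.0", "rmsd").join(
    load("2025-10-20-Sage-2.1.0", "rmsd"), lsuffix="_old", rsuffix="_new", how="inner"
)
# Conformers whose final RMSD to QM is identical across runs, suggesting
# they minimized to the same structure
same = rmsd.index[rmsd["rmsd_old"] == rmsd["rmsd_new"]]

dde = load("2024-11-07-Sage-2.1.0", "dde").join(
    load("2025-10-20-Sage-2.1.0", "dde"), lsuffix="_old", rsuffix="_new", how="inner"
)
# Restrict to those conformers and drop entries with NaN energies
dde = dde.loc[dde.index.intersection(same)].dropna()

diff = (dde["dde_old"] - dde["dde_new"]).abs()
for cutoff in (1e-6, 1e-3):
    print(f"{(diff < cutoff).sum()} conformers with |ddE difference| < {cutoff}")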

@mattwthompson (Member, Author) commented

That's a lot for 5 minutes! I've done much less in much more time this morning.

Here's a subset of environment differences, none of which should be a smoking gun:

{'openeye-toolkits': ('2024.1.3', '2025.1.1'),
 'openff-amber-ff-ports': ('0.0.4', '2025.09.0'),
 'openff-forcefields': ('2024.09.0', '2025.10.0'),
 'openff-interchange': ('0.4.0', '0.4.8'),
 'openff-interchange-base': ('0.4.0', '0.4.8'),
 'openff-qcsubmit': ('0.53.0', '0.57.0'),
 'openff-toolkit': ('0.16.5', '0.17.1'),
 'openff-toolkit-base': ('0.16.5', '0.17.1'),
 'openff-units': ('0.2.2', '0.3.1'),
 'openff-utilities': ('0.1.12', '0.1.16'),
 'openmm': ('8.1.2', '8.3.1'),
 'rdkit': ('2024.03.5', '2025.03.6')}
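
A minimal sketch of how a version diff like this can be produced from two conda env export files; the file paths here are placeholders.

# Minimal sketch: diff package versions between two `conda env export`
# files. Paths are placeholder assumptions.
import yaml

def versions(path: str) -> dict[str, str]:
    # conda entries in the exported YAML look like "name=version=build"
    with open(path) as handle:
        env = yaml.safe_load(handle)
    out = {}
    for dep in env["dependencies"]:
        if isinstance(dep, str):  # skip the nested pip section, if any
            name, version = dep.split("=")[:2]
            out[name] = version
    return out

old = versions("submissions/2024-11-07-Sage-2.1.0/environment.yaml")
new = versions("submissions/2025-10-20-Sage-2.1.0/environment.yaml")

# Packages present in both environments whose versions changed
diff = {
    name: (old[name], new[name])
    for name in sorted(old.keys() & new.keys())
    if old[name] != new[name]
}
print(diff)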

The automation does a conda env export, which is good for tracking what was run but doesn't make it easy to re-create that environment, especially on a different platform. It might be easier to use a tool like conda-lock instead.

I've pulled out a few molecules that show high ICRMSD differences, but it's hard to draw conclusions quickly. I will pick this up later.

Otherwise I've

@mattwthompson (Member, Author) commented

For now I will follow your recommendation to keep this moving along. There's lots we can do to make these analyses easier; the barrier here was higher than I'd hoped.

mattwthompson merged commit 33ffe7c into main on Oct 22, 2025