Conversation

@mattwthompson (Member) commented Oct 21, 2025

$ cp submissions/2024-11-07-Sage-2.1.0/yds.yaml submissions/2025-10-20-Sage-2.1.0/input.yaml

Submission Checklist

  • Created a new directory in the submissions directory containing the YDS input file and optionally a force field .offxml file
  • Triggered the benchmark workflow with a PR comment of the form /run-optimization-benchmarks path/to/submission/input.yaml or /run-torsion-benchmarks path/to/submission/input.yaml
  • Waited for the workflow to finish and a comment with Job status: success to be posted
  • Reviewed the results committed by the workflow
  • Published the corresponding Zenodo entry and retrieved the DOI
  • Added the Zenodo DOI to the table in the main README
  • Ready to merge!

@mattwthompson (Member, Author) commented

/run-optimization-benchmarks submissions/2025-10-20-Sage-2.1.0/input.yaml

@github-actions commented

A workflow has been dispatched to run the benchmarks for this PR.

  • Run ID: 18672934811
  • Triggering actor: github-actions[bot]
  • Target branch: rerun-sage-2.1.0

@github-actions commented

A workflow dispatched to run optimization benchmarks for this PR has just finished.

@mattwthompson (Member, Author) commented

I'm seeing some differences between this run and submissions/2024-11-07-Sage-2.1.0. They barely show up visually, but they do come through in the statistics.

Summary statistics for rmsd differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           64474.000000           64474.000000  64474.000000
mean                0.281187               0.281247     -0.000060
std                 0.274567               0.275336      0.086656
min                 0.000000               0.000000     -2.855092
25%                 0.118493               0.118378     -0.005638
50%                 0.191289               0.191362      0.000000
75%                 0.337260               0.336936      0.005735
max                 3.406551               4.095735      2.903419
Out of 64474 entries:
23740 entries have (absolute) difference greater than 0.01
5438 entries have (absolute) difference greater than 0.05
2331 entries have (absolute) difference greater than 0.1
364 entries have (absolute) difference greater than 0.5
101 entries have (absolute) difference greater than 1.0
Summary statistics for dde differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           54653.000000           54653.000000  5.465300e+04
mean               -0.745372              -0.745930  5.575600e-04
std                 3.399927               3.400385  4.248813e-01
min              -102.205111            -102.190539 -1.893245e+01
25%                -2.011216              -2.007744 -1.273394e-02
50%                -0.378216              -0.376148  3.304079e-12
75%                 0.800524               0.799714  1.300617e-02
max                96.402958              96.378667  1.524444e+01
Out of 64006 entries:
5504 entries have (absolute) difference greater than 0.1
1354 entries have (absolute) difference greater than 0.5
706 entries have (absolute) difference greater than 1.0
82 entries have (absolute) difference greater than 5.0

Both RMSD results on the same plot:
[image: overlaid RMSD distributions for both runs]

Distribution of RMSD differences:
[image: histogram of RMSD differences]

Both DDE results on the same plot:
[image: overlaid DDE distributions for both runs]

Distribution of DDE differences:
[image: histogram of DDE differences]

Here's the code I used to generate these plots and statistics, which I ran from this branch.
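
A minimal sketch of this kind of comparison, for anyone who doesn't want to dig through the branch. It assumes each run's results load as a CSV of per-record rmsd values keyed by a shared id; the paths and column names are placeholders, not the workflow's actual output layout.

# Minimal sketch, NOT the exact script from the branch: assumes each run
# produced a CSV with per-record "rmsd" values indexed by a shared "id".
# Paths and column names are placeholder assumptions.
import pandas

old = pandas.read_csv("submissions/2024-11-07-Sage-2.1.0/output/rmsd.csv", index_col="id")
new = pandas.read_csv("submissions/2025-10-20-Sage-2.1.0/output/rmsd.csv", index_col="id")

# Join on the shared record id so only entries present in both runs compare
merged = old[["rmsd"]].join(new[["rmsd"]], lsuffix="_old", rsuffix="_new", how="inner")
merged["diff"] = merged["rmsd_old"] - merged["rmsd_new"]

print(merged.describe())

# Count entries exceeding each difference threshold
for threshold in (0.01, 0.05, 0.1, 0.5, 1.0):
    count = (merged["diff"].abs() > threshold).sum()
    print(f"{count} entries have (absolute) difference greater than {threshold}")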

mattwthompson marked this pull request as ready for review on October 21, 2025 at 15:12
@lilyminium (Collaborator) commented

From a quick look the differences here probably look fine, and I think (with the caveat in the next sentence) we should merge this so Chapin has a more up-to-date comparison for the protein FF benchmarks. The only note I have is that you'd ideally want to be using the unconstrained version.

I'll leave some of my working below, since this was a quick-and-dirty skim. A more rigorous check would compare the geometries between the two runs directly, rather than, as here, comparing each run's difference from QM.

I visually checked the molecules with the highest RMSD differences, which are long and floppy. While we expect that flexible molecules can sometimes slide into a different minimum given minor differences in optimization steps, affecting the torsions, I'd expect bonds and angles to remain relatively rigid. Bond ICRMSD differences range up to 0.005 at worst; the majority are very low in magnitude, and the outlier points around 0.2 remain outliers.
[image: bond ICRMSD comparison between runs]
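
For reference, a generic sketch of computing a bond-length ICRMSD between two optimized geometries of the same molecule. The SDF file names are placeholders and matching atom order is assumed; this is illustrative, not the benchmarking code's actual implementation.

# Generic bond-length ICRMSD between two geometries of the same molecule.
# File names are placeholders; atom ordering is assumed to match.
import numpy
from rdkit import Chem

mol_a = Chem.MolFromMolFile("run_a.sdf", removeHs=False)
mol_b = Chem.MolFromMolFile("run_b.sdf", removeHs=False)

# Nx3 coordinate arrays in Angstrom
pos_a = mol_a.GetConformer().GetPositions()
pos_b = mol_b.GetConformer().GetPositions()

def bond_lengths(mol, positions):
    # Length of each bond, iterating over the molecule's bond list
    return numpy.array([
        numpy.linalg.norm(
            positions[bond.GetBeginAtomIdx()] - positions[bond.GetEndAtomIdx()]
        )
        for bond in mol.GetBonds()
    ])

# Root-mean-square deviation over all bond lengths
icrmsd = numpy.sqrt(
    numpy.mean((bond_lengths(mol_a, pos_a) - bond_lengths(mol_b, pos_b)) ** 2)
)
print(icrmsd)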

If I had more than 5 minutes I'd be curious which bond(s) are contributing to the molecules with the highest differences in bond ICRMSD between runs (the molecule with the highest bond RMSD is shown below), and I'd look at the geometries.
[image: molecule with the highest bond ICRMSD difference between runs]

The same goes for angles; differences range up to 1 degree.
[image: angle ICRMSD comparison between runs]

Again, if I had more time I'd wonder what's going on with this seemingly uncomplicated molecule.
[image: the seemingly uncomplicated molecule in question]

About 1.2k conformers had the exact same RMSD in both runs, leading me to think they probably minimized to the same structure. Checking the ddEs for these conformers (~750 of which didn't have NaN energies) showed some differences ranging from -0.4 to 0.4 (kcal/mol?). This likely comes from the reference minimum conformer differing in geometry, although that's hard to guarantee without checking. Still, 452 conformers had ddE differences < 1e-6, and 601 had differences < 1e-3, which seems reasonable.

[image: ddE differences for conformers with identical RMSD across runs]
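
A rough sketch of that identical-RMSD/ddE cross-check, under the same placeholder CSV layout as the earlier sketch (the paths and column names are assumptions, not the workflow's actual output):

# Rough sketch of the identical-RMSD / ddE cross-check; CSV paths and
# column names are placeholder assumptions.
import pandas

def load(run: str, metric: str) -> pandas.DataFrame:
    return pandas.read_csv(f"submissions/{run}/output/{metric}.csv", index_col="id")

rmsd = load("2024-11-07-Sage-2.1.0", "rmsd").join(
    load("2025-10-20-Sage-2.1.0", "rmsd"), lsuffix="_old", rsuffix="_new", how="inner"
)
# Conformers whose final RMSD to QM is identical across runs, suggesting
# they minimized to the same structure
same = rmsd.index[rmsd["rmsd_old"] == rmsd["rmsd_new"]]

dde = load("2024-11-07-Sage-2.1.0", "dde").join(
    load("2025-10-20-Sage-2.1.0", "dde"), lsuffix="_old", rsuffix="_new", how="inner"
)
# Restrict to those conformers and drop entries with NaN energies
dde = dde.loc[dde.index.intersection(same)].dropna()

diff = (dde["dde_old"] - dde["dde_new"]).abs()
for cutoff in (1e-6, 1e-3):
    print(f"{(diff < cutoff).sum()} conformers with |ddE difference| < {cutoff}")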

@mattwthompson (Member, Author) commented

That's a lot for 5 minutes! I've done much less in much more time this morning.

Here's a subset of environment differences, none of which should be a smoking gun:

{'openeye-toolkits': ('2024.1.3', '2025.1.1'),
 'openff-amber-ff-ports': ('0.0.4', '2025.09.0'),
 'openff-forcefields': ('2024.09.0', '2025.10.0'),
 'openff-interchange': ('0.4.0', '0.4.8'),
 'openff-interchange-base': ('0.4.0', '0.4.8'),
 'openff-qcsubmit': ('0.53.0', '0.57.0'),
 'openff-toolkit': ('0.16.5', '0.17.1'),
 'openff-toolkit-base': ('0.16.5', '0.17.1'),
 'openff-units': ('0.2.2', '0.3.1'),
 'openff-utilities': ('0.1.12', '0.1.16'),
 'openmm': ('8.1.2', '8.3.1'),
 'rdkit': ('2024.03.5', '2025.03.6')}
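
A minimal sketch of how a version diff like this can be produced from two conda env export files; the file paths here are placeholders.

# Minimal sketch: diff package versions between two `conda env export`
# files. Paths are placeholder assumptions.
import yaml

def versions(path: str) -> dict[str, str]:
    # conda entries in the exported YAML look like "name=version=build"
    with open(path) as handle:
        env = yaml.safe_load(handle)
    out = {}
    for dep in env["dependencies"]:
        if isinstance(dep, str):  # skip the nested pip section, if any
            name, version = dep.split("=")[:2]
            out[name] = version
    return out

old = versions("submissions/2024-11-07-Sage-2.1.0/environment.yaml")
new = versions("submissions/2025-10-20-Sage-2.1.0/environment.yaml")

# Packages present in both environments whose versions changed
diff = {
    name: (old[name], new[name])
    for name in sorted(old.keys() & new.keys())
    if old[name] != new[name]
}
print(diff)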

The automation does a conda env export, which is good for tracking what was run but doesn't make it easy to re-create that environment, especially on a different platform. It might be easier to use a tool like conda-lock instead.

I've pulled out a few molecules that show high ICRMSD differences, but it's hard to draw conclusions quickly. I will pick this up later.

Otherwise I've

@mattwthompson (Member, Author) commented

For now I will follow your recommendation to keep this moving along. There's lots we can do to make these analyses easier; the barrier here was higher than I'd hoped.

mattwthompson merged commit 33ffe7c into main on Oct 22, 2025