Performance Improvement for hydrogenbond analysis for the `between` keyword #5029

tanishy7777 · 2025-04-16T22:40:14Z

Changes made in this Pull Request:

Moved the capped-distance calculation so that it runs after the filtering by the `between` keyword.
We now loop over every pair of atom groups given by `between`, find the donors and acceptors within the distance cutoff (capped distance) for that pair, and combine the results at the end.
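
For readers following along, here is a minimal sketch of the "filter by group first, then search by distance" idea. It is not the code in this PR: the function names and toy coordinates are hypothetical, groups are plain sets of atom indices, and a brute-force loop stands in for `MDAnalysis.lib.distances.capped_distance` so the example runs with the standard library only.

```python
import math
from itertools import product


def pairs_within_cutoff(donors, acceptors, positions, cutoff):
    """Brute-force stand-in for a capped-distance search.

    Returns the set of (donor, acceptor) index pairs at most `cutoff` apart.
    """
    return {
        (d, a)
        for d, a in product(donors, acceptors)
        if d != a and math.dist(positions[d], positions[a]) <= cutoff
    }


def find_pairs_between(donors, acceptors, positions, between, cutoff):
    """Filter by group first, then run the distance search per group pair."""
    found = set()
    for group_a, group_b in between:
        # Donors in A with acceptors in B, then the reverse direction.
        for don_group, acc_group in ((group_a, group_b), (group_b, group_a)):
            d_sel = [d for d in donors if d in don_group]
            a_sel = [a for a in acceptors if a in acc_group]
            found |= pairs_within_cutoff(d_sel, a_sel, positions, cutoff)
    # Combine the per-pair results into one sorted, duplicate-free list.
    return sorted(found)


# Toy system: atom index -> coordinates (hypothetical values).
positions = {
    0: (0.0, 0.0, 0.0),  # protein donor
    1: (2.5, 0.0, 0.0),  # water acceptor, close to atom 0
    2: (2.0, 1.0, 0.0),  # protein acceptor, close to atom 0
    3: (9.0, 0.0, 0.0),  # water donor, far from everything
}
protein, water = {0, 2}, {1, 3}

result = find_pairs_between(
    donors=[0, 3],
    acceptors=[1, 2],
    positions=positions,
    between=[(protein, water)],
    cutoff=3.0,
)
print(result)  # [(0, 1)]
```

The protein-protein pair (0, 2) is within the cutoff but is never passed to the distance search, which is where the saving is expected to come from when the `between` groups are small compared with the full donor and acceptor sets.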

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?
Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--5029.org.readthedocs.build/en/5029/

tanishy7777 · 2025-04-16T22:43:31Z

Timing Benchmark comparing the original and this implementation

Improved implementation:

Original Implementation:

codecov · 2025-04-16T22:57:15Z

Codecov Report

Attention: Patch coverage is 90.16393% with 6 lines in your changes missing coverage. Please review.

Project coverage is 93.61%. Comparing base (af9848b) to head (8ce4165).
Report is 1 commits behind head on develop.

Files with missing lines	Patch %	Lines
...DAnalysis/analysis/hydrogenbonds/hbond_analysis.py	90.16%	2 Missing and 4 partials ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #5029      +/-   ##
===========================================
- Coverage    93.62%   93.61%   -0.02%     
===========================================
  Files          177      177              
  Lines        21995    22037      +42     
  Branches      3112     3124      +12     
===========================================
+ Hits         20593    20629      +36     
- Misses         947      949       +2     
- Partials       455      459       +4

tylerjereddy · 2025-04-24T22:42:17Z

A more standard approach to benchmarking would be to use asv--I suppose we don't have such benchmarks written for the H-bond analysis yet based on git grep of our benchmarks folder contents, though other examples are there.

tanishy7777 · 2025-04-24T22:52:59Z

> A more standard approach to benchmarking would be to use asv--I suppose we don't have such benchmarks written for the H-bond analysis yet based on git grep of our benchmarks folder contents, though other examples are there.

Yep, there are no ASV benchmarks for hbond analysis yet, so I just used the timeit library.
Should I write a benchmark for the `between` keyword to assess the efficiency before and after the changes?
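
As an illustration of the kind of before/after comparison `timeit` allows, here is a small sketch. It is not the script used for the numbers above: both functions are hypothetical stand-ins operating on plain integers (one filters after the search, one before), and the helper name `best_time` is made up for this example.

```python
import timeit


def filter_after(values, keep, cutoff):
    # Stand-in for "search everything, then filter by group".
    hits = [v for v in values if v <= cutoff]
    return [v for v in hits if v in keep]


def filter_before(values, keep, cutoff):
    # Stand-in for "filter by group, then search the smaller set".
    subset = [v for v in values if v in keep]
    return [v for v in subset if v <= cutoff]


def best_time(func, *args, repeat=5, number=10):
    """Return the best per-call time in seconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


values = list(range(20_000))
keep = set(range(0, 20_000, 50))
cutoff = 15_000

# Both orderings must give the same answer before timing means anything.
assert filter_after(values, keep, cutoff) == filter_before(values, keep, cutoff)

timings = {
    "filter_after": best_time(filter_after, values, keep, cutoff),
    "filter_before": best_time(filter_before, values, keep, cutoff),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes; the absolute numbers depend on the machine, so no expected output is shown.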

tylerjereddy · 2025-04-24T23:16:29Z

Maybe wait for someone else to comment on that. I know that upstream we usually don't block PRs because asv benchmarks don't exist yet. I just tend to find it way easier to see the results in a common approach/format, and if I'm feeling skeptical I can then run them locally and see for myself.

tanishy7777 · 2025-04-25T17:57:40Z

> Maybe wait for someone else to comment on that. I know that upstream we usually don't block PRs because asv benchmarks don't exist yet. I just tend to find it way easier to see the results in a common approach/format, and if I'm feeling skeptical I can then run them locally and see for myself.

Thanks for clarifying! That makes sense: adding ASV benchmarks does make the performance results easier to compare.

Adding benchmarks for the Hbond analysis module is actually something I included in my GSoC proposal (pending results), so while I'm not starting that yet, I would be happy to add them if others think it’s useful for this PR.

orbeckst · 2025-06-26T20:45:23Z

@tanishy7777 are you still interested in continuing the PR?

Did I read your preliminary benchmark correctly in that your changes improve execution time from ~12.5s to ~11s? That's not an enormous improvement, but it's better. As long as the code is still correct and not harder to read/maintain, I'd be supportive of such an improvement.

Having an ASV benchmark would be neat, but I agree with @tylerjereddy that this would not be a blocker.

orbeckst · 2025-06-26T20:45:53Z

@p-j-smith would you be able to look after this PR, if @tanishy7777 were to continue working on it?

orbeckst · 2025-06-26T20:47:46Z

@p-j-smith If you don't have time, please un-assign yourself.

If you take on PR-shepherding, feel free to close the PR once you consider it stale.

p-j-smith

Thanks @tanishy7777 for tackling this!

Adding benchmarks with asv would be nice, but it's not a blocker. It would be good to know how you did the benchmarking, though (size of the system, number of frames, atom selections, arguments passed to HydrogenBondAnalysis, etc.).
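
One way to make those details explicit is to encode them as benchmark parameters. Below is a rough sketch of what an asv-style benchmark class could look like. The class name, the parameter values, and the `count_close_pairs` helper are all hypothetical, and the workload is a pure-Python stand-in rather than a real `HydrogenBondAnalysis` run; only the `params`, `param_names`, `setup` and `time_*` conventions come from asv itself.

```python
import math
import random
from itertools import combinations


def count_close_pairs(frames, cutoff):
    """Stand-in workload: count atom pairs within `cutoff` in every frame."""
    return sum(
        1
        for frame in frames
        for p, q in combinations(frame, 2)
        if math.dist(p, q) <= cutoff
    )


class HbondBetweenBench:
    """asv-style benchmark skeleton (hypothetical name and parameters).

    asv calls setup() before each timed method, once per parameter combination.
    """

    params = ([100, 400], [1, 5])
    param_names = ["n_atoms", "n_frames"]

    def setup(self, n_atoms, n_frames):
        rng = random.Random(0)  # fixed seed so runs are comparable
        self.frames = [
            [
                (rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(0, 20))
                for _ in range(n_atoms)
            ]
            for _ in range(n_frames)
        ]

    def time_pair_search(self, n_atoms, n_frames):
        # A real benchmark would build a Universe in setup() and call
        # HydrogenBondAnalysis(...).run() here instead of this stand-in.
        count_close_pairs(self.frames, cutoff=3.0)


# Manual smoke test outside asv: run one parameter combination by hand.
bench = HbondBetweenBench()
bench.setup(100, 1)
bench.time_pair_search(100, 1)
```

With the system size and frame count exposed as parameters, the reported timings carry the setup information along with them; atom selections and analysis arguments could be added as further parameters in the same way.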

p-j-smith · 2025-07-16T10:04:25Z