ch4/ofi: rma get algorithm using mirror buffers #7695
Draft
+528
−102
Pull Request Description
MPI_Get falls back to the MPIDIG active message layer when either the window buffers are device buffers (and libfabric HMEM is not enabled or supported) or the origin buffer is a device buffer. The active message transport in ofi uses RDMA for data greater than 16KB. With GPU memory for both the origin buffer and the window base buffer, that involves an MR registration per message and an extra allocation of staging buffers on both the origin and target sides. In addition, staging on both sides is currently synchronous. This PR proposes an alternative fallback algorithm -
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.