Use newer version of mma_atom and copy_atom in CollectiveEpilogue for 00_bmg_gemm test #553
if constexpr (!std::is_void_v<CopyOpG2R>) {
  return make_block_2d_copy_A(CopyOpG2R{}, tiled_mma, mC(_,_,0));
} else {
  return make_block_2d_copy_A(tiled_mma, mC(_,_,0));
This works "by accident" for f16/bf16 A, but let's add a version of make_block_2d_copy_C
that can handle loads. I will send you a patch for that.
for (int epi_v = 0; epi_v < size<0>(trD_compute_frag); ++epi_v) {
trD_compute_frag(epi_v) = cst_callbacks.visit(acc_frag_mn(epi_v), epi_v, epi_m, epi_n);
for (int f = 0; f < FragmentSize; ++f) {
trD_compute_frag[f] = trD_compute(epi_v * FragmentSize + f);
Why move the data here? Can we directly access trD_compute?
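To make the question concrete, here is a minimal sketch in plain C++ (not CuTe) of the two options: copying one fragment's worth of elements into a temporary, as the loop above does, versus reading the same elements in place through a pointer into the larger buffer. The names `copy_fragment` and `view_fragment`, the `NumFragments` constant, and the `float` element type are hypothetical and chosen only for illustration. Whether CuTe offers an equivalent non-copying view for this tensor layout is left open here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sizes, for illustration only.
constexpr std::size_t FragmentSize = 4;
constexpr std::size_t NumFragments = 2;

using Accumulator = std::array<float, FragmentSize * NumFragments>;
using Fragment    = std::array<float, FragmentSize>;

// Option 1: copy the epi_v-th fragment into a temporary,
// mirroring the element-by-element loop under review.
Fragment copy_fragment(const Accumulator& trD_compute, std::size_t epi_v) {
    assert(epi_v < NumFragments);
    Fragment frag{};
    for (std::size_t f = 0; f < FragmentSize; ++f) {
        frag[f] = trD_compute[epi_v * FragmentSize + f];
    }
    return frag;
}

// Option 2: no copy; return a pointer to the first element of the
// epi_v-th fragment so callers read the data in place.
const float* view_fragment(const Accumulator& trD_compute, std::size_t epi_v) {
    assert(epi_v < NumFragments);
    return trD_compute.data() + epi_v * FragmentSize;
}
```

Both functions expose the same `FragmentSize` values for a given `epi_v`. The second avoids the extra stores, but it only works when the fragment's elements are contiguous in the source buffer, which is the assumption the index expression `epi_v * FragmentSize + f` suggests.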
PR #540 modernizes the CollectiveMma module by replacing legacy atoms with their updated counterparts.
This PR applies the same update to the CollectiveEpilogue module. PR #540 must be merged first, because the CollectiveEpilogue changes depend on the atom updates it introduces.