

@anamikac-intel anamikac-intel commented Oct 10, 2025

PR #540 modernizes the collectivemma module by replacing legacy atoms with their updated counterparts.
This PR applies the same modernization to the collectiveEpilogue module. PR #540 must be merged first, because the collectiveEpilogue changes depend on the atom updates introduced there.

@anamikac-intel anamikac-intel marked this pull request as ready for review October 13, 2025 07:02
```cpp
if constexpr (!std::is_void_v<CopyOpG2R>) {
  return make_block_2d_copy_A(CopyOpG2R{}, tiled_mma, mC(_,_,0));
} else {
  return make_block_2d_copy_A(tiled_mma, mC(_,_,0));
}
```


This works "by accident" for f16/bf16 A, but let's add a version of make_block_2d_copy_C that can handle loads. I will send you a patch for that.
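The snippet above selects the copy with `if constexpr`: a user-supplied `CopyOpG2R` is forwarded, while `void` (presumably the template default) lets the helper pick the atom itself. A minimal standalone sketch of that compile-time fallback pattern follows; `DefaultBlock2DLoad`, `UserBlock2DLoad`, and `make_copy_name` are hypothetical stand-ins, not real CuTe atoms or the `make_block_2d_copy_*` helpers.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for copy operations; NOT real CuTe/CUTLASS types.
struct DefaultBlock2DLoad { static constexpr const char* name = "default"; };
struct UserBlock2DLoad    { static constexpr const char* name = "user"; };

// Hypothetical factory mirroring the `if constexpr (!std::is_void_v<CopyOpG2R>)`
// branch: a non-void CopyOp is honoured, void falls back to a library default.
// The untaken branch is discarded at compile time, so `CopyOp::name` is never
// instantiated when CopyOp is void.
template <class CopyOp = void>
std::string make_copy_name() {
  if constexpr (!std::is_void_v<CopyOp>) {
    return CopyOp::name;              // caller-specified copy op
  } else {
    return DefaultBlock2DLoad::name;  // library-selected fallback
  }
}

// Usage: each instantiation compiles only one branch.
inline void check_dispatch() {
  assert(make_copy_name<>() == "default");
  assert(make_copy_name<UserBlock2DLoad>() == "user");
}
```

The reviewer's point is about *which* helper the fallback branch calls (an A-operand helper applied to the C tensor), not about the dispatch pattern itself, which stays the same once a C-specific load helper exists.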

@Antonyvance Antonyvance added the release and urgent labels Oct 17, 2025 (urgent: PR requires urgent attention, for release or blocking another PR)
```cpp
for (int epi_v = 0; epi_v < size<0>(trD_compute_frag); ++epi_v) {
  trD_compute_frag(epi_v) = cst_callbacks.visit(acc_frag_mn(epi_v), epi_v, epi_m, epi_n);
  for (int f = 0; f < FragmentSize; ++f) {
    trD_compute_frag[f] = trD_compute(epi_v * FragmentSize + f);
```

Why move data here? Can we directly access trD_compute?
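The question is whether staging each slice of `trD_compute` into `trD_compute_frag` is needed. A standalone sketch, assuming made-up sizes and a hypothetical elementwise `epilogue_op` in place of `cst_callbacks.visit`, shows that for purely elementwise work the staged and direct-index versions compute identical results. The copy only pays off if the consumer requires a separate fragment object; the thread does not say whether the real callbacks do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sizes; the real kernel derives these from CuTe layouts.
constexpr std::size_t FragmentSize = 4;
constexpr std::size_t NumVecs = 3;
using Tile = std::array<float, NumVecs * FragmentSize>;

// Hypothetical stand-in for per-element epilogue work.
inline float epilogue_op(float x) { return 2.0f * x + 1.0f; }

// Variant A: copy each slice into a temporary fragment, then process it.
Tile process_with_staging(const Tile& trD_compute) {
  Tile out{};
  for (std::size_t epi_v = 0; epi_v < NumVecs; ++epi_v) {
    std::array<float, FragmentSize> frag{};
    for (std::size_t f = 0; f < FragmentSize; ++f)
      frag[f] = trD_compute[epi_v * FragmentSize + f];  // extra copy
    for (std::size_t f = 0; f < FragmentSize; ++f)
      out[epi_v * FragmentSize + f] = epilogue_op(frag[f]);
  }
  return out;
}

// Variant B: index trD_compute directly with the same flattened offset.
Tile process_direct(const Tile& trD_compute) {
  Tile out{};
  for (std::size_t epi_v = 0; epi_v < NumVecs; ++epi_v)
    for (std::size_t f = 0; f < FragmentSize; ++f) {
      std::size_t i = epi_v * FragmentSize + f;
      out[i] = epilogue_op(trD_compute[i]);
    }
  return out;
}
```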



5 participants