
Conversation

lezcano
Contributor

@lezcano lezcano commented Aug 2, 2025

We do so by modelling M/N as describing elements rather than the hardware 32-bit registers.
This lets us avoid the issue of two elements pointing to the same register when `unpacked=False`.

We also tighten the `MemDescType` verifier and the `TensorMemoryEncodingAttr` verifier to be consistent with the definition we are using. Doing this forces us to update a ton of lit tests that were silently wrong...
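
To make this concrete, here is a rough, hypothetical sketch (plain Python, not the actual Triton code or its `LinearLayout` API): with a 16-bit dtype and `unpacked=False`, each 32-bit tensor-memory register holds two elements, so an N coordinate counted in registers makes neighbouring elements collide, while an N coordinate counted in elements stays injective.

```python
# Hypothetical illustration only: why describing N in elements (this PR)
# avoids two elements aliasing one 32-bit register when unpacked=False.
ELEM_BITS = 16
REG_BITS = 32
ELEMS_PER_REG = REG_BITS // ELEM_BITS  # 2 elements packed per register

def n_coord_in_registers(n_elem: int) -> int:
    # Register-based coordinate: elements 2k and 2k+1 land on the same register k.
    return n_elem // ELEMS_PER_REG

def n_coord_in_elements(n_elem: int) -> int:
    # Element-based coordinate: the map stays one-to-one.
    return n_elem

assert n_coord_in_registers(4) == n_coord_in_registers(5)  # collision
assert n_coord_in_elements(4) != n_coord_in_elements(5)    # no collision
```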

@lezcano lezcano requested a review from ptillet as a code owner August 2, 2025 10:15
@lezcano lezcano requested a review from ThomasRaoux August 2, 2025 10:16
@lezcano lezcano changed the title [LAYOUS] Implement toLinearLayout for TensorMemoryEncodingAttr [LAYOUTS] Implement toLinearLayout for TensorMemoryEncodingAttr Aug 2, 2025
This `toLinearLayout` is a bit different from the way we construct the
layouts for `SharedEncoding`. In particular, here we map the tensor into
the memory, and not the other way around. This lets us model the
`packed=True` version of the layout, where we map two different elements
to the same `M/N` location.

Representing it this way is not an issue in practice, as we always use
these layouts by composing their inverse with a distributed layout, so
we simply have the inverse already at hand.
@lezcano
Contributor Author

lezcano commented Aug 4, 2025

Different approach after discussing it with @ThomasRaoux. Now we implement it as a map from hardware to the tensor, like all the other layouts, which makes everything simpler. Updated the OP to reflect this.
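
For intuition about the "map from hardware to the tensor" direction (an illustrative sketch only; the real implementation is Triton's C++ `LinearLayout`, and all names below are made up), a linear layout can be viewed as an F2-linear map: each bit of a hardware coordinate contributes a basis vector, and the tensor coordinate is the XOR of the bases selected by the set bits.

```python
# Toy sketch of a hardware -> tensor linear layout over GF(2).
from typing import Dict, List, Tuple

Bases = Dict[str, List[Tuple[int, int]]]  # input dim -> one (dim0, dim1) basis per bit

def apply_layout(bases: Bases, coords: Dict[str, int]) -> Tuple[int, int]:
    out0 = out1 = 0
    for dim, value in coords.items():
        for bit, (b0, b1) in enumerate(bases[dim]):
            if (value >> bit) & 1:  # XOR in the basis of every set input bit
                out0 ^= b0
                out1 ^= b1
    return out0, out1

# Identity-style 4x4 example: "row" bits feed dim0, "col" bits feed dim1.
bases = {"row": [(1, 0), (2, 0)], "col": [(0, 1), (0, 2)]}
print(apply_layout(bases, {"row": 3, "col": 2}))  # -> (3, 2)
```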

@lezcano lezcano enabled auto-merge (squash) August 4, 2025 13:56
Collaborator

@ThomasRaoux ThomasRaoux left a comment


LGTM, one comment about the linear representation that is probably going to be important

Comment on lines +1193 to +1195
// We model packed layouts as having the rows/cols dimensions of bitwidth=16
// This means that a layout with unpacked=True is the same as one with
// unpacked=False
Collaborator


I think we will want to track this at byte granularity. For scales we do have 8-bit data in the tensor memory, so I think that will help when handling this.

Contributor Author


Let's revisit this once we do the scales, but it sounds like a reasonable ask.
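
To make the granularity point concrete (purely illustrative numbers, nothing from this PR): with coordinates tracked in 16-bit units, two adjacent 8-bit scale elements collapse onto the same column, whereas byte-granular coordinates keep them distinct.

```python
# Hypothetical illustration of 16-bit vs. byte granularity for 8-bit scales.
def col_in_16bit_units(byte_offset: int) -> int:
    return byte_offset // 2  # pairs of bytes share one 16-bit column

def col_in_bytes(byte_offset: int) -> int:
    return byte_offset       # every byte gets its own column

assert col_in_16bit_units(6) == col_in_16bit_units(7)  # two 8-bit scales collide
assert col_in_bytes(6) != col_in_bytes(7)
```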

@lezcano lezcano merged commit 40335eb into main Aug 4, 2025
25 of 27 checks passed
@lezcano lezcano deleted the tmem branch August 4, 2025 21:34
peterbell10 added a commit that referenced this pull request Aug 5, 2025
…tr (#7748)"

This reverts commit 40335eb.

git-pr-chain: revert_layouts_implement_tolinearlayout__9b92
peterbell10 added a commit that referenced this pull request Aug 5, 2025
ptillet pushed a commit that referenced this pull request Aug 7, 2025
We do so by modelling M/N as describing elements rather than the hardware
32-bit registers.
This lets us avoid the issue of two elements pointing to the
same register when `unpacked=False`.

We also tighten the `MemDescType` verifier and the
`TensorMemoryEncodingAttr` verifier to be consistent with the definition
we are using. Doing this forces us to update a ton of lit tests
that were silently wrong...