[LAYOUTS] Implement toLinearLayout for TensorMemoryEncodingAttr #7748
This `toLinearLayout` is a bit different from the way we construct the layouts for SharedEncoding. In particular, here we map the tensor into the memory, and not the other way around. This is to be able to model the `packed=True` version of the layout, where we map two different elements to the same `M/N` location. Representing it this way is not an issue in practice, as we always use these layouts by composing their inverse with a distributed layout, so this way we simply have the inverse already at hand.
Different approach after discussing it with @ThomasRaoux: we now implement it as a map from hardware to the tensor, like all the other layouts, which makes everything simpler. Updated the OP to reflect this.
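To make the hardware-to-tensor direction concrete, here is a small Python sketch (Triton's actual `LinearLayout` lives in C++ with a richer API; the bases below are hypothetical). A linear layout is linear over GF(2): the tensor coordinate of a hardware index is the XOR of the basis vectors selected by the set bits of that index.

```python
# Illustrative sketch only, not Triton's real LinearLayout API.
# A layout over GF(2) maps a hardware index to a tensor coordinate by
# XOR-ing together the basis vectors selected by the index's set bits.

def apply_layout(bases, hw_index):
    """Map a hardware index to a flattened tensor coordinate."""
    out = 0
    for bit, basis in enumerate(bases):
        if (hw_index >> bit) & 1:
            out ^= basis
    return out

# Hypothetical bases: the identity map on a 4-element dimension.
identity_bases = [0b01, 0b10]
print([apply_layout(identity_bases, i) for i in range(4)])  # [0, 1, 2, 3]

# Hypothetical "swapped" bases: bit 0 of the hardware index now selects
# the high coordinate bit, permuting the elements.
swapped_bases = [0b10, 0b01]
print([apply_layout(swapped_bases, i) for i in range(4)])  # [0, 2, 1, 3]
```

Because the map goes hardware → tensor, composing it with (the inverse of) a distributed layout follows the same convention as every other layout, which is what makes the unified direction simpler.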
LGTM, one comment about the linear representation that is probably going to be important
// We model packed layouts as having the rows/cols dimensions of bitwidth=16.
// This means that a layout with unpacked=True is the same as one with
// unpacked=False.
I think we will want to track this at byte granularity. For scales we do have 8-bit data in the tensor memory, so I think that would help when handling this.
Let's revisit this once we do the scales, but it sounds like a reasonable ask.
We do so by modelling M/N as describing elements and not the hardware 32-bit registers. This allows us to avoid the issue of having two elements point to the same register when `unpacked=False`. We also tighten the `MemDescType` verifier and the `TensorMemoryEncodingAttr` verifier to be consistent with the definition we are using. Doing this forced us to update a ton of lit tests that were silently wrong...
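A small arithmetic sketch of the element-vs-register distinction (the constants and names here are illustrative assumptions, not taken from the implementation): with 16-bit elements packed into 32-bit tensor-memory registers, indexing by register sends two elements to the same coordinate, while indexing by element keeps the map injective.

```python
# Illustrative only; values and names are assumptions, not Triton's code.
REG_BITS = 32    # width of a tensor-memory register
ELEM_BITS = 16   # element bitwidth in the unpacked=False (packed) case
elems_per_reg = REG_BITS // ELEM_BITS  # two elements share one register

# Register-indexed N coordinate: two elements collapse onto one register,
# so element -> coordinate is no longer injective.
reg_coords = [n // elems_per_reg for n in range(4)]

# Element-indexed N coordinate: every element keeps a distinct location,
# which is the modelling this PR adopts.
elem_coords = [n for n in range(4)]

print(reg_coords)   # [0, 0, 1, 1]
print(elem_coords)  # [0, 1, 2, 3]
```

Since the layouts are consumed by composing with distributed layouts, keeping the map one-to-one at the element level avoids any special-casing for `unpacked=False`.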