As spotted by @Mel-Chen in this review comment: #149981 (comment)
Consider an EVL tail-folded loop with a VF of 4 and a trip count of 5. With EVL tail folding, it's possible that the loop executes in two iterations, one with EVL=3 and one with EVL=2.
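For context, here is a minimal sketch (plain C++, not LLVM code) of how a target might arrive at an even 3/2 split: RISC-V's `vsetvli`, for example, is permitted to return `ceil(AVL/2)` when the remaining element count is between `VLMAX` and `2*VLMAX`. The `computeEVL` helper below is invented purely for illustration.

```cpp
#include <algorithm>
#include <cstdio>

// Hypothetical model of per-iteration EVL computation: when the remaining
// trip count is between VF and 2*VF, split it evenly (one behaviour RISC-V's
// vsetvli permits); otherwise clamp to the remaining count or VF.
static unsigned computeEVL(unsigned Remaining, unsigned VF) {
  if (Remaining > VF && Remaining < 2 * VF)
    return (Remaining + 1) / 2;
  return std::min(Remaining, VF);
}

int main() {
  unsigned TripCount = 5, VF = 4;
  for (unsigned Remaining = TripCount; Remaining != 0;) {
    unsigned EVL = computeEVL(Remaining, VF);
    std::printf("EVL = %u\n", EVL); // prints 3, then 2
    Remaining -= EVL;
  }
  return 0;
}
```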
A header mask will come in with the form `icmp ule wide-canonical-iv, backedge-tc`.
Most recipes will be converted to a VP intrinsic to use EVL in `optimizeMaskToEVL`. This should really be thought of as an optimisation, but consider a recipe that isn't handled yet or slips through, and so still uses the header mask.
The header mask is generated as `icmp ule wide-canonical-iv, backedge-tc`.
On the first iteration, the mask will look like:

```
[0, 1, 2, 3] <= 4 = [T, T, T, T]
```

However, the recipes which were optimized to VP intrinsics will have an EVL of 3, so effectively a mask of `[T, T, T, F]`.
On the second iteration, the mask will look like:

```
[4, 5, 6, 7] <= 4 = [T, F, F, F]
```

But the VP intrinsics will have an EVL of 2, so a mask of `[T, T, F, F]`.
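To make the divergence concrete, here is a small standalone sketch (plain C++, not LLVM code) that recomputes both masks for the two iterations above, with the wide canonical IV stepping by VF as in the example and the EVLs of 3 and 2 assumed from before:

```cpp
#include <cstdio>

int main() {
  const unsigned VF = 4;
  const unsigned BackedgeTC = 4;  // trip count 5 => backedge-taken count 4
  const unsigned EVLs[] = {3, 2}; // per-iteration EVLs from the example
  unsigned IV = 0;                // first lane of the wide canonical IV
  for (unsigned EVL : EVLs) {
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      // Header mask: icmp ule wide-canonical-iv, backedge-tc
      bool HeaderMask = (IV + Lane) <= BackedgeTC;
      // Mask implied by a recipe converted to a VP intrinsic with this EVL
      bool EVLMask = Lane < EVL;
      std::printf("iv=%u lane=%u header=%c evl=%c\n", IV + Lane, Lane,
                  HeaderMask ? 'T' : 'F', EVLMask ? 'T' : 'F');
    }
    IV += VF; // the wide canonical IV advances by VF each iteration
  }
  return 0;
}
```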
We need to convert the header masks to something of the form `icmp ult step-vector, EVL`, otherwise we end up processing a different number of elements per iteration depending on whether or not a recipe was converted to a VP intrinsic.
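For reference, a mask of that form reproduces exactly what the VP intrinsics do in the example above (again a sketch, with the same assumed EVLs of 3 and 2):

```cpp
#include <cstdio>

int main() {
  const unsigned VF = 4;
  const unsigned EVLs[] = {3, 2}; // per-iteration EVLs from the example
  for (unsigned EVL : EVLs) {
    // Proposed header mask: icmp ult step-vector, EVL
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      std::printf("%c ", Lane < EVL ? 'T' : 'F'); // T T T F, then T T F F
    std::printf("\n");
  }
  return 0;
}
```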