[Benchmark] Add all gather matmul benchmark #400
base: joydddd/stack/21
stack-info: PR: #400, branch: joydddd/stack/22
Optimization implemented in Kraken but not supported in Helion: `(a, out) = ag_matmul(a_shared, b)`, where `a = all_gather(a_shared)` and `out = a @ b`. Helion does not support conditionally computing the tile offset and conditionally selecting a different tensor descriptor for `tensor_descriptor.load`.
The same access pattern can be implemented in Helion, but the implementation generates separate `tensor_descriptor` loads in each branch and breaks Triton data prefetching.
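To make the two access patterns concrete, here is a minimal plain-Python model of them. It is not Helion, Triton, or Kraken code: `load_tile` is a hypothetical stand-in for a single tensor-descriptor load, and the buffer layout (a local shard `a_shared` plus a full-size `gathered` buffer holding the remote shards) is an assumption made only for illustration. The first function selects the buffer and offset conditionally and then issues one load; the second issues a separate load inside each branch.

```python
def load_tile(buf, row_offset, tile_m):
    """Hypothetical stand-in for one tensor-descriptor load of tile_m rows."""
    return [row[:] for row in buf[row_offset:row_offset + tile_m]]


def matmul(a, b):
    """Naive dense matmul on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def ag_matmul_select_then_load(a_shared, gathered, rank, b, tile_m):
    """Pick (buffer, offset) conditionally, then issue a single load per tile."""
    shard_rows = len(a_shared)
    assert shard_rows % tile_m == 0, "assumes tiles never straddle two shards"
    a, out = [], []
    for start in range(0, len(gathered), tile_m):
        is_local = start // shard_rows == rank
        # Conditional descriptor and conditional offset, one load site.
        buf = a_shared if is_local else gathered
        offset = start - rank * shard_rows if is_local else start
        tile = load_tile(buf, offset, tile_m)
        a.extend(tile)
        out.extend(matmul(tile, b))
    return a, out


def ag_matmul_load_per_branch(a_shared, gathered, rank, b, tile_m):
    """Same result, but each branch contains its own load."""
    shard_rows = len(a_shared)
    assert shard_rows % tile_m == 0, "assumes tiles never straddle two shards"
    a, out = [], []
    for start in range(0, len(gathered), tile_m):
        if start // shard_rows == rank:
            tile = load_tile(a_shared, start - rank * shard_rows, tile_m)
        else:
            tile = load_tile(gathered, start, tile_m)
        a.extend(tile)
        out.extend(matmul(tile, b))
    return a, out


# Rank 1 of 2: its own rows live in a_shared, rank 0's rows in gathered.
a_shared = [[5, 6], [7, 8]]
gathered = [[1, 2], [3, 4], [0, 0], [0, 0]]
b = [[1, 1], [0, 2]]
a, out = ag_matmul_select_then_load(a_shared, gathered, 1, b, tile_m=2)
```

Both functions return the same `(a, out)` pair; they differ only in where the load is issued, which is the distinction the comment above draws.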
If |
Yep. If |
Stacked PRs:
[Benchmark] Add all gather matmul benchmark