Support FP32 -> BF16 conversion in epilogue of GroupedGEMM #505
base: main
Conversation
- name: Upload Sarif Artifact
  uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
  with:
    name: codeql-results-${{ matrix.language }}
    path: ./results/${{ matrix.language }}.sarif
    retention-days: 7
I didn't modify this file. Maybe it was modified by some GitHub Action
No, it wasn't. You might have made a mistake while rebasing your branch locally. Please revert this change.
using EpilogueOp =
    cutlass::epilogue::fusion::LinearCombination<float_t, float_t>;

using CollectiveEpilogue =
    typename cutlass::epilogue::collective::CollectiveBuilder<
        cutlass::arch::IntelXe, cutlass::arch::OpClassTensorOp, TileShape,
        Shape<_1, _1, _1>, cutlass::epilogue::collective::EpilogueTileAuto,
        float, float, float, LayoutC, 1, ElementOutput, LayoutC, 1,
        EpilogueDispatchPolicy, EpilogueOp>::CollectiveOp;
Apart from ElementOutput being bfloat16_t, this is the only difference between the vanilla example and this one.
@rolandschulz, #482 currently doesn't support this case (BF16 output with BF16 inputs & FP32 accumulation). I explained this change here.
Also, please advise whether I should combine the two examples (different output dtype) into one file.
Thanks!
Yes, I think we want to combine them into one example if they are almost identical.
Fixes #500

This new config is now supported: A matrices are BF16, B matrices are BF16, C matrices are FP32, and D matrices are BF16. The conversion of the output from FP32 to BF16 happens in the epilogue.

Just a one-line change in include/cutlass/epilogue/collective/builders/xe_builder.inl enables dtype conversion in the epilogue for GroupedGEMM in the CUTLASS headers, but the GroupedGEMM example from examples/04_bmg_grouped_gemm/04_bmg_grouped_gemm_bf16_output.cpp has been copy-pasted in full (please use a diff tool such as BeyondCompare to see the difference between the two files) to create a new file with a few lines of code changes that I mostly adapted/copy-pasted from https://github.com/intel/cutlass-sycl/blob/e83f147263dd8ca3589b34d76ce6fbec58bac048/test/unit/gemm/device/default_gemm_group_configuration.hpp.

Ideally, we should retain one example and test different output dtypes in it. I'm open to making such a change.

Thanks!