[BENCHMARK] Reuse CUTLASS's gemm configuration file #4720
Instead of creating our own mapping from problem shape to CUTLASS GEMM configuration, re-use existing information in CUTLASS. This adds a small tool that can parse CUTLASS' benchmark configuration files and generate a C++ header with the problem shape to configuration mapping. The generated header is included in the CUTLASS kernel benchmark to dispatch to the best known configuration for each problem shape. Signed-off-by: Lukas Sommer <[email protected]>
…into sommerlukas/reuse-cutlass-gemm-config Signed-off-by: Jefferson Le Quellec <[email protected]>
Signed-off-by: Jefferson Le Quellec <[email protected]>
Hi @jle-quel. Is there a tracker for this yet, so we can monitor it more easily?
Please double-check that there is no unexpected performance impact on the CUTLASS GEMM before merging.
Started CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16576519983
Potential performance degradation on CUTLASS GEMM: Grafana
As neither the previous configuration nor this one delivers the best known performance for GEMM in CUTLASS, I'd suggest still merging this PR with the basic infrastructure for the config files, and adding the correct configuration in a future PR.
Description
This PR introduces a new mechanism for fetching GEMM configurations. Instead of hardcoding the (shape → config) mapping, the `config-tool.py` script now parses a configuration file and generates the `gemm_config` structure dynamically.
The configuration file consists of a list of GEMM kernel invocations, each with its corresponding `GemmConfig`. These are extracted and used to invoke the kernel with the appropriate configuration.
Note
Currently, the configuration file is located in `benchmarks/cutlass_kernel/gemm`. In the future, this should be updated to fetch the file directly from the CUTLASS repository: https://github.com/intel/cutlass-sycl
This change will be made once the CUTLASS repo includes a unified file containing the optimal configurations for all shapes used in the Triton benchmark.
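For readers who want a feel for the parse-and-generate step described above, here is a minimal Python sketch. It is not the actual `config-tool.py`: the line format (`<ConfigName> --l= --m= --k= --n=`), the config names `ConfigA`/`ConfigB`, the function names, and the `GEMM_CONFIG_ENTRY` macro are all assumptions made for illustration, and the real CUTLASS benchmark file and generated header may look different.

```python
import re
from typing import Dict, Tuple

# Hypothetical input format (an assumption, not the real CUTLASS file layout):
#   <GemmConfigName> --l=<L> --m=<M> --k=<K> --n=<N>
SAMPLE = """\
# comment lines and blank lines are skipped
ConfigA --l=1 --m=1024 --k=1024 --n=1024
ConfigB --l=4 --m=32768 --k=128 --n=4096
"""

Shape = Tuple[int, int, int, int]  # (l, m, n, k)

FLAG = re.compile(r"--(\w+)=(\d+)")


def parse_config(text: str) -> Dict[Shape, str]:
    """Map each problem shape to the config name on its invocation line."""
    mapping: Dict[Shape, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # First token is the config name, the rest are --key=value flags.
        name, _, rest = line.partition(" ")
        flags = {key: int(value) for key, value in FLAG.findall(rest)}
        try:
            shape = (flags["l"], flags["m"], flags["n"], flags["k"])
        except KeyError as missing:
            raise ValueError(f"missing dimension {missing} in line: {line!r}") from None
        mapping[shape] = name
    return mapping


def emit_header(mapping: Dict[Shape, str]) -> str:
    """Render the mapping as C++ text, one hypothetical macro entry per shape."""
    out = ["// Generated file; do not edit by hand.", ""]
    for (l, m, n, k), name in sorted(mapping.items()):
        out.append(f"GEMM_CONFIG_ENTRY({l}, {m}, {n}, {k}, {name})")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(emit_header(parse_config(SAMPLE)))
```

Emitting one macro entry per shape keeps the generated file independent of how the benchmark dispatches: the including C++ code would define `GEMM_CONFIG_ENTRY` however it needs (for example, as a branch on the problem shape). That choice is part of the sketch, not something the PR states.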