@christopherpriebe (Contributor) commented Sep 26, 2025

This PR adds additional remarks to help with debugging dot operation variant mapping to MFMA and FMA intrinsics.

Added Remarks

  • Each failed call to chooseMfmaInstruction emits a remark reporting that MFMA intrinsic selection failed, together with the arguments it was called with. There is a distinct remark for the initial selection failure and for the further failure where the k-dimension of the tile is not a multiple of the k-dimension of the intrinsic.
  • Emits a generic remark if the tt::DotOp cannot be mapped to a V_MFMA_*_F8F6F4 intrinsic.
  • Emits a generic remark if the tt::DotOp cannot be mapped to any MFMA intrinsic.
  • Emits a generic remark when a tt::DotScaledOp is decomposed into a tt::DotOp with explicit scaling, which happens when the tt::DotScaledOp cannot be mapped to a V_MFMA_*_F8F6F4 intrinsic.
  • Emits a generic remark when a tt::DotOp is mapped to an FMA intrinsic; this mapping appears to never fail unless the operation is already mapped.
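Under `--verify-diagnostics`, remarks like these can be matched in lit tests with `expected-remark` annotations. A minimal sketch (the tensor shapes and exact message text here are illustrative, taken from the test output later in this thread, not a definitive check):

```mlir
// Sketch of a lit-test check for the new remarks; shapes and messages are illustrative.
// expected-remark @+2 {{Unable to select MFMA intrinsic}}
// expected-remark @+1 {{Attempting to map dot operation to FMA intrinsic.}}
%result = tt.dot %a, %b, %zero_f32
    : tensor<1x64xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>>
    * tensor<64x128xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>>
    -> tensor<1x128xf32, #blocked>
```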

TODOs

  • Need to update the lit tests to expect these new remarks.

Questions

  • The changes I am proposing do not feel cohesive: the rewrite rules already contain many information-carrying match failures. Should I convert some of these remarks into match failures for consistency, or should some of the existing match failures become remarks?

PS

Thank you to all who have provided feedback. I am new to contributing to Triton, and this is my first PR.

@antiagainst antiagainst marked this pull request as ready for review October 4, 2025 02:09
@antiagainst antiagainst requested a review from zhanglx13 as a code owner October 4, 2025 02:09
@christopherpriebe (Contributor, Author) commented:

I have finalized my changes to the pass. However, I have been struggling to debug a lit test failure. With the current version pushed to my fork, running the lit tests gives the following error:

root@298e307a07ca:~/workspace/triton# make test-lit
ninja -C /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12 check-triton-lit-tests
ninja: Entering directory `/root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12'
[0/1] Running the triton regression tests
FAIL: TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir (46 of 202)
******************** TEST 'TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=0" --verify-diagnostics  | FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file '--tritonamdgpu-accelerate-matmul=arch-generation-name=gfx942 matrix-instruction-size=0' --verify-diagnostics
RUN: at line 2: /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=16" --verify-diagnostics | FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA16,CHECK
+ FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA16,CHECK
+ /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file '--tritonamdgpu-accelerate-matmul=arch-generation-name=gfx942 matrix-instruction-size=16' --verify-diagnostics
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:76:8: error: expected remark "Unable to select MFMA intrinsic" was not produced
    // expected-remark @+2 {{Unable to select MFMA intrinsic}}
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:77:8: error: expected remark "Attempting to map dot operation to FMA intrinsic." was not produced
    // expected-remark @+1 {{Attempting to map dot operation to FMA intrinsic.}}
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--

********************
********************
Failed Tests (1):
  TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir


Testing Time: 0.63s

Total Discovered Tests: 202
  Passed: 201 (99.50%)
  Failed:   1 (0.50%)

But if I remove the expected remarks from the failing test, I instead get the following error:

root@298e307a07ca:~/workspace/triton# make test-lit
ninja -C /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12 check-triton-lit-tests
ninja: Entering directory `/root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12'
[0/1] Running the triton regression tests
FAIL: TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir (1 of 202)
******************** TEST 'TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=0" --verify-diagnostics  | FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file '--tritonamdgpu-accelerate-matmul=arch-generation-name=gfx942 matrix-instruction-size=0' --verify-diagnostics
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:76:15: error: unexpected remark: Unable to select MFMA intrinsic for the request: version=3, result-shape=(1x128), selected-tiles=(0x0), inputKSize=64, aElemType='f16', bElemType='f16', withScale=false, allowXF32=false
    %result = tt.dot %a, %b, %zero_f32 : tensor<1x64xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>> * tensor<64x128xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>> -> tensor<1x128xf32, #blocked>
              ^
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:76:15: error: unexpected remark: Attempting to map dot operation to FMA intrinsic.
    %result = tt.dot %a, %b, %zero_f32 : tensor<1x64xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>> * tensor<64x128xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>> -> tensor<1x128xf32, #blocked>
              ^

--

********************
********************
Failed Tests (1):
  TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir


Testing Time: 0.65s

Total Discovered Tests: 202
  Passed: 201 (99.50%)
  Failed:   1 (0.50%)

My initial thought is that the MFMA instruction width parameter (matrix-instruction-size) is what is affecting this, but I cannot figure out how to separate the tests (if my reasoning is correct) so that the remarks are expected only where they are actually emitted.

@yiqian1 (Contributor) commented Oct 7, 2025

The error messages were generated with matrix-instruction-size=0 as expected, but matrix-instruction-size=16 shouldn't emit error messages. You can split the tests into two files.
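A sketch of such a split, with hypothetical file names and lit's `%s` substitution standing in for the full paths; only the `matrix-instruction-size=0` file carries the `expected-remark` lines:

```mlir
// accelerate-amd-matmul-mfma-remarks.mlir (hypothetical name): runs with
// matrix-instruction-size=0 and expects the new remarks via --verify-diagnostics.
// RUN: triton-opt %s -split-input-file \
// RUN:   --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=0" \
// RUN:   --verify-diagnostics | FileCheck %s --check-prefixes MFMA0,CHECK

// accelerate-amd-matmul-mfma.mlir: keeps the matrix-instruction-size=16 run,
// which emits no remarks, so no expected-remark annotations are needed.
// RUN: triton-opt %s -split-input-file \
// RUN:   --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=16" \
// RUN:   --verify-diagnostics | FileCheck %s --check-prefixes MFMA16,CHECK
```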

@christopherpriebe (Contributor, Author) commented:

Lit tests are all passing, but I am now getting random unit test failures.

FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-0-1] - AssertionError: Tensor-likes are not close!
FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-16-1] - AssertionError: Tensor-likes are not close!
FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-32-1] - AssertionError: Tensor-likes are not close!

@christopherpriebe (Contributor, Author) commented:

> Lit tests are all passing, but I am now getting random unit test failures.
>
> FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-0-1] - AssertionError: Tensor-likes are not close!
> FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-16-1] - AssertionError: Tensor-likes are not close!
> FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-32-1] - AssertionError: Tensor-likes are not close!

Never mind; I think this is only happening on my local machine. I am not sure why the CI failed previously. I am waiting for the CI to run again, and if there is a failure, I will try to reproduce it on my machine, but I am a bit confused.

@zhanglx13 (Collaborator) left a comment:

LGTM. Thanks @christopherpriebe for the cleanup.

@antiagainst antiagainst merged commit b611ccd into triton-lang:main Oct 9, 2025
9 checks passed
ita9naiwa pushed a commit to ita9naiwa/triton that referenced this pull request Oct 12, 2025
…n-lang#8301)
