@christopherpriebe (Contributor) commented Sep 26, 2025

This PR adds additional remarks to help with debugging dot operation variant mapping to MFMA and FMA intrinsics.

Added Remarks

  • Each failed call to chooseMfmaInstruction emits a remark reporting that MFMA intrinsic selection failed, together with the arguments it was called with. There is a distinct remark for the initial selection failure and for the further failure where the k-dimension of the tile is not a multiple of the k-dimension of the intrinsic.
  • Emits a generic remark if the tt::DotOp cannot be mapped to a V_MFMA_*_F8F6F4 intrinsic.
  • Emits a generic remark if the tt::DotOp cannot be mapped to any MFMA intrinsic.
  • Emits a generic remark when a tt::DotScaledOp is decomposed into a tt::DotOp with explicit scaling, which happens when the tt::DotScaledOp cannot be mapped to a V_MFMA_*_F8F6F4 intrinsic.
  • Emits a generic remark when a tt::DotOp is mapped to an FMA intrinsic; this mapping appears to never fail unless the operation is already mapped.
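Under `--verify-diagnostics`, remarks like these can be matched in lit tests with `expected-remark` annotations. A minimal sketch (the tensor shapes and exact message text here are illustrative, taken from the test output later in this thread, not a definitive check):

```mlir
// Sketch of a lit-test check for the new remarks; shapes and messages are illustrative.
// expected-remark @+2 {{Unable to select MFMA intrinsic}}
// expected-remark @+1 {{Attempting to map dot operation to FMA intrinsic.}}
%result = tt.dot %a, %b, %zero_f32
    : tensor<1x64xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>>
    * tensor<64x128xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>>
    -> tensor<1x128xf32, #blocked>
```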

TODOs

  • Need to update the lit tests to expect these new remarks.

Questions

  • The changes I am proposing do not feel cohesive: the rewrite rules already contain many information-carrying match failures. Should I convert some of these remarks into match failures for consistency, or should some of the existing match failures become remarks?

PS

Thank you to all who have provided feedback. I am new to contributing to Triton, and this is my first PR.

@antiagainst antiagainst marked this pull request as ready for review October 4, 2025 02:09
@antiagainst antiagainst requested a review from zhanglx13 as a code owner October 4, 2025 02:09
@christopherpriebe (Contributor, Author) commented:

I have finalized my changes to the pass. However, I have been struggling to debug a lit test failure. With the current version pushed to my fork, running the lit tests gives the following error:

root@298e307a07ca:~/workspace/triton# make test-lit
ninja -C /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12 check-triton-lit-tests
ninja: Entering directory `/root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12'
[0/1] Running the triton regression tests
FAIL: TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir (46 of 202)
******************** TEST 'TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=0" --verify-diagnostics  | FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file '--tritonamdgpu-accelerate-matmul=arch-generation-name=gfx942 matrix-instruction-size=0' --verify-diagnostics
RUN: at line 2: /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=16" --verify-diagnostics | FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA16,CHECK
+ FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA16,CHECK
+ /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file '--tritonamdgpu-accelerate-matmul=arch-generation-name=gfx942 matrix-instruction-size=16' --verify-diagnostics
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:76:8: error: expected remark "Unable to select MFMA intrinsic" was not produced
    // expected-remark @+2 {{Unable to select MFMA intrinsic}}
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:77:8: error: expected remark "Attempting to map dot operation to FMA intrinsic." was not produced
    // expected-remark @+1 {{Attempting to map dot operation to FMA intrinsic.}}
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--

********************
********************
Failed Tests (1):
  TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir


Testing Time: 0.63s

Total Discovered Tests: 202
  Passed: 201 (99.50%)
  Failed:   1 (0.50%)

But if I remove the expected remarks from the failing test, I instead get the following error:

root@298e307a07ca:~/workspace/triton# make test-lit
ninja -C /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12 check-triton-lit-tests
ninja: Entering directory `/root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12'
[0/1] Running the triton regression tests
FAIL: TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir (1 of 202)
******************** TEST 'TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=0" --verify-diagnostics  | FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ FileCheck /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir --check-prefixes MFMA0,CHECK
+ /root/workspace/triton/build/cmake.linux-x86_64-cpython-3.12/bin/triton-opt /root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir -split-input-file '--tritonamdgpu-accelerate-matmul=arch-generation-name=gfx942 matrix-instruction-size=0' --verify-diagnostics
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:76:15: error: unexpected remark: Unable to select MFMA intrinsic for the request: version=3, result-shape=(1x128), selected-tiles=(0x0), inputKSize=64, aElemType='f16', bElemType='f16', withScale=false, allowXF32=false
    %result = tt.dot %a, %b, %zero_f32 : tensor<1x64xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>> * tensor<64x128xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>> -> tensor<1x128xf32, #blocked>
              ^
/root/workspace/triton/test/TritonGPU/amd/accelerate-amd-matmul-mfma.mlir:76:15: error: unexpected remark: Attempting to map dot operation to FMA intrinsic.
    %result = tt.dot %a, %b, %zero_f32 : tensor<1x64xf16, #ttg.dot_op<{opIdx = 0, parent = #blocked}>> * tensor<64x128xf16, #ttg.dot_op<{opIdx = 1, parent = #blocked}>> -> tensor<1x128xf32, #blocked>
              ^

--

********************
********************
Failed Tests (1):
  TRITON :: TritonGPU/amd/accelerate-amd-matmul-mfma.mlir


Testing Time: 0.65s

Total Discovered Tests: 202
  Passed: 201 (99.50%)
  Failed:   1 (0.50%)

My initial thought is that the MFMA instruction width parameter (matrix-instruction-size) is what is affecting this, but I cannot figure out how to separate the tests (if my reasoning is correct) so that the remarks are expected only where they are actually emitted.

@yiqian1 (Contributor) commented Oct 7, 2025

The error messages were generated with matrix-instruction-size=0 as expected, but matrix-instruction-size=16 shouldn't emit error messages. You can split the tests into two files.
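A sketch of such a split, with hypothetical file names and lit's `%s` substitution standing in for the full paths; only the `matrix-instruction-size=0` file carries the `expected-remark` lines:

```mlir
// accelerate-amd-matmul-mfma-remarks.mlir (hypothetical name): runs with
// matrix-instruction-size=0 and expects the new remarks via --verify-diagnostics.
// RUN: triton-opt %s -split-input-file \
// RUN:   --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=0" \
// RUN:   --verify-diagnostics | FileCheck %s --check-prefixes MFMA0,CHECK

// accelerate-amd-matmul-mfma.mlir: keeps the matrix-instruction-size=16 run,
// which emits no remarks, so no expected-remark annotations are needed.
// RUN: triton-opt %s -split-input-file \
// RUN:   --tritonamdgpu-accelerate-matmul="arch-generation-name=gfx942 matrix-instruction-size=16" \
// RUN:   --verify-diagnostics | FileCheck %s --check-prefixes MFMA16,CHECK
```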

@christopherpriebe (Contributor, Author) commented:

Lit tests are all passing, but I am now getting random unit test failures.

FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-0-1] - AssertionError: Tensor-likes are not close!
FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-16-1] - AssertionError: Tensor-likes are not close!
FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-32-1] - AssertionError: Tensor-likes are not close!

@christopherpriebe (Contributor, Author) commented:

> Lit tests are all passing, but I am now getting random unit test failures.
>
> FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-0-1] - AssertionError: Tensor-likes are not close!
> FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-16-1] - AssertionError: Tensor-likes are not close!
> FAILED language/test_core.py::test_scaled_dot[128-64-128-True-False-False-e2m1-e4m3-4-32-1] - AssertionError: Tensor-likes are not close!

Never mind; I think this is only happening on my local machine. I am not sure why the CI failed previously. I am waiting for the CI to run again, and if there is a failure, I will try to reproduce it on my machine, but I am a bit confused.

@zhanglx13 (Collaborator) left a comment:

LGTM. Thanks @christopherpriebe for the cleanup.

@antiagainst antiagainst merged commit b611ccd into triton-lang:main Oct 9, 2025
9 checks passed
ita9naiwa pushed a commit to ita9naiwa/triton that referenced this pull request Oct 12, 2025
…n-lang#8301)
