Add support for fp8e4m3fnuz dtype in Triton #8231

arnav-garg1 · 2025-09-19T07:03:36Z

Register fp8e4m3fnuz in dtype with mantissa/exponent details
Update canonicalization + bitwidth mapping in _utils.py
Expose new dtype in triton.language public API
Add unit tests:
- Verify dtype existence and repr
- Round-trip float32 → fp8e4m3fnuz → float32 (skipped if CUDA/torch support is missing)
Removed incorrect placeholder mapping of float8_e4m3fnuz -> fp8e4b8

New contributor declaration

I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these
rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
- I have added tests.
  - /test for lit tests
  - /unittest for C++ tests
  - /python/test for end-to-end tests
- This PR does not need a test because N/A.
Select one of the following.
- I have not added any lit tests.
- The lit tests I have added follow these best practices,
  including the "tests should be minimal" section. (Usually running Python code
  and using the instructions it generates is not minimal.)

- Register in with mantissa/exponent details - Update canonicalization + bitwidth mapping in - Expose new dtype in public API - Add unit tests: * Verify dtype existence and repr * Round-trip float32 → fp8e4m3fnuz → float32 (skipped if CUDA/torch support is missing) - Removed incorrect placeholder mapping of Closes triton-lang#8164

ThomasRaoux · 2025-09-19T16:19:29Z

python/test/unit/language/test_core.py

+@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA is required for float8 tests")
+@pytest.mark.skipif(not hasattr(torch, "float8_e4m3fnuz"), reason="PyTorch build does not expose float8_e4m3fnuz")
+def test_float8e4m3fnuz_roundtrip():
+    # Create random data
+    x = torch.randn(32, device="cuda", dtype=torch.float32)
+    # Cast to fp8e4m3fnuz
+    y = x.to(torch.float8_e4m3fnuz)
+    # Cast back to fp32
+    z = y.to(torch.float32)
+
+    # Shapes must match
+    assert z.shape == x.shape
+    # Result must still be a tensor
+    assert torch.is_tensor(z)
+    # Values should be approximately equal (fp8 is lossy)
+    assert torch.allclose(x, z, atol=1e-1, rtol=1e-1)


that doesn't test triton at all

Thanks for the callout—updated. The tests now JIT and run Triton kernels that cast inside Triton:

y = x.to(tl.float8e4m3fnuz).to(tl.float32)

We then compare against a PyTorch reference (torch.float8_e4m3fnuz).
The suite includes:

a minimal round-trip kernel,

a param sweep over (BLOCK, num_warps),

a cast-only variant (no load/store dtype kwargs, for API compatibility).

Tests are backend-aware as they run where the backing FP8 is supported and skip cleanly otherwise. This ensures we’re actually testing Triton lowering/JIT, not just PyTorch

ThomasRaoux · 2025-09-19T16:23:58Z

python/triton/_utils.py

    "float8_e4m3fn": "fp8e4nv",
    "float8e4b8": "fp8e4b8",
-    "float8_e4m3fnuz": "fp8e4b8",
+    "float8_e4m3fnuz": "fp8e4m3fnuz",


fp8e4b8 maps to e4m3_fnuz already, what does this fix?

triton/python/src/ir.cc

Line 943 in 7d92894

.def("get_fp8e4b8_ty",

Agreed—no new backend dtype is introduced here. This PR only exposes a user-facing alias tl.float8e4m3fnuz and keeps canonicalization:

_utils.py

"float8_e4m3fnuz": "fp8e4b8"

The goal is ergonomic parity with PyTorch’s torch.float8_e4m3fnuz so kernels can write:

x.to(tl.float8e4m3fnuz)

without needing to know the backend internal name. All IR paths would still go through the existing fp8e4b8 plumbing so behavior is unchanged.

…mapping; tidy utils

arnav-garg1

Addressing comments and sharing revision 2. Thanks for comments!

arnav-garg1 · 2025-09-20T01:26:32Z

python/triton/_utils.py

    "float8_e4m3fn": "fp8e4nv",
    "float8e4b8": "fp8e4b8",
-    "float8_e4m3fnuz": "fp8e4b8",
+    "float8_e4m3fnuz": "fp8e4m3fnuz",


Agreed—no new backend dtype is introduced here. This PR only exposes a user-facing alias tl.float8e4m3fnuz and keeps canonicalization:

_utils.py

"float8_e4m3fnuz": "fp8e4b8"

The goal is ergonomic parity with PyTorch’s torch.float8_e4m3fnuz so kernels can write:

x.to(tl.float8e4m3fnuz)

without needing to know the backend internal name. All IR paths would still go through the existing fp8e4b8 plumbing so behavior is unchanged.

arnav-garg1 requested a review from ptillet as a code owner September 19, 2025 07:03

arnav-garg1 mentioned this pull request Sep 19, 2025

Add triton dtype for torch.float8_e4m3fnuz to allow explicit casts #8164

Open

ThomasRaoux requested changes Sep 19, 2025

View reviewed changes

Address review: kernel-based tests for tl.float8e4m3fnuz; keep alias …

1e3a52e

…mapping; tidy utils

arnav-garg1 commented Sep 20, 2025

View reviewed changes

Merge branch 'main' into add-fp8-dtype

3007c4c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add support for fp8e4m3fnuz dtype in Triton #8231

Add support for fp8e4m3fnuz dtype in Triton #8231

arnav-garg1 commented Sep 19, 2025 •

edited

Loading

Uh oh!

ThomasRaoux Sep 19, 2025

Uh oh!

arnav-garg1 Sep 20, 2025

Uh oh!

ThomasRaoux Sep 19, 2025

Uh oh!

arnav-garg1 Sep 20, 2025

Uh oh!

arnav-garg1 left a comment

Uh oh!

arnav-garg1 Sep 20, 2025

Uh oh!

Uh oh!

Add support for fp8e4m3fnuz dtype in Triton #8231

Are you sure you want to change the base?

Add support for fp8e4m3fnuz dtype in Triton #8231

Conversation

arnav-garg1 commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

New contributor declaration

Uh oh!

ThomasRaoux Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

arnav-garg1 Sep 20, 2025

Choose a reason for hiding this comment

Uh oh!

ThomasRaoux Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

arnav-garg1 Sep 20, 2025

Choose a reason for hiding this comment

_utils.py

Uh oh!

arnav-garg1 left a comment

Choose a reason for hiding this comment

Uh oh!

arnav-garg1 Sep 20, 2025

Choose a reason for hiding this comment

_utils.py

Uh oh!

Uh oh!

arnav-garg1 commented Sep 19, 2025 •

edited

Loading