H200 Cross Entropy failed when online softmax is used #69

@Jokeren

python benchmark_cross_entropy.py --M 32768 --N 4096

  File "/mnt/data/keren/code/quack/quack/cross_entropy.py", line 312, in cross_entropy_fwd_out
    cross_entropy_fwd_out.compile_cache[compile_key] = cute.compile(
                                                       ^^^^^^^^^^^^^
  File "/mnt/data/keren/code/quack/quack/cross_entropy.py", line 92, in __call__
    ).launch(
  ^^^^^^^^^^^
  File "/mnt/data/keren/code/quack/quack/cross_entropy.py", line 171, in kernel
    max_x = row_reduce(
        ^^^^^^^^^^^^^^^
  File "/mnt/data/keren/code/quack/quack/reduce.py", line 120, in row_reduce
    val = block_or_cluster_reduce(
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/keren/code/quack/quack/reduce.py", line 80, in block_or_cluster_reduce
    return block_reduce(val, op, reduction_buffer, init_val=init_val)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/keren/code/quack/quack/reduce.py", line 22, in block_reduce
    if lane_idx == 0:
  File "/mnt/data/keren/code/quack/quack/reduce.py", line 23, in then_block_1
    reduction_buffer[row_idx, col_idx] = val
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/mnt/data/keren/envs/triton/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cute/tensor.py", line 279, in _cvt_to_dest
    raise ValueError(
ValueError: Type mismatch, store Float32 (-> Float32) to Tensor with element type Int64
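
For context on the error message: one reason a reduction buffer could have an Int64 element type is that an online-softmax reduction carries two Float32 values per row (a running max and a running sum), which can be packed into a single 64-bit slot. That is an assumption about the design, not something the traceback confirms. The sketch below uses plain Python and the standard `struct` module rather than the CuTe DSL, and the helper names `pack_f32x2_to_i64` and `unpack_i64_to_f32x2` are hypothetical. It only illustrates why storing a bare Float32 into such a buffer would be rejected as a type mismatch.

```python
import struct


def pack_f32x2_to_i64(a: float, b: float) -> int:
    """Pack two float32 values into one signed 64-bit integer.

    Hypothetical helper for illustration; not taken from quack or CUTLASS.
    """
    # Serialize both floats as little-endian float32 (8 bytes total),
    # then reinterpret those 8 bytes as a single signed int64.
    return struct.unpack("<q", struct.pack("<ff", a, b))[0]


def unpack_i64_to_f32x2(packed: int) -> tuple[float, float]:
    """Inverse of pack_f32x2_to_i64: recover the two float32 values."""
    return struct.unpack("<ff", struct.pack("<q", packed))


# A (running max, running sum) pair as an online softmax would carry it.
running_max, running_sum = float("-inf"), 0.0
slot = pack_f32x2_to_i64(running_max, running_sum)

# The buffer slot holds an integer, so a plain float cannot be stored
# there without packing it first.
print(type(slot).__name__, unpack_i64_to_f32x2(slot))
```

If the buffer is allocated for the packed layout but the code path that runs performs a plain Float32 max reduction (the `max_x = row_reduce(` frame above), a store like `reduction_buffer[row_idx, col_idx] = val` would hit exactly this kind of element-type check.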

cc @tridao @lezcano
