Conversation

@Aminsed commented on Sep 27, 2025

Add an NCHW BatchNorm forward pass (two-pass algorithm; fp32 accumulation). API: triton_kernels.batchnorm_forward(...) returns (y, mean, var).

  • Tests: 21 cases pass against PyTorch across 2D/4D inputs and fp32/fp16/bf16 dtypes (tolerances: fp32 1e-5/1e-6; fp16/bf16 3e-2/3e-3).
  • Perf (RTX A6000): e.g., (64, 128, 32, 32) fp16 in training mode runs in 0.212 ms vs 0.290 ms for the PyTorch reference (~1.37× speedup). Benchmark script: python/triton_kernels/bench/bench_batchnorm.py.
  • Limits: NCHW layout only; no running-stat updates; no backward pass or fused variants.

Closes #900.
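
For context, a minimal usage and parity check against PyTorch might look like the sketch below. The exact argument names and defaults of `batchnorm_forward` are assumptions here, not taken from the PR; only the `(y, mean, var)` return signature comes from the description above.

```python
import torch
import triton_kernels

# Hypothetical usage sketch: argument names/order and the eps default are
# assumptions, not the PR's documented signature.
x = torch.randn(64, 128, 32, 32, device="cuda", dtype=torch.float16)
weight = torch.randn(128, device="cuda", dtype=torch.float16)
bias = torch.randn(128, device="cuda", dtype=torch.float16)

y, mean, var = triton_kernels.batchnorm_forward(x, weight, bias, eps=1e-5)

# PyTorch reference in training mode: batch statistics, no running-stat
# updates (running_mean/running_var passed as None).
ref = torch.nn.functional.batch_norm(
    x, None, None, weight=weight, bias=bias, training=True, eps=1e-5
)
torch.testing.assert_close(y, ref, rtol=3e-2, atol=3e-3)
```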

…

- Two-pass kernels: stats (sum/sumsq) + normalize
- Dtypes: fp32/fp16/bf16; training/eval; PyTorch parity
- Tests: 21 cases across shapes/dtypes/eps
- Re-export in triton_kernels.__init__

Closes: triton-lang#900
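
To make the two-pass structure concrete, here is a simplified sketch of that approach: a per-channel sum/sum-of-squares reduction kernel accumulating in fp32, followed by an elementwise normalize kernel. This is not the PR's actual code; kernel and wrapper names are hypothetical, and performance details (tiling, vectorization, autotuning) are omitted.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _bn_stats_kernel(x_ptr, sum_ptr, sumsq_ptr, N, C, HW, BLOCK: tl.constexpr):
    # Pass 1: one program per channel reduces sum and sum-of-squares over
    # all N*HW elements of that channel, accumulating in fp32.
    c = tl.program_id(0)
    acc_sum = tl.zeros((BLOCK,), dtype=tl.float32)
    acc_sq = tl.zeros((BLOCK,), dtype=tl.float32)
    num = N * HW
    for start in range(0, num, BLOCK):
        idx = start + tl.arange(0, BLOCK)
        mask = idx < num
        n = idx // HW
        hw = idx % HW
        offs = (n * C + c) * HW + hw  # contiguous NCHW offset
        x = tl.load(x_ptr + offs, mask=mask, other=0.0).to(tl.float32)
        acc_sum += x
        acc_sq += x * x
    tl.store(sum_ptr + c, tl.sum(acc_sum, axis=0))
    tl.store(sumsq_ptr + c, tl.sum(acc_sq, axis=0))


@triton.jit
def _bn_normalize_kernel(x_ptr, y_ptr, mean_ptr, var_ptr, w_ptr, b_ptr,
                         total, C, HW, eps, BLOCK: tl.constexpr):
    # Pass 2: elementwise normalization; the channel index is recovered
    # from the flat NCHW offset.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < total
    c = (offs // HW) % C
    x = tl.load(x_ptr + offs, mask=mask, other=0.0).to(tl.float32)
    mean = tl.load(mean_ptr + c, mask=mask, other=0.0)
    var = tl.load(var_ptr + c, mask=mask, other=0.0)
    w = tl.load(w_ptr + c, mask=mask, other=1.0).to(tl.float32)
    b = tl.load(b_ptr + c, mask=mask, other=0.0).to(tl.float32)
    y = (x - mean) / tl.sqrt(var + eps) * w + b
    tl.store(y_ptr + offs, y.to(y_ptr.dtype.element_ty), mask=mask)


def batchnorm_forward_sketch(x, weight, bias, eps=1e-5, BLOCK=1024):
    # Hypothetical host wrapper (not the PR's triton_kernels.batchnorm_forward).
    assert x.ndim == 4 and x.is_contiguous(), "expects contiguous NCHW input"
    N, C, H, W = x.shape
    HW = H * W
    sums = torch.empty(C, device=x.device, dtype=torch.float32)
    sumsq = torch.empty(C, device=x.device, dtype=torch.float32)
    _bn_stats_kernel[(C,)](x, sums, sumsq, N, C, HW, BLOCK=BLOCK)
    count = N * HW
    mean = sums / count
    var = sumsq / count - mean * mean  # biased (batch) variance, as in training
    y = torch.empty_like(x)
    total = x.numel()
    grid = (triton.cdiv(total, BLOCK),)
    _bn_normalize_kernel[grid](x, y, mean, var, weight, bias,
                               total, C, HW, eps, BLOCK=BLOCK)
    return y, mean, var
```

A production version would parallelize the per-channel reduction across multiple programs rather than looping in a single program per channel; the serial loop above is kept only for clarity.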
@Aminsed requested a review from ptillet as a code owner on September 27, 2025, 04:34
Successfully merging this pull request may close these issues.

Implement BatchNorm in triton