@vickiw973

Test results on a B200 GPU with Python 3.13.8 and CuTe DSL 4.3.0.dev0:

(env13_8) nvfp4_group_gemm$ python3 eval.py test task.yml
main
test-count: 10
test.0.spec: m: [128, 128]; n: [128, 256]; k: [128, 512]; g: 2; seed: 1111
test.0.status: pass
test.1.spec: m: [256, 128]; n: [512, 384]; k: [256, 256]; g: 2; seed: 1111
test.1.status: pass
test.2.spec: m: [128, 128]; n: [128, 256]; k: [128, 512]; g: 2; seed: 1111
test.2.status: pass
test.3.spec: m: [256, 128, 256]; n: [384, 256, 128]; k: [256, 512, 128]; g: 3; seed: 1111
test.3.status: pass
test.4.spec: m: [512, 256, 128]; n: [768, 128, 256]; k: [512, 512, 128]; g: 3; seed: 1111
test.4.status: pass
test.5.spec: m: [128, 768, 512]; n: [128, 384, 512]; k: [384, 512, 128]; g: 3; seed: 1111
test.5.status: pass
test.6.spec: m: [512, 768, 384]; n: [256, 512, 512]; k: [768, 128, 768]; g: 3; seed: 1111
test.6.status: pass
test.7.spec: m: [128, 128, 128, 128]; n: [128, 128, 128, 128]; k: [128, 128, 128, 128]; g: 4; seed: 1111
test.7.status: pass
test.8.spec: m: [256, 128, 384, 512]; n: [512, 384, 256, 128]; k: [256, 256, 256, 256]; g: 4; seed: 1111
test.8.status: pass
test.9.spec: m: [512, 384, 256, 128]; n: [256, 256, 256, 256]; k: [512, 128, 512, 128]; g: 4; seed: 1111
test.9.status: pass
check: pass
main end
(env13_8) nvfp4_group_gemm$ python3 eval.py benchmark task.yml
main
benchmark-count: 4
benchmark.0.spec: m: [128, 128, 128, 128, 128, 128, 128, 128]; n: [2048, 6144, 2048, 5120, 2048, 7168, 3072, 5120]; k: [7168, 7168, 7168, 7168, 7168, 7168, 7168, 7168]; g: 8; seed: 1111
benchmark.0.runs: 100
benchmark.0.mean: 357707.5186371803
benchmark.0.std: 71426.6155640933
benchmark.0.err: 7142.66155640933
benchmark.0.best: 328736.00721359253
benchmark.0.worst: 769056.0221672058
benchmark.1.spec: m: [128, 128, 128, 128, 128, 128, 128, 128]; n: [6144, 8192, 5120, 8192, 7168, 7168, 8192, 7168]; k: [2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048]; g: 8; seed: 1111
benchmark.1.runs: 100
benchmark.1.mean: 321697.92115688324
benchmark.1.std: 68806.66814911975
benchmark.1.err: 6880.666814911975
benchmark.1.best: 295967.9961204529
benchmark.1.worst: 698400.0205993652
benchmark.2.spec: m: [256, 256]; n: [3072, 3072]; k: [4096, 4096]; g: 2; seed: 1111
benchmark.2.runs: 100
benchmark.2.mean: 165156.15984797478
benchmark.2.std: 39791.514431117816
benchmark.2.err: 3979.1514431117816
benchmark.2.best: 151552.0066022873
benchmark.2.worst: 546688.0202293396
benchmark.3.spec: m: [128, 384]; n: [4096, 4096]; k: [1536, 1536]; g: 2; seed: 1111
benchmark.3.runs: 100
benchmark.3.mean: 158621.1197078228
benchmark.3.std: 45832.550645002164
benchmark.3.err: 4583.255064500217
benchmark.3.best: 143071.9941854477
benchmark.3.worst: 489407.9864025116
check: pass
main end
(env13_8) nvfp4_group_gemm$ python3 eval.py leaderboard task.yml
main
benchmark-count: 4
benchmark.0.spec: m: [128, 128, 128, 128, 128, 128, 128, 128]; n: [2048, 6144, 2048, 5120, 2048, 7168, 3072, 5120]; k: [7168, 7168, 7168, 7168, 7168, 7168, 7168, 7168]; g: 8; seed: 1111
benchmark.0.runs: 100
benchmark.0.mean: 557592.6411151886
benchmark.0.std: 306415.5682646463
benchmark.0.err: 30641.556826464628
benchmark.0.best: 482336.01450920105
benchmark.0.worst: 3452960.0143432617
benchmark.1.spec: m: [128, 128, 128, 128, 128, 128, 128, 128]; n: [6144, 8192, 5120, 8192, 7168, 7168, 8192, 7168]; k: [2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048]; g: 8; seed: 1111
benchmark.1.runs: 100
benchmark.1.mean: 458914.5600795746
benchmark.1.std: 105335.3649110644
benchmark.1.err: 10533.53649110644
benchmark.1.best: 416832.00001716614
benchmark.1.worst: 878624.0220069885
benchmark.2.spec: m: [256, 256]; n: [3072, 3072]; k: [4096, 4096]; g: 2; seed: 1111
benchmark.2.runs: 100
benchmark.2.mean: 214728.64016890526
benchmark.2.std: 51155.40134188374
benchmark.2.err: 5115.540134188374
benchmark.2.best: 198592.00716018677
benchmark.2.worst: 717055.9763908386
benchmark.3.spec: m: [128, 384]; n: [4096, 4096]; k: [1536, 1536]; g: 2; seed: 1111
benchmark.3.runs: 100
benchmark.3.mean: 213030.39968013763
benchmark.3.std: 63644.099319036424
benchmark.3.err: 6364.409931903642
benchmark.3.best: 195904.00159358978
benchmark.3.worst: 662688.0168914795
check: pass
main end
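
In every entry above, the reported `err` equals `std / sqrt(runs)`, i.e. the standard error of the mean (with 100 runs, `std / 10`). That relation is inferred from the numbers in the log, not from the `eval.py` source. A minimal sketch for checking it against a pasted log follows; `parse_benchmarks`, `standard_error`, and `SAMPLE` are hypothetical names introduced here for illustration, and the sample holds two entries copied from the benchmark run.

```python
import math
import re

# Matches lines such as "benchmark.0.mean: 357707.5186371803".
# The field list mirrors the keys seen in the log; it is an assumption
# that eval.py emits no other numeric fields.
LINE_RE = re.compile(r"^benchmark\.(\d+)\.(runs|mean|std|err|best|worst):\s*(\S+)$")


def parse_benchmarks(log_text):
    """Collect the numeric fields of each benchmark into {index: {field: value}}."""
    results = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            index, field, value = match.groups()
            results.setdefault(int(index), {})[field] = float(value)
    return results


def standard_error(std, runs):
    """Standard error of the mean: std / sqrt(runs)."""
    return std / math.sqrt(runs)


# Two benchmark-mode entries copied from the log above.
SAMPLE = """\
benchmark.0.runs: 100
benchmark.0.mean: 357707.5186371803
benchmark.0.std: 71426.6155640933
benchmark.0.err: 7142.66155640933
benchmark.2.runs: 100
benchmark.2.mean: 165156.15984797478
benchmark.2.std: 39791.514431117816
benchmark.2.err: 3979.1514431117816
"""

if __name__ == "__main__":
    for index, stats in sorted(parse_benchmarks(SAMPLE).items()):
        recomputed = standard_error(stats["std"], stats["runs"])
        consistent = math.isclose(recomputed, stats["err"], rel_tol=1e-9)
        print(f"benchmark.{index}: err matches std/sqrt(runs): {consistent}")
```

The same parser can be pointed at the full `benchmark` and `leaderboard` outputs to compare means between the two modes, since both use the same `benchmark.N.field: value` line format.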
