vickiw973 commented Nov 11, 2025

Test results on a B200 GPU with Python 3.13.8 and CuTe DSL 4.3.0.dev0:

(env13_8) nvfp4_gemv$ python eval.py test task.yml
compile: start
compile: pass
test-count: 10
test.0.spec: m: 128; k: 256; l: 1; seed: 1111
test.0.status: pass
test.1.spec: m: 128; k: 1536; l: 1; seed: 1111
test.1.status: pass
test.2.spec: m: 128; k: 3072; l: 1; seed: 1111
test.2.status: pass
test.3.spec: m: 256; k: 7168; l: 1; seed: 1111
test.3.status: pass
test.4.spec: m: 256; k: 7168; l: 1; seed: 1111
test.4.status: pass
test.5.spec: m: 2432; k: 4608; l: 2; seed: 1111
test.5.status: pass
test.6.spec: m: 384; k: 7168; l: 2; seed: 1111
test.6.status: pass
test.7.spec: m: 512; k: 512; l: 2; seed: 1111
test.7.status: pass
test.8.spec: m: 512; k: 4096; l: 2; seed: 1111
test.8.status: pass
test.9.spec: m: 512; k: 1536; l: 2; seed: 1111
test.9.status: pass
check: pass
(env13_8) nvfp4_gemv$ python eval.py benchmark task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 265696.642100811
benchmark.0.std: 30025.3961954022
benchmark.0.err: 2123.116125758166
benchmark.0.best: 257055.99784851074
benchmark.0.worst: 575519.9790000916
benchmark.1.spec: m: 4096; k: 7168; l: 8; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 147859.5208376646
benchmark.1.std: 22609.551535848128
benchmark.1.err: 1598.736721058493
benchmark.1.best: 141343.99592876434
benchmark.1.worst: 442400.0084400177
benchmark.2.spec: m: 7168; k: 2048; l: 4; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 64546.8794554472
benchmark.2.std: 20387.0100277479
benchmark.2.err: 1441.5793038738684
benchmark.2.best: 58400.001376867294
benchmark.2.worst: 345120.01276016235
check: pass
(env13_8) nvfp4_gemv$ python eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 343663.3589863777
benchmark.0.std: 25337.984040270996
benchmark.0.err: 1791.6660336472137
benchmark.0.best: 320544.0044403076
benchmark.0.worst: 654367.983341217
benchmark.1.spec: m: 4096; k: 7168; l: 8; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 241917.76014864445
benchmark.1.std: 9334.827124131922
benchmark.1.err: 660.0719560677799
benchmark.1.best: 217184.00716781616
benchmark.1.worst: 262176.0070323944
benchmark.2.spec: m: 7168; k: 2048; l: 4; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 106229.11866754293
benchmark.2.std: 5574.229429629126
benchmark.2.err: 394.15754295803754
benchmark.2.best: 99327.99637317657
benchmark.2.worst: 138239.9946451187
check: pass
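
For anyone reading the numbers above: the `err` values appear consistent with the standard error of the mean, i.e. `std / sqrt(runs)`. Below is a minimal sketch that checks this on the first benchmark entry. It assumes the `benchmark.<i>.<field>: <value>` line format shown in the output; `parse_log` is a hypothetical helper written for this illustration and is not part of `eval.py`. The output does not state the unit of the timings, so the sketch makes no assumption about it.

```python
import math
import re

# Excerpt copied verbatim from the `benchmark` run above (entry 0 only).
LOG = """\
benchmark.0.spec: m: 7168; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 265696.642100811
benchmark.0.std: 30025.3961954022
benchmark.0.err: 2123.116125758166
"""


def parse_log(text):
    """Hypothetical helper: parse 'benchmark.<i>.<field>: <value>' lines
    into {i: {field: value}}. Numeric fields become floats; 'spec' stays a string."""
    results = {}
    for line in text.splitlines():
        m = re.match(r"benchmark\.(\d+)\.(\w+): (.*)", line)
        if not m:
            continue  # skip lines such as 'compile: pass' or 'check: pass'
        idx, field, value = int(m.group(1)), m.group(2), m.group(3)
        entry = results.setdefault(idx, {})
        entry[field] = value if field == "spec" else float(value)
    return results


stats = parse_log(LOG)[0]

# Standard error of the mean computed from the reported std and run count.
sem = stats["std"] / math.sqrt(stats["runs"])

# Compare against the reported 'err' field.
print(math.isclose(sem, stats["err"], rel_tol=1e-6))
```

The same check can be applied to the other entries by pasting the full output into `LOG` and looping over the keys returned by `parse_log`.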
