Enlarge init range. #74
Open
Test results on a B200 GPU with Python 3.13.8 and CuTe DSL 4.3.0.dev0:
(env13_8) nvfp4_gemv$ python eval.py test task.yml
compile: start
compile: pass
test-count: 10
test.0.spec: m: 128; k: 256; l: 1; seed: 1111
test.0.status: pass
test.1.spec: m: 128; k: 1536; l: 1; seed: 1111
test.1.status: pass
test.2.spec: m: 128; k: 3072; l: 1; seed: 1111
test.2.status: pass
test.3.spec: m: 256; k: 7168; l: 1; seed: 1111
test.3.status: pass
test.4.spec: m: 256; k: 7168; l: 1; seed: 1111
test.4.status: pass
test.5.spec: m: 2432; k: 4608; l: 2; seed: 1111
test.5.status: pass
test.6.spec: m: 384; k: 7168; l: 2; seed: 1111
test.6.status: pass
test.7.spec: m: 512; k: 512; l: 2; seed: 1111
test.7.status: pass
test.8.spec: m: 512; k: 4096; l: 2; seed: 1111
test.8.status: pass
test.9.spec: m: 512; k: 1536; l: 2; seed: 1111
test.9.status: pass
check: pass
(env13_8) nvfp4_gemv$ python eval.py benchmark task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 265696.642100811
benchmark.0.std: 30025.3961954022
benchmark.0.err: 2123.116125758166
benchmark.0.best: 257055.99784851074
benchmark.0.worst: 575519.9790000916
benchmark.1.spec: m: 4096; k: 7168; l: 8; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 147859.5208376646
benchmark.1.std: 22609.551535848128
benchmark.1.err: 1598.736721058493
benchmark.1.best: 141343.99592876434
benchmark.1.worst: 442400.0084400177
benchmark.2.spec: m: 7168; k: 2048; l: 4; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 64546.8794554472
benchmark.2.std: 20387.0100277479
benchmark.2.err: 1441.5793038738684
benchmark.2.best: 58400.001376867294
benchmark.2.worst: 345120.01276016235
check: pass
(env13_8) nvfp4_gemv$ python eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 343663.3589863777
benchmark.0.std: 25337.984040270996
benchmark.0.err: 1791.6660336472137
benchmark.0.best: 320544.0044403076
benchmark.0.worst: 654367.983341217
benchmark.1.spec: m: 4096; k: 7168; l: 8; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 241917.76014864445
benchmark.1.std: 9334.827124131922
benchmark.1.err: 660.0719560677799
benchmark.1.best: 217184.00716781616
benchmark.1.worst: 262176.0070323944
benchmark.2.spec: m: 7168; k: 2048; l: 4; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 106229.11866754293
benchmark.2.std: 5574.229429629126
benchmark.2.err: 394.15754295803754
benchmark.2.best: 99327.99637317657
benchmark.2.worst: 138239.9946451187
check: pass
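
A note for reading the numbers above: in every benchmark block the `err` field is numerically consistent with the standard error of the mean, `std / sqrt(runs)`. This is inferred from the logged values, not confirmed from the `eval.py` source. Below is a minimal sketch for checking it, assuming the `key: value` log format shown above. `parse_log` and `std_err` are hypothetical helper names and are not part of `eval.py`.

```python
import math


def parse_log(text):
    """Parse 'key: value' lines into a dict; values are kept as strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so spec lines such as
        # 'benchmark.0.spec: m: 7168; k: 16384' keep their full value.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def std_err(fields, index):
    """Standard error of the mean from the logged std and run count.

    Assumption: 'err' is defined as std / sqrt(runs).
    """
    std = float(fields[f"benchmark.{index}.std"])
    runs = int(fields[f"benchmark.{index}.runs"])
    return std / math.sqrt(runs)


# Lines copied from the first benchmark block of the `benchmark` run above.
sample = """benchmark.0.spec: m: 7168; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.std: 30025.3961954022
benchmark.0.err: 2123.116125758166"""

fields = parse_log(sample)
computed = std_err(fields, 0)
reported = float(fields["benchmark.0.err"])
print(f"computed={computed:.6f} reported={reported:.6f}")
```

The same check can be pointed at the other benchmark blocks by pasting the full log into `sample` and changing the index.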