Standalone score compute #433
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Force-pushed from 8bd36f5 to f5cffb2
Force-pushed from f5cffb2 to b917d71
I tested it locally on a few results.

- With scaling.json: (output not shown)
- Without scaling.json: (output not shown)
- After manually deleting scaling.json: (output not shown)

But if I don't manually delete scaling.json and run it without the …

Testing power scores:

- With: (output not shown)
- Without: (output not shown)
- After deleting: (output not shown)
A few more issues that need to be dealt with:

- Trying to compute scores for just 1 or 2 files returns None, although it would help to just print out the individual scores of each of the files in the folder.
- Changing file names to anything other than result_*.txt means no scores are computed, although this is expected.
@pgmpablo157321 TODO items as discussed in the training WG.

An additional piece for (2) would be to also print the samples to converge along with the scores for each log file, so submitters get a sense of their convergence as well. Is that also something we can add?
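For the per-file piece, a minimal sketch of what printing an individual raw score could look like, assuming the standard `:::MLLOG ` prefix followed by a JSON payload and taking time-to-train as `run_stop - run_start` (the function name is illustrative, not part of this PR); samples to converge could be reported similarly from the epoch/eval events:

```python
import json

MLLOG_PREFIX = ":::MLLOG "

def raw_score_ms(log_path):
    """Return run_stop - run_start in milliseconds for one result log."""
    times = {}
    with open(log_path) as f:
        for line in f:
            idx = line.find(MLLOG_PREFIX)
            if idx == -1:
                continue  # not an MLLOG line
            record = json.loads(line[idx + len(MLLOG_PREFIX):])
            if record.get("key") in ("run_start", "run_stop"):
                times[record["key"]] = record["time_ms"]
    return times["run_stop"] - times["run_start"]
```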
description="Compute the score of a single benchmark", | ||
) | ||
parser.add_argument( | ||
"--benchmark", |
It should be unnecessary to specify the benchmark from the command line. This information should instead be obtained from the MLLOG `submission_benchmark` log line. (Additionally, it should be checked that the `submission_benchmark` log line of every log is the same.)
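For reference, a rough sketch of that extraction, again assuming the `:::MLLOG ` + JSON log format (the helper name is hypothetical):

```python
import json

MLLOG_PREFIX = ":::MLLOG "

def submission_benchmark(log_path):
    """Return the value of the first submission_benchmark key, or None."""
    with open(log_path) as f:
        for line in f:
            idx = line.find(MLLOG_PREFIX)
            if idx == -1:
                continue  # not an MLLOG line
            record = json.loads(line[idx + len(MLLOG_PREFIX):])
            if record.get("key") == "submission_benchmark":
                return record.get("value")
    return None
```

Checking that every log agrees would then just be asserting that this returns the same value for all files.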
I don't think we need this standalone script to do everything that the compliance checker does. And I don't think passing --benchmark is much of a hassle either, so it should be easier for submitters to do that, and for the script to rely on that information from users, than for it to fail if, say, the MLLOG lines are not correct.
This command line flag adds absolutely no value, and it is a hassle to figure out the canonical name of some benchmarks.
In some cases (where all we want to know is the raw score and/or the number of samples to converge) we don't need to know the benchmark name at all.
When we do need to know the benchmark name (to calculate the number of logs for olympic scoring and for rcp scaling), the command line flag is redundant (because the information is in the log files). If you don't want to check that the same benchmark name is in all the files that's fine. Just get the benchmark name out of the first file then.
> it is a hassle to figure out the canonical name of some benchmarks.

That's fair.

> If you don't want to check that the same benchmark name is in all the files that's fine. Just get the benchmark name out of the first file then.

Yeah, I think this should work. @pgmpablo157321?
"--has_power", action="store_true", help="Compute power score as well" | ||
) | ||
parser.add_argument( | ||
"--benchmark_folder", |
I'd recommend taking a list of files rather than a folder name. Then the user could specify the list of files as `folder/result*.txt` to get all the result files in a folder, but could also specify a single file, and could specify log files and directories that are named differently than `result*.txt`, like `foo/bar/baz/*.log`.
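A sketch of what that interface could look like with argparse, where the shell expands the glob before the script runs; the positional name `log_files` is illustrative, not from this PR:

```python
import argparse

parser = argparse.ArgumentParser(
    description="Compute the score of a single benchmark"
)
# One or more result logs; the shell expands patterns such as
# folder/result*.txt or foo/bar/baz/*.log into individual paths.
parser.add_argument(
    "log_files",
    nargs="+",
    help="Result log files, e.g. folder/result*.txt or a single file",
)
args = parser.parse_args()
```

This drops the result_*.txt naming assumption entirely and makes single-file scoring work for free.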
**BENCHMARK:** Name of the benchmark to compute the score, such as rgat, llama31_8b, etc.
**SYSTEM_NAME:** The name of the system, it can be set to None.
Recommend changing

> The name of the system, it can be set to None

to

> Optional system name
Fix #419