Profiling Infrastructure #354
Merged
This removes some duplicated code paths, which makes it easier to patch profiling calls into the function later on.
This way we can tell whether we are using CUDA or ROCm later on. This also fixes the ROCm fallback path.
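A minimal sketch of such a runtime check, assuming PyTorch is available on the runner. ROCm builds of PyTorch set `torch.version.hip` and CUDA builds set `torch.version.cuda`. The `SystemInfo` fields shown here and the helpers `detect_runtime`/`runtime_from_torch` are hypothetical, not the code from this PR:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemInfo:
    # Hypothetical subset of fields; the real SystemInfo in run_eval.py may differ.
    gpu: str = "unknown"
    runtime: str = "unknown"  # "CUDA", "ROCm", or "unknown"


def detect_runtime(cuda_version: Optional[str], hip_version: Optional[str]) -> str:
    """Classify the GPU runtime from PyTorch-style version strings."""
    # Check HIP first: a ROCm build reports a HIP version and no CUDA version.
    if hip_version:
        return "ROCm"
    if cuda_version:
        return "CUDA"
    return "unknown"  # e.g. a CPU-only build


def runtime_from_torch() -> str:
    """Read the version strings from PyTorch, if it is installed."""
    try:
        import torch
    except ImportError:
        return "unknown"
    return detect_runtime(torch.version.cuda, getattr(torch.version, "hip", None))
```

A caller would then fill the field with something like `SystemInfo(runtime=runtime_from_torch())`.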
This will be used to communicate external download links such as profiling results.
A new `ProfileResult` type is added to run_eval and returned as part of `EvalResult`. Among other fields, it contains `download_url`, which the user should use to download the profiling data. Note that the actual public download link may not be known inside run_eval.py; in that case, the launcher is expected to fix up `download_url` before returning the results to libkernelbot.
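The split between run_eval.py and the launcher can be sketched with two small dataclasses. This is an illustration under assumptions: only `download_url` is named in the PR, while the `success` field and the helper `fix_up_download_url` are hypothetical. The field name `profile_result` follows the PR description.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ProfileResult:
    # Other fields omitted; the PR only names download_url.
    download_url: Optional[str] = None  # may still be unknown inside run_eval.py


@dataclass
class EvalResult:
    success: bool  # hypothetical field
    profile_result: Optional[ProfileResult] = None


def fix_up_download_url(result: EvalResult, public_url: str) -> EvalResult:
    """Launcher side: fill in the public link before handing results back."""
    if result.profile_result is None:
        return result  # nothing was profiled, so there is nothing to fix up
    # Return a copy instead of mutating the result received from the runner.
    fixed_profile = replace(result.profile_result, download_url=public_url)
    return replace(result, profile_result=fixed_profile)
```

Returning copies keeps run_eval.py's output unchanged, so the launcher's rewrite is the only place where the public link appears.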
The new function `GitHubRun.get_artifact_index` returns a dict of the artifacts available from the run. For each artifact, both the GitHub API URL and the public download URL are returned. The latter is not available directly from the GitHub API, but it can easily be constructed from the data available in the workflow result. `download_artifacts` is replaced by a function that downloads a specific artifact rather than all of them. Additionally, the function no longer writes the artifact to a temp file while downloading; the response of the download request is piped directly into `zipfile` using `BytesIO`.
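A rough sketch of both pieces follows. It assumes the index is built from the JSON body of the REST endpoint `GET /repos/{owner}/{repo}/actions/runs/{run_id}/artifacts`, and that the public link follows the `https://github.com/{owner}/{repo}/actions/runs/{run_id}/artifacts/{artifact_id}` pattern used by the GitHub web UI. The function names and the index's key names are illustrative, not the PR's code:

```python
import io
import zipfile
from typing import Any


def build_artifact_index(
    artifacts_json: dict[str, Any], repo: str, run_id: int
) -> dict[str, dict[str, str]]:
    """Map artifact name -> {"api_url", "public_download_url"}.

    `artifacts_json` is the parsed response of the "list workflow run
    artifacts" endpoint; `repo` is "owner/name".
    """
    index = {}
    for artifact in artifacts_json.get("artifacts", []):
        index[artifact["name"]] = {
            # API endpoint for the zip archive (needs an auth token to fetch).
            "api_url": artifact["archive_download_url"],
            # Browser link built from repo, run id and artifact id (assumed pattern).
            "public_download_url": (
                f"https://github.com/{repo}/actions/runs/{run_id}"
                f"/artifacts/{artifact['id']}"
            ),
        }
    return index


def extract_artifact(zip_bytes: bytes) -> dict[str, bytes]:
    """Unpack a downloaded artifact archive entirely in memory (no temp file)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```

In a launcher, `zip_bytes` would be the body of an authenticated GET on the entry's `api_url`; wrapping it in `BytesIO` gives `zipfile` the seekable file object it needs.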
The idea is that run_eval.py places profiling data in the `profile_data/` directory, which is then automatically exported to the user: the workflow uploads that directory as the 'profile-data' artifact, and the launcher fetches its public download link and returns it as `ProfileResult.download_url`.
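The two ends of that hand-off can be sketched as follows. This assumes the artifact index maps artifact names to dicts carrying a `public_download_url` key; `has_profile_data` and `profile_download_url` are hypothetical helpers, while the directory and artifact names come from the text above:

```python
from pathlib import Path
from typing import Optional

PROFILE_DIR = Path("profile_data")  # directory exported to the user
PROFILE_ARTIFACT = "profile-data"   # artifact name the workflow uploads it as


def has_profile_data(root: Path = PROFILE_DIR) -> bool:
    """Eval side: only report a profile if something was actually written."""
    return root.is_dir() and any(root.iterdir())


def profile_download_url(index: dict[str, dict[str, str]]) -> Optional[str]:
    """Launcher side: public link of the uploaded profile artifact, if any."""
    entry = index.get(PROFILE_ARTIFACT)
    # A missing entry means the upload step produced no artifact for this run.
    return entry["public_download_url"] if entry is not None else None
```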
msaroufim approved these changes on Sep 10, 2025

Current caveats seem fine to me, the retention policy makes it so you don't have too much time to be abusive. Feel free to merge
Description
This PR adds some infrastructure for dealing with profiling data. The idea is that any data placed in the `profile_data/` directory on the GitHub runner is exported to the user via Discord (when `EvalResult.profile_result` is set). As Discord attachments cannot handle the size of the artifacts (typically up to ~35 MB), I've opted to simply present a direct link to the GitHub artifact. To this end, I've modified the GH launcher to provide a sort of 'index' of artifacts, which can then either be downloaded by the bot or presented as a download link.

I've also fixed some minor bugs in `run_eval.py` related to fetching ROCm system info, and added some extra info about the runtime to `SystemInfo` (useful elsewhere in the evaluation process once the actual profiling code is added).

Some caveats:
`profile_data/` currently. I think that the only way around that would be to launch the profiling process with higher privileges and then drop those privileges in `eval.py`. Lmk if you guys want that.

Extracted from #339
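If that route were taken, the privilege drop in `eval.py` could follow the usual POSIX order sketched below. This is Linux/Unix only; `drop_privileges` is a hypothetical helper, and which uid/gid to drop to is left to the caller:

```python
import os


def drop_privileges(uid: int, gid: int) -> None:
    """Permanently switch the current process to the given uid/gid.

    Order matters: supplementary groups and the gid must be changed while the
    process still has root, because after setuid() it can no longer do so.
    """
    if os.getuid() == 0:
        os.setgroups([])  # clear root's supplementary groups (root only)
    os.setgid(gid)
    os.setuid(uid)  # as root this also resets the saved uid, so it is permanent
    if uid != 0:
        # Sanity check: regaining root must now fail.
        try:
            os.setuid(0)
        except PermissionError:
            return
        raise RuntimeError("privileges were not dropped")
```

The privileged parent would prepare `profile_data/` and start the profiler first, then call this before running the submitted code.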