Collaborator

@Snektron Snektron commented Sep 9, 2025

Description

This PR adds some infrastructure for dealing with profiling data. The idea is that any data placed in the profile_data/ directory on the GitHub runner is exported to the user via Discord (when EvalResult.profile_result is set). Since Discord attachments cannot handle the size of the download artifacts (typically up to ~35 MB), I've opted to simply present a direct link to the GitHub artifact. To this end, I've modified the GH launcher to provide a sort of 'index' of artifacts, each of which can then either be downloaded by the bot or presented as a download link.

I've also fixed some minor bugs in run_eval.py related to fetching ROCm system info, and added some extra info about the runtime to SystemInfo (useful elsewhere in the evaluation process once the actual profiling code is added).

Some caveats:

  • Currently, users can export any file as a download by running a profile and writing to profile_data/. I think the only way around that would be to launch the profiling process with higher privileges and then drop those privileges in eval.py. Lmk if you guys want that.
  • Profiling artifacts aren't protected: they can be downloaded by anyone with a GH account, possibly giving away information about a solution. They shouldn't contain kernel source, just kernel names + run order.

Extracted from #339

This removes some duplicated code paths, which makes it easier to
patch profiling calls into the function later on.

github-actions bot commented Sep 9, 2025

Coverage report

File: src/libkernelbot/report.py
Lines missing coverage: 60

This report was generated by python-coverage-comment-action

@Snektron Snektron marked this pull request as draft September 9, 2025 20:39
This way we can tell whether we are using CUDA or ROCm later on.

This also fixes the ROCm fallback path.

This will be used to communicate external download links
such as profiling results.

A new ProfileResult type is added to run_eval, which
is returned in the EvalResult type. Among other fields,
this contains the `download_url` field which should be
used by the user to download profiling data. Note that
the actual public download link may not be known in
run_eval.py. In this case, it is the intention that the
launcher fixes up the `download_url` before returning the
results back to libkernelbot.
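
The fix-up step described above could look roughly like the following. This is only a sketch, not the code from this PR: the `summary` field and the `fix_up_download_url` helper are hypothetical names made up for illustration, and only `ProfileResult` and `download_url` come from the commit message.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class ProfileResult:
    # Hypothetical stand-in for the "other fields" the real type carries.
    summary: str
    # May be unset when run_eval.py does not know the public link yet.
    download_url: Optional[str] = None


def fix_up_download_url(result: ProfileResult, public_url: str) -> ProfileResult:
    """Return a copy of the result with the launcher-provided link filled in."""
    # Hypothetical helper: the launcher knows the public URL, run_eval.py may not.
    return dataclasses.replace(result, download_url=public_url)
```

Returning a copy rather than mutating in place keeps the value produced by the evaluation untouched, which is a design choice of this sketch rather than something the PR states.
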
The new function `GitHubRun.get_artifact_index` returns a
dict of artifacts available from the run. For each artifact,
the GitHub API URL and public download URL are returned.

The latter is not available directly from the GitHub API,
however, it can be easily constructed from the data that is
available in the workflow result.
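
As a rough illustration of how such an index might be assembled, here is a minimal sketch. It assumes artifact entries shaped like the GitHub REST API's artifact objects (`id`, `name`, `archive_download_url`) and assumes the public link follows the pattern used by the GitHub web UI. The function name `build_artifact_index` and the key names `api_url` and `public_url` are hypothetical and not taken from this PR.

```python
def build_artifact_index(repo: str, run_id: int, artifacts: list) -> dict:
    """Map each artifact name to its API URL and a public download URL.

    `repo` is "owner/name". The public URL pattern is an assumption based
    on how artifact links appear in the GitHub web UI.
    """
    index = {}
    for artifact in artifacts:
        index[artifact["name"]] = {
            # URL for downloading via the API (requires authentication).
            "api_url": artifact["archive_download_url"],
            # Link a logged-in user can open in a browser.
            "public_url": (
                f"https://github.com/{repo}/actions/runs/{run_id}"
                f"/artifacts/{artifact['id']}"
            ),
        }
    return index
```
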

`download_artifacts` is replaced by a function which downloads
a specific artifact rather than all of them. Additionally, the
function no longer writes to a temp file when downloading the
artifact; the results of the download request can be piped
directly into zipfile using BytesIO.
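
The in-memory approach can be sketched as follows, assuming the response body of the download request is already available as bytes. The helper name `extract_artifact` is hypothetical; only the use of `zipfile` with `BytesIO` comes from the commit message.

```python
import io
import zipfile


def extract_artifact(zip_bytes: bytes) -> dict:
    """Read every file in a zipped artifact without touching the disk."""
    # BytesIO gives zipfile the seekable file-like object it needs.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

This holds the whole archive in memory, which seems reasonable for artifacts in the tens of megabytes but would be worth revisiting for much larger downloads.
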
The idea is that run_eval.py places profiling data in the
profile_data/ directory, which is then automatically exported
to the user. This is done by uploading that directory as the
'profile-data' artifact, then fetching its public download
link and returning that as the ProfileResult.download_url.
@Snektron Snektron marked this pull request as ready for review September 9, 2025 21:02
@Snektron Snektron mentioned this pull request Sep 9, 2025
6 tasks
@Snektron Snektron requested review from ngc92, msaroufim and S1ro1 and removed request for ngc92 and msaroufim September 9, 2025 21:13
@msaroufim
Member

The current caveats seem fine to me; the retention policy means you don't have much time to be abusive. Feel free to merge.

@Snektron Snektron merged commit 31a047f into main Sep 10, 2025
5 checks passed