Fix/predictable temp file and race condition in gpu utils vulnerability by AseemPrasad · Pull Request #2593 · confident-ai/deepeval

AseemPrasad · 2026-04-02T14:02:48Z

The vulnerability I found is a predictable temp file and race condition in the GPU utilities.

There are two compounding problems:

A) Predictable, unsanitized temp file in the current working directory. Any process or user with write access to the CWD can create tmp_smi before os.system runs. The shell redirection >tmp_smi follows symlinks, so if an attacker plants a symlink at tmp_smi → /etc/cron.d/backdoor before the call, os.system writes the GPU output into that target with the privileges of the calling process. This lets the attacker create or clobber a file they could not otherwise write. Alternatively, an attacker can poison tmp_smi after the os.system write but before open("tmp_smi") reads it, injecting crafted integers that steer the downstream np.argmax toward a specific GPU, or non-numeric content that crashes the process.

B) Both functions (get_freer_gpu and any_gpu_with_space) use the same filename with no locking. If both are called concurrently (common in a multi-process test environment), one process's os.remove can delete the file while the other is reading it, or each process can overwrite the other's output, causing incorrect GPU selection, which can silently corrupt model loading.
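To make problem A concrete, here is a small, self-contained sketch of how a write to a predictable filename follows a planted symlink. It runs entirely inside a throwaway temporary directory and assumes a POSIX system where symlinks can be created. The function name `demo_symlink_clobber`, the file `victim.txt`, and the sample "Free" line are illustrative stand-ins, not code or output from the repository.

```python
import os
import tempfile


def demo_symlink_clobber() -> str:
    """Show that writing to a predictable name follows a planted symlink.

    Hypothetical demo: everything happens in a private temporary directory.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # A file the "attacker" wants overwritten.
        victim = os.path.join(workdir, "victim.txt")
        with open(victim, "w") as f:
            f.write("original contents\n")

        # The attacker plants a symlink at the predictable name first.
        predictable = os.path.join(workdir, "tmp_smi")
        os.symlink(victim, predictable)

        # The "victim" process then writes to the predictable name.
        # Opening for write follows the symlink, just as a shell
        # redirection such as `> tmp_smi` does.
        with open(predictable, "w") as f:
            f.write("Free : 1234 MiB\n")  # stand-in for tool output

        # The symlink target now holds the written data.
        with open(victim) as f:
            return f.read()


if __name__ == "__main__":
    print(demo_symlink_clobber(), end="")
```

The writer never names `victim.txt`, yet its contents are replaced. That is why removing the shared file altogether is a stronger fix than trying to check the path before writing.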

Impact in context: DeepEval is commonly run as part of CI/CD pipelines with elevated permissions. An attacker who can write to the working directory (e.g., via a separate compromised build step) can leverage this for symlink-based privilege escalation or data injection.

Here is what I did:

I replaced the GPU temp-file path in [utils.py:667] with a shared helper that calls nvidia-smi through subprocess.run, captures stdout directly, and parses the free-memory values in memory. That removes tmp_smi entirely, so there is no symlink target, no time-of-check/time-of-use window, and no shared filename for concurrent callers to collide on. The two public callers now both use the same in-memory helper, so get_freer_gpu and any_gpu_with_space behave independently and safely even when invoked at the same time.
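A minimal sketch of the in-memory approach described above, not the PR's actual code. The helper names (`parse_free_memory_mib`, `query_free_memory_mib`, `pick_freest_gpu`, `any_gpu_has_space`) are hypothetical, and the `--query-gpu=memory.free` invocation is an assumption; the real change may run a different nvidia-smi command and parse a different output format.

```python
import subprocess
from typing import List


def parse_free_memory_mib(smi_output: str) -> List[int]:
    """Parse one free-memory value (MiB) per non-empty line of output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_free_memory_mib(timeout_s: float = 30.0) -> List[int]:
    """Run nvidia-smi and return free memory per GPU, with no temp file.

    Assumption: this query form prints one bare integer (MiB) per GPU.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True,  # stdout is captured in memory
        text=True,
        timeout=timeout_s,
        check=True,  # non-zero exit raises CalledProcessError
    )
    return parse_free_memory_mib(result.stdout)


def pick_freest_gpu(free_mib: List[int]) -> int:
    """Return the index of the GPU with the most free memory.

    Raises ValueError if the list is empty.
    """
    return max(range(len(free_mib)), key=lambda i: free_mib[i])


def any_gpu_has_space(free_mib: List[int], gb_needed: float) -> bool:
    """Return True if any GPU has at least gb_needed GiB free."""
    return any(mib / 1024.0 >= gb_needed for mib in free_mib)
```

Keeping the parsing separate from the subprocess call means each caller works on its own list of values, so concurrent callers share no state, and the parsing can be tested on machines without a GPU.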

I also hardened the failure path: the helper now times out after 30 seconds and raises a clear RuntimeError if nvidia-smi is missing, hangs, or exits non-zero. That means the caller gets an explicit failure instead of silent corruption or a hung process. I validated the edited file with the workspace error checker, and it reports no syntax issues.
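The failure handling can be sketched along these lines. This is an illustration under stated assumptions rather than the code in the PR: `run_gpu_query` is a hypothetical name, and the command is passed in as a parameter so the error paths can be exercised without nvidia-smi installed.

```python
import subprocess
from typing import Sequence


def run_gpu_query(cmd: Sequence[str], timeout_s: float = 30.0) -> str:
    """Run a command and return its stdout, or raise a clear RuntimeError.

    Hypothetical helper: in real use, cmd would be an nvidia-smi invocation.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
    except FileNotFoundError as exc:
        # The executable is not installed or not on PATH.
        raise RuntimeError(f"{cmd[0]} not found") from exc
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before raising this.
        raise RuntimeError(f"{cmd[0]} timed out after {timeout_s} seconds") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"{cmd[0]} exited with status {exc.returncode}") from exc
    return result.stdout
```

Mapping all three failure modes onto one exception type gives callers a single thing to catch, while `raise ... from exc` keeps the original cause available in the traceback.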

vercel · 2026-04-02T14:02:54Z

@AseemPrasad is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

AseemPrasad added 2 commits April 2, 2026 18:50

fixing secure_exec sandbox escape via getattr vulnerability

fe4f263

fixing Predictable temp file + race condition in GPU utils vulnerability

d2fe5bf

AseemPrasad wants to merge 2 commits into confident-ai:main from AseemPrasad:fix/Predictable-temp-file-and-race-condition-in-GPU-utils-vulnerability
