Describe the bug
When an OpenFold3 job is run under SLURM, the job spawns an excessive number of processes (likely due to PyTorch Lightning's SLURM auto-detection interacting with our SLURM configuration). It would be preferable to be able to disable the SLURMEnvironment and use only local resources. In my case, the job was submitted to a VM with finite resources, and the OpenFold3 job oversubscribes the CPU/GPU, causing the job to appear to hang on the cluster.
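
For context, Lightning routes the Trainer onto the SLURM launch path whenever its cluster-environment detection succeeds. A quick way to confirm that this is the trigger (a minimal sketch using Lightning's public API; the module path assumes a recent `lightning.pytorch` release):

```python
from lightning.pytorch.plugins.environments import SLURMEnvironment

# Inside an sbatch/srun allocation this prints True, which is what
# steers the Trainer onto the SLURM launcher instead of a local one.
print(SLURMEnvironment.detect())
```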
To Reproduce
Reproducing the issue may be difficult without a compute cluster configured in a similar way; however, SLURM autodetection is a common problem with PyTorch Lightning. Note that my OpenFold3 job is not run in Docker. While that might be a solution for me in the long term, it would be nice if SLURM autodetection could be disabled via a command-line argument.
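
As a stopgap, the detection can be defeated from the job script by stripping the SLURM environment variables before OpenFold3 starts. This is only a sketch of the workaround I have in mind; the exact variable list Lightning inspects is an assumption and may vary by version:

```python
import os

# Remove the SLURM variables that Lightning's detection keys on so the
# run falls back to a plain local launcher. (SLURM_NTASKS is the one
# most commonly reported as the trigger; the others are speculative.)
for var in ("SLURM_NTASKS", "SLURM_NTASKS_PER_NODE", "SLURM_JOB_NAME"):
    os.environ.pop(var, None)
```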
Expected behavior
To avoid this issue, the run command could accept one or more arguments to disable SLURM autodetection or to manually specify the local resources to be used.
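
Internally, something along these lines would achieve that (a hedged sketch, not OpenFold3's actual code; `LightningEnvironment` is Lightning's plugin for plain local execution, and the device counts here are illustrative):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.plugins.environments import LightningEnvironment

# Passing an explicit cluster-environment plugin bypasses SLURM
# auto-detection; devices/num_nodes pin the job to local resources.
trainer = Trainer(
    accelerator="gpu",
    devices=1,
    num_nodes=1,
    plugins=[LightningEnvironment()],
)
```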