Multiple 400MB processes on single GPU #20114
              
Unanswered
changspencer asked this question in DDP / multi-GPU / multi-node
Hello everyone!
I have a question about DP/DDP behavior that I've been wondering about. Occasionally, in my runs on a SLURM cluster, I see multiple small processes of about 400 MB each placed on a single GPU. I assume this GPU hosts something like the "master" process, but I have no idea why the small processes are necessary, or whether they simply fail to get cleaned up after a method finishes running.
What could be the reason these processes show up on a single GPU during training, even though all training processes have already been started? Could this be a SLURM resource-management problem or a programming problem on my end?
Some quick notes:
I can try to provide more details, but I wanted to see whether anyone else has experienced the same situation, possibly for different use cases.
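As a starting point for those details, here is a minimal sketch of how the per-GPU process list could be collected on a node. It is not taken from my training code: the helper names (`read_nvidia_smi`, `group_by_gpu`, `small_processes`) are hypothetical, and the 500 MiB cutoff is an arbitrary assumption chosen to catch the ~400 MB processes. It only reports what `nvidia-smi` sees and does not explain why the processes exist.

```python
import subprocess
from collections import defaultdict

# nvidia-smi query: owning PID, GPU UUID, and memory used (MiB) per compute process.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,gpu_uuid,used_memory",
    "--format=csv,noheader,nounits",
]


def read_nvidia_smi():
    """Run the query above and return its raw CSV text (requires nvidia-smi on PATH)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def group_by_gpu(csv_text):
    """Map each GPU UUID to a list of (pid, used_mib) tuples parsed from the CSV text."""
    per_gpu = defaultdict(list)
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Skip malformed rows and rows whose memory is not reported as a number.
        if len(fields) != 3 or not fields[0].isdigit() or not fields[2].isdigit():
            continue
        pid, uuid, mem = fields
        per_gpu[uuid].append((int(pid), int(mem)))
    return dict(per_gpu)


def small_processes(per_gpu, max_mib=500):
    """For GPUs hosting more than one process, list the PIDs using at most max_mib."""
    result = {}
    for uuid, procs in per_gpu.items():
        small = [pid for pid, mem in procs if mem <= max_mib]
        if len(procs) > 1 and small:
            result[uuid] = small
    return result


# Usage on a compute node (assumption: run while training is active):
#   per_gpu = group_by_gpu(read_nvidia_smi())
#   print(small_processes(per_gpu))
```

Comparing the reported PIDs against the PIDs of the training ranks (for example, by printing `os.getpid()` in each rank) would show whether the small processes belong to the ranks themselves or to something else on the node.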