GPU 0 gets duplicate processes from processes running on other GPUs

I am training different models on different GPUs simultaneously on one of my remote machines, and I found that the processes running on GPUs with a nonzero ID are somehow duplicated on GPU 0, as shown below:

As you can see from the PIDs, the processes running on GPU_i with i nonzero are also duplicated on GPU_0. I am running the exact same code on each of these GPUs (with different hyperparameters). This is really puzzling because on another remote machine with very similar settings, I could not reproduce this behavior even though I ran exactly the same code on each GPU in exactly the same way. On that machine, there was no duplication of processes to GPU_0, only one process on each GPU.
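For anyone who wants to check for this duplication without reading the process table by eye, here is a minimal sketch that groups the nvidia-smi compute-process list by PID. It assumes nvidia-smi is on the PATH and accepts `--query-compute-apps=pid,gpu_uuid,used_memory --format=csv,noheader`; the function names are made up for illustration.

```python
import subprocess
from collections import defaultdict


def pids_on_multiple_gpus(csv_text):
    """Return {pid: sorted GPU UUIDs} for PIDs that appear on more than one GPU.

    Expects one process per line in the form "pid, gpu_uuid, used_memory".
    """
    gpus_by_pid = defaultdict(set)
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank lines, headers, or anything unexpected
        gpus_by_pid[int(fields[0])].add(fields[1])
    return {pid: sorted(gpus) for pid, gpus in gpus_by_pid.items() if len(gpus) > 1}


def query_compute_apps():
    """Run nvidia-smi and return its CSV output (assumes nvidia-smi is installed)."""
    cmd = [
        "nvidia-smi",
        "--query-compute-apps=pid,gpu_uuid,used_memory",
        "--format=csv,noheader",
    ]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Calling `pids_on_multiple_gpus(query_compute_apps())` on the affected machine should list every PID that holds a context on more than one GPU. An empty result means each process is confined to a single device.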

So I concluded that this is likely not due to a bug in my code, but to external settings such as the PyTorch, CUDA, or driver version. The other machine, where I tried to reproduce this behavior, has exactly the same driver version (387.26) and the same number of the same GPU (Titan Xp), but somewhat different versions of PyTorch and CUDA. Shown below are the exact versions of the packages installed (all installed by conda) on each machine (the first list is from the machine where I had this problem, and the second is from the other machine):
(The problematic machine)

(The other machine)

As you can see, their PyTorch and CUDA versions are different. Could this be the cause of this weird behavior?
(By the way, I am not sure if this is relevant, but the CUDA version I get from nvcc --version on both machines is V7.5.17, possibly due to remnants of previous installations.)

How did you run the different processes?
Did you use CUDA_VISIBLE_DEVICES=ID in your environment or did you push the tensors to the appropriate GPU in the different codes?

If you’ve used the second approach, PyTorch still sees all GPUs and uses GPU0 by default to initialize CUDA as far as I know.
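To illustrate the first approach, here is a minimal sketch of a launcher that starts a script in a child process with only one GPU visible. The helper names and the idea of launching from Python (rather than typing `CUDA_VISIBLE_DEVICES=ID` in the shell) are assumptions for illustration, not something from the original setup.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_idx, base_env=None):
    """Copy the environment and expose only one physical GPU to the child."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_idx)
    return env


def launch_on_gpu(script, gpu_idx, extra_args=()):
    """Start `script` in a child process that can only see GPU `gpu_idx`.

    Inside the child the selected GPU shows up as device 0, so the script
    should not also be given the physical index for torch.cuda.set_device.
    """
    cmd = [sys.executable, script, *extra_args]
    return subprocess.Popen(cmd, env=env_for_gpu(gpu_idx))
```

For example, `launch_on_gpu("train.py", 3)` (with a hypothetical `train.py`) is roughly the same as running `CUDA_VISIBLE_DEVICES=3 python train.py` in a shell. Since the other GPUs are hidden from the process, it cannot create a context on physical GPU 0 by accident.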

I create a tmux session with 8 panes. In the i-th pane, I run my script with python and the argument --gpu_idx i, where i runs from 0 to 7 and --gpu_idx is the device ID of the GPU I want to use. At the very beginning of the code there is torch.cuda.set_device(args.gpu_idx) to set the GPU to use. Then I apply .cuda() to the network and loss function, and in the training loop I wrap the data and label tensors as Variables and call data = data.cuda() and labels = labels.cuda(), in a very standard way. I never used CUDA_VISIBLE_DEVICES=ID, so I think I used the second approach you mention. What is really strange is that on the other machine the very same code works just as expected: running with --gpu_idx i only creates a process on the corresponding GPU, and not on GPU 0.

Ok, I see.
Could you try to use


and see if this problem still occurs.

Also, since you are using 0.3.x, I would suggest updating to the latest stable release, 0.4.0.
It has some nice features and bug fixes. Have a look at the Migration Guide.

Unfortunately, I cannot test your script on my machine at the moment, since it’s busy.

I think there was a bug in PyTorch 0.3, fixed in 0.3.1, where a DataLoader using pinned memory created a CUDA context on GPU 0 even if GPU 0 was not otherwise used. That is probably what you are seeing. As @ptrblck suggests, try updating to PyTorch 0.4, the latest stable version.

Given that the exact same code doesn’t have this problem in PyTorch 0.3.1, and that I did use pinned memory, I think ngimel’s answer is probably right, although I haven’t yet installed 0.3.1 to revisit the problem. I will probably install 0.4 later. Thank you so much, guys!

For future reference, I confirm that upgrading to 0.3.1 solved the problem.

I am using PyTorch v1.0.0 and seem to be running into the same issue as described in the original post. Duplicate processes are started on GPU 0 from processes running on the other GPUs. See screenshot for an example.

Hi, I am facing the same issue. Did you solve this problem?

Hi @ptrblck, I’m still facing this issue now with DDP being set up by default. Any update on what might have caused this?

The original issue was created for PyTorch 0.3, so let’s focus on your new problem instead. :wink:

Are you using any device-specific operations, such as empty_cache, without specifying the device ID?
This could create a new CUDA context on the default device from another process if you haven’t masked the other devices via CUDA_VISIBLE_DEVICES.
If that’s not the case, could you post your code, so that we could have a look?

Ooh, there is a bunch of those, so it took me a few days to chase it out. It was indeed a device-specific op without specifying the ID (or rather, I parsed the args afterwards :man_facepalming:)!
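For future readers, here is a minimal sketch of the ordering that avoids this: parse the arguments first, select the device, and only then run any CUDA operation. The `--gpu_idx` flag matches the one used earlier in the thread; everything else (function names, the dummy tensor) is an assumption for illustration and not the actual training script.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu_idx", type=int, default=0,
                        help="index of the GPU this process should use")
    return parser.parse_args(argv)


def main(argv=None):
    # Parse arguments first, so the device is known before any CUDA call.
    args = parse_args(argv)

    # Imported here only so argument parsing can be exercised without a GPU.
    import torch

    # Make the requested GPU the current device for this process.
    torch.cuda.set_device(args.gpu_idx)

    # Allocate on an explicit device instead of relying on the default.
    device = torch.device("cuda", args.gpu_idx)
    x = torch.zeros(1, device=device)

    # empty_cache() takes no device argument and may touch the current
    # device, so scope it to the intended GPU explicitly.
    with torch.cuda.device(args.gpu_idx):
        torch.cuda.empty_cache()
    return x
```

If a call like `torch.cuda.empty_cache()` runs before `set_device` (for example because the arguments are parsed later), the current device is still GPU 0, which is one way to end up with an extra context there.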