Each GPU gets a process that occupies 12 MB of memory

I use PyTorch's torch.cuda.set_device to select the GPU I want to work on.
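
Roughly, the selection in my script looks like this (a simplified sketch; the model is a stand-in, not my actual code):

```python
import torch
import torch.nn as nn

torch.cuda.set_device(3)            # select GPU 3 (out of the 9 on the server)
model = nn.Linear(128, 10).cuda()   # .cuda() now targets GPU 3, the current device
```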

But the run spawns a process on every GPU. On the GPU I selected, the data is loaded and as much memory is used as the training requires. On the GPUs I didn't select, each one occupies 12 MB and shows a running process, and all of these processes have the same PID.

CUDA_VISIBLE_DEVICES=3 doesn't solve my problem, because when I set it I get an "invalid device ordinal" error. It seems my training script is doing something mysterious as soon as training starts, "duplicating" the process onto every GPU, so all GPUs have to be visible.
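
Concretely, this is what I see (a sketch; train.py stands in for my script):

```python
# Launched as: CUDA_VISIBLE_DEVICES=3 python train.py
import torch

print(torch.cuda.device_count())  # 1 -- only the masked-in GPU is visible
torch.cuda.set_device(3)          # RuntimeError: CUDA error: invalid device ordinal
```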

Can anyone help with this? It is quite embarrassing that everyone on the server can see my jobs everywhere. Thank you for your help.

If CUDA_VISIBLE_DEVICES=3 yields an error, you would have to check your code, since you are indeed using different devices.
Check the code for all .cuda(), .to(), and device usages, and use the default device instead of explicit device indices.
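
With the mask set, the single visible GPU is renumbered to cuda:0, so the script should not hard-code index 3. E.g. something like this (a minimal sketch with a stand-in model):

```python
import torch
import torch.nn as nn

# Use the default CUDA device everywhere instead of hard-coded indices.
device = torch.device("cuda")

model = nn.Linear(128, 10).to(device)    # stand-in for your model
x = torch.randn(32, 128, device=device)
out = model(x)                           # runs on the single visible device
```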

All of them use the device I specified; I checked by running grep -r for .cuda(), .to(), and device.

And it's not only about the default device: we have 9 GPUs here, and all of them get duplicate processes.

Each of them occupies 12 MB, and all the processes have the same PID.

You could try updating to the latest nightly and rechecking it.
Otherwise, I would recommend using CUDA_VISIBLE_DEVICES to mask all unwanted devices, which makes sure no memory is used on the masked devices.
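
E.g. (a sketch; train.py is a placeholder for your script):

```python
# Launch as: CUDA_VISIBLE_DEVICES=3 python train.py
import torch

# Only the masked-in GPU is visible; it appears as cuda:0 inside the process.
assert torch.cuda.device_count() == 1
x = torch.randn(4, 4, device="cuda")  # allocates only on the visible GPU
```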