CUDA_VISIBLE_DEVICES not working

I am trying to run my code on GPU 6 of a 10 GPU machine. I have tried the following:

  1. In the terminal: CUDA_VISIBLE_DEVICES=6 python myscript.py
  2. In the code: setting CUDA_VISIBLE_DEVICES=6 at the top of my script, before I import anything else
  3. In the terminal:
export CUDA_VISIBLE_DEVICES=6
python myscript.py
  4. In the code, before I import torch:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "6"

None of these works; my code just defaults to GPU 0.

Really frustrated and confused, please help.

How did you verify that it’s not working? When you mask the devices, PyTorch will see the single visible GPU as cuda:0, so it won’t work if you were expecting to address it as cuda:6.
Check nvidia-smi and you should see that GPU 6 is the one being used (as cuda:0 inside PyTorch) after masking it with the env variable.
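
A minimal sketch of how to verify this, assuming you launch the script as CUDA_VISIBLE_DEVICES=6 python check_device.py (the filename is just an example):

import torch

# With CUDA_VISIBLE_DEVICES=6 set, this process only sees one GPU,
# and PyTorch names it cuda:0 even though it is physically GPU 6.
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.current_device())    # expected: 0
print(torch.cuda.get_device_name(0))  # name of the physical GPU 6

# Allocating something on cuda:0 should show up under GPU 6 in nvidia-smi.
x = torch.randn(1000, 1000, device="cuda:0")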

When I run nvidia-smi, I see that most of the GPU memory is free, which should not be the case if the code were running on GPU 6.

Found an interesting error. I ran

>>> import torch
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0

So only GPU 0 is being recognized for some reason, even though other people are running their code on all the other GPUs.

What am I doing wrong?

As mentioned before, CUDA_VISIBLE_DEVICES=... makes only the specified GPUs visible to the process, and PyTorch then numbers the visible GPUs starting from device id 0.
In your use case this means that CUDA_VISIBLE_DEVICES=6 will use the GPU with id 6, and PyTorch will see and use it as cuda:0. GPU 6 in the system is thus mapped to cuda:0 in PyTorch.
The same applies to any other usage of CUDA_VISIBLE_DEVICES, and you can also change the order of the GPUs.
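
As a rough sketch of the remapping (the ids here are just examples), setting the variable before torch initializes CUDA also lets you reorder the devices:

import os

# Must be set before torch touches CUDA; the order of the ids
# determines the mapping to PyTorch device indices.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "7,6"   # GPU7 -> cuda:0, GPU6 -> cuda:1

import torch

print(torch.cuda.device_count())             # expected: 2
t = torch.ones(10, device="cuda:1")          # lands on physical GPU 6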

That makes sense! The particular error I was facing was because the GPU server needed a restart; it is fixed now.

However, I now get a new error when wrapping my module in nn.DataParallel:

module must have its parameters and buffers on device cuda:6 (device_ids[0]) but found one of them on device: cuda:0

Not sure why this is happening. My setup is as follows:

class MyModel(nn.Module):
<model description>

model = MyModel()
model = torch.nn.DataParallel(model, device_ids=[6,7]).cuda()

x = torch.randn(4).cuda()
out = model(x)
If you mask GPUs 6 and 7 via CUDA_VISIBLE_DEVICES=6,7, they will show up as cuda:0 and cuda:1 inside the process, so device_ids has to use the remapped indices. Change

model = torch.nn.DataParallel(model, device_ids=[6,7]).cuda()

to

model = torch.nn.DataParallel(model, device_ids=[0,1]).cuda()
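
Putting the pieces together, a minimal runnable sketch of the corrected setup (MyModel here is just a stand-in for the real model, and the layer and tensor shapes are made up for illustration):

import os

# Expose only GPUs 6 and 7 before torch is imported;
# inside this process they become cuda:0 and cuda:1.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

import torch
import torch.nn as nn

class MyModel(nn.Module):
    # placeholder standing in for <model description> above
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = MyModel()
# device_ids refer to the remapped ids, so [0, 1] means physical GPUs 6 and 7;
# .cuda() puts the parameters on cuda:0, which matches device_ids[0].
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

x = torch.randn(8, 4).cuda()   # a batched input so DataParallel can split it
out = model(x)
print(out.shape)               # expected: torch.Size([8, 2])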