How to make torch.cuda.device context managers compatible with cpu

I’ve been working on a machine with multiple GPUs and have needed to specify which GPU to run on. The docs say that best practice is to use a torch.cuda.device context manager, so the following line appears at multiple points in my code:

with torch.cuda.device(f"cuda:{ARGS.gpu}"):
    <do-tensor-stuff>

Now I want to make this code able to run on a CPU. Understandably, this line is now giving the error

AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

I can’t think of a neat solution. Obviously the following is not good because there would be so much duplicate code.

if torch.cuda.is_available():
    with torch.cuda.device(f"cuda:{ARGS.gpu}"):
        <do-tensor-stuff>
else:
    <do-tensor-stuff>

As an alternative to the context manager, you could also create a device variable, as explained in the device-agnostic code section:

args = parser.parse_args()
args.device = None
if not args.disable_cuda and torch.cuda.is_available():
    args.device = torch.device(f"cuda:{args.gpu}")
else:
    args.device = torch.device('cpu')

Afterwards, just pass args.device to each to() operation.
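
For example, continuing from the snippet above (a minimal sketch; the shapes are just placeholders):

# create a tensor directly on the chosen device
x = torch.randn(8, 32, device=args.device)

# or move an existing CPU tensor there
y = torch.zeros(8, 32).to(args.device)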

Thanks. Yes, this works, but what would be ideal is if the context manager could take a CPU device as an argument. Then we could avoid having to call .to(device) on every new tensor.

The current context manager isn’t working like that for CUDA tensors either, is it?
You would still need to call to(), device=, or cuda(). It just changes the default GPU:

cuda = torch.device('cuda')  # the current CUDA device

with torch.cuda.device(1):
    # allocates a tensor on GPU 1
    a = torch.tensor([1., 2.], device=cuda)

    # transfers a tensor from CPU to GPU 1
    b = torch.tensor([1., 2.]).cuda()
    # a.device and b.device are device(type='cuda', index=1)

    # You can also use ``Tensor.to`` to transfer a tensor:
    b2 = torch.tensor([1., 2.]).to(device=cuda)
    # b.device and b2.device are device(type='cuda', index=1)

Yes that’s true. Actually, the case I have in mind involves tensor-creating functions.

I used to pass a device argument to all such functions, which became cumbersome when there were many of them and they were nested inside one another.

def make_tensor_device(device):
    return torch.tensor([1., 2.]).to(device)

device = torch.device(f"cuda:{ARGS.gpu}")
t = make_tensor_device(device)

The context manager simplified this because I can just set the default GPU and then call .cuda() on created tensors.

def make_tensor():
    return torch.tensor([1., 2.]).cuda()

device = torch.device(f"cuda:{ARGS.gpu}")
with torch.cuda.device(device):
    t = make_tensor()

Now it seems I have to go back to using the device argument. Not the end of the world, but it would be nice to be able to set the default device to either the CPU or a specific GPU; maybe that would even mean that .cuda() would no longer be necessary.

Well, there is the option of setting the default tensor type via:

torch.set_default_tensor_type(torch.cuda.FloatTensor)

but I would recommend avoiding it, as this will create all tensors on the GPU, even ones that don’t need to be there, and you might run into a lot of issues.
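
For example (a minimal sketch of the pitfall; the tensors here are just illustrative):

import torch

torch.set_default_tensor_type(torch.cuda.FloatTensor)

x = torch.ones(3)            # silently lives on the GPU now
mask = torch.rand(3) > 0.5   # so does every intermediate helper tensor
print(x.device)              # cuda:0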

I understand that the device argument might be a bit cumbersome, but I personally think it’s the cleanest way of writing the code.

However, any suggestions on improving the user experience are more than welcome.
As already said, I’m not a huge fan of changing the default tensor type, but what would a “good” API look like for your use case?
Could you post some pseudo code for it?

I was hoping to be able to do something like:

def make_tensor():
    return torch.tensor([1.,2.])

if not ARGS.disable_cuda and torch.cuda.is_available():
    device = torch.device(f"cuda:{ARGS.gpu}")
else:
    device = torch.device('cpu')

with torch.device(device):
    t = make_tensor() 

and have this return a tensor on either the CPU or the specified GPU. However, this might also have the problem of putting lots of unnecessary things on the GPU. I didn’t know about the option to set the default type, and if you think that’s not a good idea, then maybe what I’m suggesting isn’t either.

Hmm, maybe it would work if .cuda() just did nothing when the user had set the device to ‘cpu’. The following kind of works to give what I’m after.

from contextlib import contextmanager
import torch

torch.manually_set_device = 'none'

@contextmanager
def set_torch_device(new_device):
    prev_device = torch.manually_set_device
    torch.manually_set_device = new_device
    try:
        yield
    finally:  # restore the previous device even if an exception is raised
        torch.manually_set_device = prev_device

def maybe_cuda(t):
    if torch.manually_set_device == 'none':
        return t.cuda()
    else:
        return t.to(torch.manually_set_device)

def make_tensor():
    return maybe_cuda(torch.tensor([1., 2.]))

device = torch.device('cpu')
with set_torch_device(device):
    t = make_tensor()
    print(t.device)  # cpu

t = make_tensor()
print(t.device)  # cuda:0, since .cuda() is used outside the context

Note that for all tensor-creation functions without explicit values (empty, zeros, full, randn, etc.), a host-to-device transfer (.cuda()) is much slower than passing a device=X argument to the creation function itself. Also, I think you can change manually_set_device from ‘none’ to None and avoid the if-else check, i.e.

cur_device = None
# a @contextmanager like set_torch_device above would inject cur_device
...

def make_tensor(shape):
    return torch.zeros(shape, device=cur_device)
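
A minimal runnable sketch of that idea (the use_device helper and the global variable are illustrative, not an existing PyTorch API):

from contextlib import contextmanager
import torch

cur_device = None  # None makes factory functions fall back to the default (CPU) device

@contextmanager
def use_device(device):  # hypothetical helper, analogous to set_torch_device above
    global cur_device
    prev, cur_device = cur_device, device
    try:
        yield
    finally:
        cur_device = prev

def make_tensor(shape):
    # device=None means "use the default device", so no if-else is needed
    return torch.zeros(shape, device=cur_device)

with use_device(torch.device('cpu')):
    print(make_tensor((2, 3)).device)  # cpu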