I’ve been working on a machine with multiple GPUs, and have needed to specify which GPU to run on. The docs say that best practice is to use a torch.cuda.device
context manager, so the following line appears at multiple points in my code:
with torch.cuda.device(f"cuda:{ARGS.gpu}"):
    <do-tensor-stuff>
Now I want to make this code able to run on a CPU. Understandably, this line is now giving the error
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
I can’t think of a neat solution. Obviously the following is not good because there would be so much duplicate code.
if torch.cuda.is_available():
    with torch.cuda.device(f"cuda:{ARGS.gpu}"):
        <do-tensor-stuff>
else:
    <do-tensor-stuff>
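One way to avoid duplicating the body is to pick the context manager up front and fall back to a no-op one on CPU-only machines. The following is a minimal sketch, assuming Python 3.7+ for contextlib.nullcontext; gpu_context is a hypothetical helper name, not part of PyTorch, and the index 0 stands in for ARGS.gpu.

```python
import contextlib
import torch

def gpu_context(gpu_index):
    """Hypothetical helper: select a GPU if CUDA is usable, otherwise do nothing."""
    if torch.cuda.is_available():
        return torch.cuda.device(gpu_index)
    # No CUDA: return a context manager that has no effect
    return contextlib.nullcontext()

with gpu_context(0):
    # Stand-in for <do-tensor-stuff>; written once for both cases
    t = torch.tensor([1., 2.])
    if torch.cuda.is_available():
        t = t.cuda()

print(t.device)
```

The body still has to guard its own .cuda() calls, so this only removes the duplicated with/else structure, not the need for device-aware tensor code.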
As an alternative to the context manager, you could create a device
variable, as explained in the device-agnostic code part of the docs:
args = parser.parse_args()
args.device = None
if not args.disable_cuda and torch.cuda.is_available():
    args.device = torch.device(f"cuda:{args.gpu}")
else:
    args.device = torch.device('cpu')
Afterwards, just pass args.device to each to() operation.
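To make the device-variable approach concrete, here is a small self-contained sketch. The flag names --disable-cuda and --gpu are assumptions chosen to match the attribute names used in this thread, and parse_args([]) is called with an empty list only so the snippet runs without command-line input.

```python
import argparse
import torch

parser = argparse.ArgumentParser()
# Hypothetical flags, named after the attributes used in this thread
parser.add_argument("--disable-cuda", action="store_true")
parser.add_argument("--gpu", type=int, default=0)
args = parser.parse_args([])  # empty list: use the defaults for this demo

if not args.disable_cuda and torch.cuda.is_available():
    args.device = torch.device(f"cuda:{args.gpu}")
else:
    args.device = torch.device("cpu")

# Move an existing tensor, or create one directly on the chosen device
x = torch.tensor([1., 2.]).to(args.device)
y = torch.zeros(3, device=args.device)
print(x.device, y.device)
```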
Thanks. Yes, this works, but what would be ideal is if the context manager could take a CPU device as an argument. Then we could avoid having to call .to(device) on every new tensor.
The current context manager doesn’t work like that for CUDA tensors either, does it?
You would still need to call to(), pass device=, or call cuda(). It will just change the default CUDA device:
cuda = torch.device('cuda')  # default CUDA device
with torch.cuda.device(1):
    # allocates a tensor on GPU 1
    a = torch.tensor([1., 2.], device=cuda)
    # transfers a tensor from CPU to GPU 1
    b = torch.tensor([1., 2.]).cuda()
    # a.device and b.device are device(type='cuda', index=1)
    # You can also use ``Tensor.to`` to transfer a tensor:
    b2 = torch.tensor([1., 2.]).to(device=cuda)
    # b.device and b2.device are device(type='cuda', index=1)
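The point that the context manager only selects which GPU "cuda" refers to, and does not move ordinary tensor creation onto the GPU, can be checked with a short sketch like the one below. It assumes GPU index 0 and skips the CUDA part entirely on machines without a usable GPU.

```python
import torch

# No device given: this tensor is created on the CPU
plain = torch.tensor([1., 2.])

if torch.cuda.is_available():
    with torch.cuda.device(0):
        # Still created on the CPU, even inside the context manager
        plain = torch.tensor([1., 2.])
        # Only an explicit device request lands on the selected GPU
        explicit = torch.tensor([1., 2.], device="cuda")
        assert explicit.device.index == torch.cuda.current_device()

print(plain.device)
```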
Yes that’s true. Actually, the case I have in mind involves tensor-creating functions.
I used to pass a device argument to all such functions, which became cumbersome once there were many of them and they were nested.
def make_tensor_device(device):
    return torch.tensor([1.,2.]).to(device)
device = torch.device(f"cuda:{ARGS.device}")
t = make_tensor_device(device)
The context manager simplified this because I can just set the default GPU and then call .cuda() on created tensors.
def make_tensor():
    return torch.tensor([1.,2.]).cuda()
device = torch.device(f"cuda:{ARGS.device}")
with torch.cuda.device(device):
    t = make_tensor()
Now it seems I have to go back to using the device argument. Not the end of the world, but it would be nice to be able to set the default device to either the CPU or a specific GPU. Maybe that would even mean that .cuda() would no longer be necessary.
Well, there is the option of setting the default tensor type via:
torch.set_default_tensor_type(torch.cuda.FloatTensor)
but I would recommend avoiding it, as it will create all tensors on the GPU, including ones that are not needed there, and you might run into a lot of issues.
I understand that the device argument might be a bit cumbersome, but I personally think it’s the cleanest way of writing the code.
However, any suggestions on improving the user experience are more than welcome.
As already said, I’m not a huge fan of changing the default tensor type, but what would a “good” API look like for your use case?
Could you post some pseudocode for it?
I was hoping to be able to do something like:
def make_tensor():
    return torch.tensor([1.,2.])
if not ARGS.disable_cuda and torch.cuda.is_available():
    device = torch.device(f"cuda:{ARGS.gpu}")
else:
    device = torch.device('cpu')
with torch.device(device):
    t = make_tensor()
and have this return a tensor on either the CPU or the specified GPU. However, this might also have the problem of putting lots of unnecessary things on the GPU. I didn’t know about the option to set the default type, and if you think that’s not a good idea, then maybe what I’m suggesting isn’t either.
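For readers on newer releases: as far as I know, PyTorch 2.0 and later accept a torch.device object as a context manager, and also provide torch.set_default_device, so the pattern asked for here should run as written. The sketch below assumes PyTorch 2.0 or newer and uses GPU index 0 in place of ARGS.gpu; on older versions the with statement fails.

```python
import torch

def make_tensor():
    # No device argument and no .cuda() call
    return torch.tensor([1., 2.])

# Assumption: GPU index 0 stands in for ARGS.gpu
if torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")

# Requires PyTorch 2.0+: factory functions called inside the block
# allocate on `device` unless they are given an explicit device
with torch.device(device):
    t = make_tensor()

print(t.device)
```

The concern raised in this thread still applies: everything created inside the block ends up on that device, whether or not it is needed there.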
Hmm, maybe it would work if .cuda() just did nothing when the user had set the device to ‘cpu’. The following kind of works to give what I’m after.
from contextlib import contextmanager
import torch

# 'none' means no device has been set manually
torch.manually_set_device = 'none'

@contextmanager
def set_torch_device(new_device):
    prev_device = torch.manually_set_device
    torch.manually_set_device = new_device
    try:
        yield
    finally:
        # restore the previous setting even if the body raises
        torch.manually_set_device = prev_device

def maybe_cuda(t):
    if torch.manually_set_device == 'none':
        return t.cuda()
    else:
        return t.to(torch.manually_set_device)

def make_tensor():
    return maybe_cuda(torch.Tensor([1., 2.]))

device = torch.device('cpu')
with set_torch_device(device):
    t = make_tensor()
    print(t.device)  # inside the context: the manually set device
t = make_tensor()
print(t.device)  # outside the context: falls back to .cuda()
Note that for all tensor creation functions without explicit values (empty, zeros, full, randn, etc.), creating the tensor on the host and transferring it with .cuda() is much slower than passing device=X to the creation function. Also, I think you can change manually_set_device from ‘none’ to None and avoid the if-else checks. For example:
cur_device = None
# @contextmanager doing DI
...
def make_tensor(shape):
    return torch.zeros(shape, device=cur_device)