What is the difference between doing `net.cuda()` vs `net.to(device)`?

I was going through this post ([SOLVED] Make Sure That Pytorch Using GPU To Compute) and was wondering: what is the difference between these three pieces of code?

from collections import OrderedDict

import torch.nn as nn
net = nn.Sequential(OrderedDict([('fc1', nn.Linear(3, 1))]))
net.cuda()

vs

from collections import OrderedDict
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

net = nn.Sequential( OrderedDict([ ('fc1', nn.Linear(3,1)) ]) )
net.to(device)

vs

from collections import OrderedDict
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

net = nn.Sequential( OrderedDict([ ('fc1', nn.Linear(3,1)) ]) )
net = net.to(device)

Which one is recommended? Which one is hardware agnostic (i.e. works regardless of the type of GPU, or even on a CPU)?

Is there some sort of internal flag I can check to see whether things are properly placed on the GPU?


cuda() and to('cuda') do the same thing, but the latter is more flexible. As you can see in your example code, you can specify a device that falls back to ‘cpu’ if CUDA is unavailable.

If you attempt to call cuda() on a system that doesn’t have a GPU (for example, with a CPU-only build of PyTorch), you’ll get:
AssertionError: Torch not compiled with CUDA enabled.

With the explicit call, you can also use multiple cuda devices – e.g. to('cuda:0') is different from to('cuda:1'). The simpler cuda() call will just use the default cuda device.
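
As a rough sketch of the device-agnostic pattern described above, the snippet below picks a device once, moves the network to it, and then inspects the parameters to see where they ended up. A module has no single device flag, so checking its parameters is one common way to answer the "is it on the GPU?" question. The names `param_devices` and `first_param` are made up for this illustration.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Falls back to the CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Sequential(OrderedDict([("fc1", nn.Linear(3, 1))]))
net = net.to(device)

# Collect the device type of every parameter; a fully moved model
# should report exactly one type.
param_devices = {p.device.type for p in net.parameters()}
first_param = next(net.parameters())

print(param_devices)
print(first_param.is_cuda)  # True only if the parameter lives on a CUDA device
```

The same script should run unchanged on a machine with or without a GPU, which is the main practical argument for to(device) over a hard-coded cuda() call.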


what about:

from collections import OrderedDict
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

net = nn.Sequential( OrderedDict([ ('fc1', nn.Linear(3,1)) ]) )
net = net.to(device)

vs

from collections import OrderedDict
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

net = nn.Sequential( OrderedDict([ ('fc1', nn.Linear(3,1)) ]) )
net.to(device)

?

What's the difference? Which one do I choose?


Your second variant is copying the net to the device, but not assigning the copy to anything. to(device) is not an in-place operation, so this effectively doesn’t do anything.


Not sure that's correct. No errors are thrown (if my data were not on the GPU, the tensor type wouldn’t match the net's, so an error would be raised). So it seems that net.to(device) mutates my net and puts it on the GPU; otherwise I would have seen an error.

I wonder if this is an unintended bug in PyTorch, and whether net = net.to(device) was meant to be the only form that works.

Ah, you’re correct. I’m used to Tensor.to, which is out-of-place. It might be a better habit to re-assign anyway, as I have seen a number of mistakes where tensor.to(device) was assumed to be in-place.

In [14]: net=torch.nn.Linear(3,4)
In [15]: net.weight.device
Out[15]: device(type='cpu')
In [16]: net.to('cuda')
Out[16]: Linear(in_features=3, out_features=4, bias=True)
In [17]: net.weight.device
Out[17]: device(type='cuda', index=0)

What does the in-placeness of .to(device) have to do with this?

I’m confused.


If you call .to() on a tensor, the operation is not performed in-place, so you need to assign the result to a variable.
On a module, however, all parameters are moved to the specified device internally, so you don’t need to reassign the model.

However, as @nairbv mentioned, it might be a good habit just to use the assignment by default to avoid possible errors. :wink:
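
To make the module-versus-tensor distinction concrete, here is a small sketch that runs without a GPU. It uses a dtype conversion as a stand-in for a device move (an assumption made purely so the example works on CPU-only machines); the in-place versus out-of-place behaviour of to() is the same in both cases. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

net = nn.Linear(3, 4)
returned = net.to(torch.float64)  # Module.to converts the module's own parameters
same_object = returned is net     # and hands back the very same module object
module_dtype = net.weight.dtype   # already changed, even without reassignment

x = torch.randn(2)                # float32 unless the default dtype was changed
y = x.to(torch.float64)           # Tensor.to returns a new tensor; x is untouched

print(same_object, module_dtype)
print(x.dtype, y.dtype)
```

Because Module.to returns the module itself, writing net = net.to(...) is harmless, which is why reassigning everywhere is a safe default habit.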


This is what I’m trying to understand. I always expected .to(device) to work the way it does. I don’t understand why I need to re-assign. I always expected .to(device) to mutate things.

It does not work in-place on tensors, so you have to reassign the result:

import torch

x = torch.randn(1, device='cpu')
print(x.device)   # cpu

x.to('cuda')      # returns a new tensor, which is discarded here
print(x.device)   # cpu

x = x.to('cuda')  # rebinds x to the CUDA copy
print(x.device)   # cuda:0

@ptrblck You have been so patient in explaining this :smiley: :joy:

@ptrblck ,
Do you mean that tensor.to() is not in-place, while tensor.cuda() is in-place?
Is that the major difference between to() and cuda()?

No, neither is in-place on a tensor.
Calling to() or cuda() on an nn.Module object will internally move or convert all parameters and buffers to the given device, dtype, or memory format, so you don’t need to reassign the model:

model.to('cuda') # works
tensor.to('cuda') # tensor is still on the original device afterwards, as the CUDATensor wasn't assigned
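
A minimal sketch of the "parameters and buffers" point: the model below includes a BatchNorm1d layer, chosen here only because it carries buffers (running statistics) in addition to parameters. After a single to(device) call without reassignment, everything should report the target device. The names `tensors` and `all_on_device` are illustrative.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Linear contributes parameters only; BatchNorm1d also registers buffers.
model = nn.Sequential(nn.Linear(3, 4), nn.BatchNorm1d(4))
model.to(device)  # no reassignment needed for a module

tensors = list(model.parameters()) + list(model.buffers())
all_on_device = all(t.device.type == device.type for t in tensors)

print(all_on_device)
```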

@ptrblck ,
Thank you!

There is no difference between to() and cuda().
The difference is in how to() and cuda() behave on a Module versus a tensor:
on a Module (i.e. a network), the Module itself is moved to the destination device;
on a tensor, the original tensor stays on its original device, and the returned tensor is on the destination device.

Right?

Yes, almost. The first point is right if you are only concerned with the device, but to() can also change the dtype, memory format, etc., so it has more functionality than cuda()/cpu().
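
As an illustration of that extra functionality, the sketch below sets the device and the dtype in a single to() call, for both a module and a tensor. The choice of float64 and the tensor shapes are arbitrary assumptions made for the example.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# One call sets both the device and the dtype of the module's parameters.
net = nn.Linear(3, 1).to(device=device, dtype=torch.float64)

# The input must match the module in both device and dtype.
x = torch.randn(5, 3).to(device=device, dtype=torch.float64)

out = net(x)
print(out.shape, out.dtype, out.device)
```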


@ptrblck ,
Thank you!

If we only consider the device, there is no difference between to() and cuda().
If we consider other functionality, to() can do much more than cuda().

@ptrblck ,

One more question:

# code 1:
device=torch.device('cuda')
net.to(device)

# code 2:
device=torch.device('cuda:0') # 'cuda:1'
net.to(device)

What is the difference between ‘cuda’ and ‘cuda:0’ or ‘cuda:1’? In which cases should I use each?

torch.device('cuda') (or just the 'cuda' string) will use the default device, while torch.device('cuda:1') (or the 'cuda:1' string) will explicitly use GPU 1.
The CUDA semantics docs explain this behavior with some examples:

cuda = torch.device('cuda')     # Default CUDA device
cuda0 = torch.device('cuda:0')
cuda2 = torch.device('cuda:2')  # GPU 2 (these are 0-indexed)

x = torch.tensor([1., 2.], device=cuda0)
# x.device is device(type='cuda', index=0)
y = torch.tensor([1., 2.]).cuda()
# y.device is device(type='cuda', index=0)

with torch.cuda.device(1):
    # allocates a tensor on GPU 1
    a = torch.tensor([1., 2.], device=cuda)

    # transfers a tensor from CPU to GPU 1
    b = torch.tensor([1., 2.]).cuda()
    # a.device and b.device are device(type='cuda', index=1)

    # You can also use ``Tensor.to`` to transfer a tensor:
    b2 = torch.tensor([1., 2.]).to(device=cuda)
    # b.device and b2.device are device(type='cuda', index=1)

    c = a + b
    # c.device is device(type='cuda', index=1)

    z = x + y
    # z.device is device(type='cuda', index=0)

    # even within a context, you can specify the device
    # (or give a GPU index to the .cuda call)
    d = torch.randn(2, device=cuda2)
    e = torch.randn(2).to(cuda2)
    f = torch.randn(2).cuda(cuda2)
    # d.device, e.device, and f.device are all device(type='cuda', index=2)
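
The difference between 'cuda' and 'cuda:N' can also be seen on the device objects themselves, without allocating anything. This is a small sketch: constructing a torch.device does not need a GPU, so only the last part is guarded by an availability check. The variable names are made up for the illustration.

```python
import torch

default_cuda = torch.device("cuda")  # no index: resolved when a tensor is created
gpu1 = torch.device("cuda:1")        # pinned to GPU 1

print(default_cuda.index)  # None
print(gpu1.index)          # 1

# The string form and the (type, index) form describe the same device.
same_device = gpu1 == torch.device("cuda", 1)

if torch.cuda.is_available():
    # An index-less 'cuda' device resolves to the current CUDA device.
    resolved = torch.tensor([1.0], device=default_cuda).device
    print(resolved.index == torch.cuda.current_device())
```

In practice, plain 'cuda' is usually enough on single-GPU machines, while an explicit index matters once several GPUs are in play.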