Is there any difference between x.to('cuda') vs x.cuda()? Which one should I use?

Is there any difference between x.to('cuda') vs x.cuda()? Which one should I use? The documentation seems to suggest using x.to('cuda').


I’m quite new to PyTorch, so there may be more to it than this, but I think that one advantage of using x.to(device) is that you can do something like this:

# Fall back to the CPU when no GPU is available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
x = x.to(device)

Then if you’re running your code on a different machine that doesn’t have a GPU, you won’t need to make any changes. If you explicitly do x = x.cuda() or even x = x.to('cuda') then you’ll have to make changes for CPU-only machines.
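For instance (a small added sketch; the linear layer and random input are just placeholders), the same device variable can move both a model and its inputs, so the script runs unchanged on a CPU-only machine:

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(10, 2).to(device)   # moves the parameters to the chosen device
x = torch.randn(4, 10).to(device)     # moves the input to the same device
out = model(x)                        # works whether or not a GPU is present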


.cuda()/.cpu() is the old, pre-0.4 way. As of 0.4, it is recommended to use .to(device) because it is more flexible, as neighthan showed above.
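One concrete example of that flexibility (an added sketch, assuming a CUDA device is available): .to() accepts a dtype, a device, or both in a single call, while .cuda() only changes the device:

import torch

x = torch.rand(3, 3)
a = x.to('cuda')                  # device only, equivalent to x.cuda()
b = x.to(torch.float16)           # dtype only
c = x.to('cuda', torch.float16)   # device and dtype in one call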


Hi,

I am seeing a big timing difference in the following piece of code:

import torch
from datetime import datetime

a = torch.rand(20000, 20000)
a = a.cuda()

# a = torch.rand(20000, 20000)
# a.to('cuda')

i = 0
t1 = datetime.now()
while i < 500:
    a += 1
    a -= 1
    i += 1
t2 = datetime.now()
print('cuda', t2 - t1)

This code takes about 1 minute.

Using

a = torch.rand(20000,20000)
a.to('cuda')

instead takes only ~1 second.

Am I doing something wrong?

I am using torch 1.0+


CUDA operations are asynchronous, so you would have to synchronize all CUDA ops before starting and stopping the timer:

import time
import torch

a = torch.rand(20000, 20000)
a = a.cuda()

i = 0
torch.cuda.synchronize()  # wait for all pending CUDA work before starting the timer
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()  # wait for the kernels launched in the loop to finish
t2 = time.time()
print('cuda', t2 - t1)

a = torch.rand(20000, 20000)
a = a.to('cuda')

i = 0
torch.cuda.synchronize()
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()
t2 = time.time()
print('cuda string', t2 - t1)

> cuda 5.500105619430542
> cuda string 5.479088306427002
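As a side note (an added sketch, not from the original reply), torch.cuda.Event can time the GPU work directly instead of wrapping time.time() with synchronize calls:

import torch

a = torch.rand(20000, 20000, device='cuda')

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()            # recorded on the current CUDA stream
for _ in range(500):
    a += 1
    a -= 1
end.record()

torch.cuda.synchronize()  # wait until the recorded events have completed
print('elapsed ms:', start.elapsed_time(end))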

Also, it seems you've forgotten to reassign the result of a.to('cuda') back to a, so that version of the code was still running on the CPU.
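To make that pitfall explicit (a small added sketch): .to() and .cuda() are not in-place; they return a new tensor, so the result has to be assigned back:

import torch

a = torch.rand(3, 3)
a.to('cuda')          # returns a CUDA copy, but the result is discarded
print(a.device)       # cpu

a = a.to('cuda')      # reassign to actually move the tensor
print(a.device)       # cuda:0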


Thank you very much for the prompt reply and the good catch to the problem.


Is this true? What if you are using multiple GPUs? Would PyTorch allocate the right GPUs automatically, without having to specify them, if one uses .cuda()? @ptrblck

PS: I asked a similar question in Run Pytorch on Multiple GPUs but didn't get an answer to this part (whether .cuda() allocates to the right GPU automatically or whether it has to be done manually every time).

They have the same behavior: when you don't specify a device, both to("cuda") and .cuda() will use device 0.
You can specify the device for both with to("cuda:0") and .cuda(0).
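For illustration (an added sketch, assuming at least two GPUs are visible):

import torch

x = torch.rand(2, 2)

a = x.cuda()         # device 0 by default
b = x.to('cuda')     # also device 0
c = x.cuda(1)        # explicit device index
d = x.to('cuda:1')   # same thing with the string form

print(a.device, b.device, c.device, d.device)  # cuda:0 cuda:0 cuda:1 cuda:1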

So we have to specify the device? Always, if we use multiple GPUs?

(I am trying to modify my code as little as possible, but I keep getting errors that tensors are not on the right GPUs… I am testing it with PyTorch's resnet18.)

No, you don't have to specify the device.
As mentioned in my first sentence above, if you don't specify the device, device 0 will be used.

Hmm, then I don't understand why my model doesn't work with multiple GPUs.

Do you try to use a GPU other than the 0th somewhere?
Note that you can use CUDA_VISIBLE_DEVICES=0 to hide all but the 0th GPU and avoid this issue altogether. :)
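For example (an added sketch), the environment variable can also be set from Python, as long as it happens before the first CUDA call:

import os

# Must be set before torch initializes CUDA
# (equivalent to launching with: CUDA_VISIBLE_DEVICES=0 python script.py)
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
print(torch.cuda.device_count())  # 1: only GPU 0 is visible, exposed as cuda:0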