Is there any difference between x.to('cuda') vs x.cuda()? Which one should I use? The documentation seems to suggest using x.to('cuda').
I’m quite new to PyTorch, so there may be more to it than this, but I think that one advantage of using
x.to(device) is that you can do something like this:
```python
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
x = x.to(device)
```
Then if you’re running your code on a different machine that doesn’t have a GPU, you won’t need to make any changes. If you explicitly do x = x.cuda() or even x = x.to('cuda'), then you’ll have to make changes for CPU-only machines.
.cuda()/.cpu() is the old, pre-0.4 way. As of 0.4, it is recommended to use .to(device) because it is more flexible, as neighthan showed above.
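A minimal sketch of that device-agnostic pattern (the tensor shapes and the `model` here are just placeholders):

```python
import torch

# Pick the GPU if one is visible, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

x = torch.rand(8, 3)
x = x.to(device)                      # no-op if x is already on `device`

# The same call moves a whole module, so one `device` variable
# covers tensors and models alike.
model = torch.nn.Linear(3, 2).to(device)
out = model(x)
print(out.shape)                      # torch.Size([8, 2])
```

The same script then runs unchanged on a CPU-only machine.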
I am seeing a big timing difference in the following piece of code:
```python
import torch
from datetime import datetime

a = torch.rand(20000, 20000)
a = a.cuda()
# a = torch.rand(20000, 20000)
# a.to('cuda')

i = 0
t1 = datetime.now()
while i < 500:
    a += 1
    a -= 1
    i += 1
t2 = datetime.now()
print('cuda', t2 - t1)
```
This code takes about 1 minute, whereas using

```python
a = torch.rand(20000, 20000)
a.to('cuda')
```

instead takes only ~1 second.
Am I getting something wrong?
I am using torch 1.0+
CUDA operations are asynchronous, so you would have to synchronize all CUDA ops before starting and stopping the timer:
```python
import time
import torch

a = torch.rand(20000, 20000)
a = a.cuda()
i = 0
torch.cuda.synchronize()
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()
t2 = time.time()
print('cuda', t2 - t1)

a = torch.rand(20000, 20000)
a = a.to('cuda')
i = 0
torch.cuda.synchronize()
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()
t2 = time.time()
print('cuda string', t2 - t1)
```

> cuda 5.500105619430542
> cuda string 5.479088306427002
Also, it seems you’ve forgotten to reassign a, so that code will actually run on the CPU.
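The pitfall here is that .to() returns a new tensor rather than moving the original in place. The same behavior can be seen on the CPU with a dtype change (so no GPU is needed to try it):

```python
import torch

a = torch.rand(4, 4)

a.to(torch.float64)      # returned tensor is discarded: `a` is NOT changed
print(a.dtype)           # torch.float32

a = a.to(torch.float64)  # reassignment is required, just like a = a.to('cuda')
print(a.dtype)           # torch.float64
```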
Thank you very much for the prompt reply and for the good catch on the problem.
Is this true? What if you are using multiple GPUs? Would PyTorch allocate the right GPUs automatically, without having to specify them, if one uses .cuda()?
PS: I asked a similar question in Run Pytorch on Multiple GPUs but didn’t get an answer to that part (whether .cuda() allocates to the right GPU automatically or if it has to be done manually all the time).
They have the same behavior:
.cuda(), when you don’t specify the device, will use device 0.
You can specify the device for both, e.g. x.cuda(1) or x.to('cuda:1').
So we have to specify the device? Always, if we use multiple GPUs?
(I am trying to modify my code as little as possible, but I keep getting errors that tensors are not on the right GPUs… I am testing it with PyTorch’s resnet18.)
No, you don’t have to specify the device.
As mentioned in my first sentence above, if you don’t specify the device, device 0 will be used.
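If the goal is to actually spread work across several GPUs at once (an assumption about the original problem, since the thread doesn’t show the model code), the usual pattern is to wrap the model rather than to place tensors by hand, e.g. with nn.DataParallel (a toy model stands in for resnet18 here):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# DataParallel splits each input batch across all visible GPUs and
# gathers the outputs back on device 0. With 0 or 1 GPUs it simply
# runs the wrapped module as-is.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

out = model(torch.rand(32, 8).to(device))
print(out.shape)    # torch.Size([32, 4])
```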
hmmm then I don’t understand why my model doesn’t work with multiple GPUs.
Do you try to use a GPU that is not the 0th somewhere?
Note that you can use CUDA_VISIBLE_DEVICES=0 to hide all but the 0th GPU and avoid this issue altogether.
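CUDA_VISIBLE_DEVICES is read when CUDA initializes, so it has to be set before the first CUDA call in the process, typically on the command line or at the very top of the script; a sketch:

```python
import os

# Must happen before the first CUDA call; the shell equivalent is:
#   CUDA_VISIBLE_DEVICES=0 python script.py
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch

# At most one device is now visible, and it is addressed as 'cuda:0'
# regardless of its physical index.
print(torch.cuda.device_count())
```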