Is there any difference between `x.to('cuda')` and `x.cuda()`? Which one should I use? The documentation seems to suggest using `x.to('cuda')`.
I'm quite new to PyTorch, so there may be more to it than this, but I think that one advantage of using `x.to(device)` is that you can do something like this:

```python
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
x = x.to(device)
```
Then if you're running your code on a different machine that doesn't have a GPU, you won't need to make any changes. If you explicitly do

```python
x = x.cuda()
```

or even

```python
x = x.to('cuda')
```

then you'll have to make changes for CPU-only machines.
`.cuda()`/`.cpu()` is the old, pre-0.4 way. As of 0.4, it is recommended to use `.to(device)` because it is more flexible, as neighthan showed above.
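To illustrate that flexibility, here is a small sketch (my own example, not from the thread): `.to()` handles the device-agnostic pattern for both modules and tensors, and it can also cast dtypes, which `.cuda()` cannot.

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(4, 2).to(device)   # nn.Module.to() moves parameters in place
x = torch.rand(8, 4).to(device)            # Tensor.to() returns a copy on `device`

out = model(x)
print(out.shape)          # torch.Size([8, 2])

# .to() can also cast dtypes, which .cuda() cannot:
x64 = x.to(torch.float64)
print(x64.dtype)          # torch.float64
```

The same script runs unchanged on a CPU-only machine, which is the whole point of the `.to(device)` idiom.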
I am seeing a big timing difference in the following piece of code:
```python
import torch
from datetime import datetime

a = torch.rand(20000, 20000)
a = a.cuda()
#a = torch.rand(20000, 20000)
#a.to('cuda')
i = 0
t1 = datetime.now()
while i < 500:
    a += 1
    a -= 1
    i += 1
t2 = datetime.now()
print('cuda', t2 - t1)
```
This code takes about 1 minute, while using

```python
a = torch.rand(20000, 20000)
a.to('cuda')
```

instead takes only ~1 second.
Am I getting something wrong?
I am using torch 1.0+
CUDA operations are asynchronous, so you would have to synchronize all CUDA ops before starting and stopping the timer:
```python
import time
import torch

a = torch.rand(20000, 20000)
a = a.cuda()
i = 0
torch.cuda.synchronize()
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()
t2 = time.time()
print('cuda', t2 - t1)

a = torch.rand(20000, 20000)
a = a.to('cuda')
i = 0
torch.cuda.synchronize()
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()
t2 = time.time()
print('cuda string', t2 - t1)
```

```
> cuda 5.500105619430542
> cuda string 5.479088306427002
```
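The synchronize-then-time pattern above can be wrapped in a small reusable helper (a sketch of my own; `timed` is a hypothetical name, not a PyTorch API). It also runs on CPU-only machines, where the synchronization calls are simply skipped:

```python
import time
import torch

def timed(fn, iters=100):
    # Synchronize before starting the clock so earlier queued kernels
    # don't leak into the measurement, and again before stopping it so
    # the measured kernels have actually finished executing.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time() - t0

a = torch.rand(1000, 1000)
print('elapsed:', timed(lambda: (a + 1).sum()))
```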
Also, it seems you've forgotten to reassign `a` (it should be `a = a.to('cuda')`), so that code was actually running on the CPU.
Thank you very much for the prompt reply and the good catch on the problem.