Is there any difference between x.to('cuda') and x.cuda()? Which one should I use?

Hi,

I am seeing a big timing difference in the following piece of code:

import torch
from datetime import datetime

a = torch.rand(20000, 20000)
a = a.cuda()

# a = torch.rand(20000, 20000)
# a.to('cuda')

# Time 500 rounds of in-place add/subtract
t1 = datetime.now()
for _ in range(500):
    a += 1
    a -= 1
t2 = datetime.now()
print('cuda', t2 - t1)

This code takes about 1 minute to run.

Using

a = torch.rand(20000,20000)
a.to('cuda')

instead takes only ~1 second.
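
One thing that may be worth checking before comparing the two timings is where the tensor actually lives in each variant. The sketch below is only a diagnostic under stated assumptions, not a definitive explanation. It relies on Tensor.to returning a tensor rather than modifying `a` in place, and on CUDA work being queued asynchronously, which is why it calls torch.cuda.synchronize() around the timed region. The helper name timed_loop, the smaller 1000x1000 size, the reduced iteration count, and the CPU fallback are assumptions added for illustration.

```python
import time
import torch


def timed_loop(t, iters=5):
    """Time `iters` rounds of in-place += 1 / -= 1 on tensor `t` (hypothetical helper)."""
    if t.is_cuda:
        torch.cuda.synchronize()  # finish pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        t += 1
        t -= 1
    if t.is_cuda:
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return time.perf_counter() - start


# Assumption: fall back to CPU so the sketch also runs on a machine without a GPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.rand(1000, 1000)
a.to(device)                            # return value discarded; `a` is not reassigned
print('discarded .to():', a.device)     # `a` itself is still on the CPU

b = torch.rand(1000, 1000).to(device)   # keep the tensor returned by .to()
print('assigned  .to():', b.device)

print('loop on a:', timed_loop(a))
print('loop on b:', timed_loop(b))
```

If the first print shows cpu, the two timings in the post would be comparing a CPU loop against a GPU loop rather than .to('cuda') against .cuda().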

Am I getting something wrong?

I am using torch 1.0+
