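CUDA operations are asynchronous, so you need to call torch.cuda.synchronize() before starting and before stopping the timer; otherwise you mostly measure the time to launch the kernels rather than the time to run them: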
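import time
import torch

a = torch.rand(20000, 20000)
a = a.cuda()
i = 0
torch.cuda.synchronize()  # make sure the transfer has finished before starting the timer
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()  # wait for all queued kernels before stopping the timer
t2 = time.time()
print('cuda', t2 - t1)
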
a = torch.rand(20000, 20000)
a = a.to('cuda')
i = 0
torch.cuda.synchronize()
t1 = time.time()
while i < 500:
    a += 1
    a -= 1
    i += 1
torch.cuda.synchronize()
t2 = time.time()
print('cuda string', t2 - t1)
> cuda 5.500105619430542
> cuda string 5.479088306427002
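If you time things like this often, the synchronize-then-time pattern can be wrapped in a small helper. This is only a rough sketch: the names timed and step are made up for illustration, the tensor is smaller than in the runs above so it finishes quickly, and it falls back to the CPU (skipping the synchronize calls) when no GPU is available.

```python
import time
import torch

def timed(fn, iters=500):
    """Hypothetical helper: wall-clock seconds for `iters` calls of `fn`."""
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()  # drain any work queued before the timer starts
    t1 = time.perf_counter()
    for _ in range(iters):
        fn()
    if use_cuda:
        torch.cuda.synchronize()  # wait for the timed kernels to actually finish
    return time.perf_counter() - t1

# Assumption: use the GPU if there is one, otherwise run the same code on the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
a = torch.rand(1000, 1000, device=device)

def step():
    # Same in-place updates as in the loops above.
    a.add_(1)
    a.sub_(1)

elapsed = timed(step, iters=50)
print(device, elapsed)
```

The absolute numbers will depend on your hardware, so compare runs on the same machine only.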
Also, it seems you forgot to assign the result of a.to('cuda') back to a, so your code ran on the CPU.
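The reason is that Tensor.to() returns a tensor instead of modifying the original in place. Here is a small sketch of that behavior. It uses a dtype conversion for the first part so it also runs on a machine without a GPU; the device case at the end only runs if CUDA is available. The variable names are just for illustration.

```python
import torch

a = torch.rand(3, 3)          # float32 on the CPU with the default settings
a.to(torch.float64)           # result is thrown away, a itself is unchanged
before = a.dtype
print(before)

a = a.to(torch.float64)       # reassign to keep the converted tensor
after = a.dtype
print(after)

if torch.cuda.is_available():
    b = torch.rand(3, 3)
    b.to('cuda')              # same mistake: b still lives on the CPU
    print(b.device)
    b = b.to('cuda')          # now b is on the GPU
    print(b.device)
```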