First run on CUDA is slow

import time
import torch

m = 998
n = 2473
a = torch.randn(m*n).cuda()

st = time.time()
r = torch.dot(a,a)
ed = time.time()
print(ed - st)

This outputs about 0.09 s.

st = time.time()
r = torch.dot(a,a)
ed = time.time()
print(ed - st)

This outputs about 0.0001 s, which is hundreds of times faster. What’s the reason? I am on Ubuntu 16.04, PyTorch 0.3, CUDA 9.

Hi,

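Since CUDA is initialized lazily in PyTorch, the first time you use it pays a one-time initialization cost, so that call takes longer.
Also, when timing CUDA code, you need to synchronize manually because the CUDA API is asynchronous: a call can return before the GPU has finished the work, so without synchronization you mostly measure the launch, not the computation.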

import torch
import time

m = 998
n = 2473
a = torch.randn(m*n).cuda()

torch.cuda.synchronize()
st = time.time()
r = torch.dot(a,a)
torch.cuda.synchronize()
ed = time.time()
print("initial run")
print(ed - st)

torch.cuda.synchronize()
st = time.time()
r = torch.dot(a,a)
torch.cuda.synchronize()
ed = time.time()
print("normal run")
print(ed - st)

torch.cuda.synchronize()
st = time.time()
r = torch.dot(a,a)
torch.cuda.synchronize()
ed = time.time()
print("normal run")
print(ed - st)
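
If you time GPU code often, the two points above (a warm-up call to absorb the lazy initialization, and a synchronize before each clock read) can be wrapped in a small helper. This is only a minimal sketch: `timed` and its parameters `sync`, `warmup` and `repeats` are names made up for illustration, not part of PyTorch. The CUDA part runs only if `torch` is installed and a GPU is available.

```python
import time


def timed(fn, sync=None, warmup=1, repeats=5):
    """Return mean wall-clock seconds per call of fn (hypothetical helper)."""
    sync = sync or (lambda: None)  # no-op when there is nothing to wait for
    for _ in range(warmup):
        fn()                       # warm-up calls, excluded from the timing
    sync()                         # wait for pending work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    sync()                         # wait for the timed work to actually finish
    return (time.perf_counter() - start) / repeats


try:
    import torch
except ImportError:                # assumption: torch may be missing on this machine
    torch = None

if torch is not None and torch.cuda.is_available():
    a = torch.randn(998 * 2473).cuda()
    # Pass torch.cuda.synchronize so the clock is read only after the GPU is done.
    print(timed(lambda: torch.dot(a, a), sync=torch.cuda.synchronize))
```

Averaging over several repeats also smooths out the noise you see when timing a single call that only takes a fraction of a millisecond.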