Outer product (torch.ger) timing issue

import time
import torch

m = 2000
n = 2500
ell = 10
a = torch.randn(m*n).cuda()
v = torch.randn(ell).cuda()

ske = torch.zeros(ell, m*n).cuda()
st = time.time()
ske = torch.ger(v, a)
ed = time.time()
print('ger', ed - st)

b = torch.randn(m*n).cuda()
st = time.time()
ske = torch.ger(v, b)
ed = time.time()
print('ger', ed - st)

The first ger takes about 0.1 s, while the second one takes about 0.0006 s. What is happening between these two ger calls? A similar problem arises with torch.dot(): the running times differ by a factor of several tens.

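You have to synchronize CUDA before reading the clock, since CUDA operations are launched asynchronously. Without synchronization, the timer mostly measures how long it takes to enqueue the kernel, not how long the kernel takes to run.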

torch.cuda.synchronize()
st = time.time()
ske = torch.ger(v, a)
torch.cuda.synchronize()
ed = time.time()
...

Could you check the timings again?
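If you time several operations this way, the synchronize calls can be wrapped in a small helper. The following is only a sketch: `timed`, `warmup` and `repeats` are hypothetical names that are not part of PyTorch, and the warm-up call is my assumption that the first call may also include one-time setup costs. It falls back to the CPU when no GPU is available.

```python
import time
import torch

def timed(fn, *args, warmup=1, repeats=5):
    """Hypothetical helper: return (last result, mean seconds per call)."""
    use_cuda = torch.cuda.is_available()
    # Warm-up calls are not timed (assumption: the first call may pay setup costs).
    for _ in range(warmup):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()  # wait for all queued work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn(*args)
    if use_cuda:
        torch.cuda.synchronize()  # wait for the timed kernels to finish
    elapsed = (time.perf_counter() - start) / repeats
    return out, elapsed

device = "cuda" if torch.cuda.is_available() else "cpu"
m, n, ell = 2000, 2500, 10
a = torch.randn(m * n, device=device)
v = torch.randn(ell, device=device)

ske, seconds = timed(torch.ger, v, a)
print('ger', seconds)
```

Averaging over a few repeats should give steadier numbers than a single call, and the two ger calls from the question should then report similar times.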

Thank you for your answer, it works!