Help needed to solve cuda tensor timing mystery

Hi,

I am trying to solve a mystery.
The following code snippet generates a number of random CUDA torch tensors of various sizes and operates a few operations on these tensors.
The stangness occurs when measuring the elapsed time using Python’s timeit.repeat function.
Somehow, the first few iterations are a couple of orders of magnitude faster than the last iterations :thinking:

Any idea ?

I am running this code on an Ubuntu 14.04 machine using python 3.6, pytorch 0.4.1.post2 and a Nvidia Quadro M4000 card.

Here is the code:

import numpy as np
import timeit
import gc
import torch

# size of the test
M = 10000
N = 10000
D = 3
E = 3
device = 'cuda'

# declare tensors
xc = torch.rand(M, D, device=device)
yc = torch.rand(N, D, device=device)
bc = torch.rand(N, E, device=device)
sigmac = torch.rand(1, device=device)


def _squared_distances(x, y):
    x_norm = (x ** 2).sum(1).view(-1, 1)
    y_norm = (y ** 2).sum(1).view(1, -1)
    dist = x_norm + y_norm - 2.0 * torch.mm(x, torch.transpose(y, 0, 1))
    return dist

def _func(x, y, s, b):
     sq = _squared_distances(x, y)
     torch.mm((-sq / (s * s)).exp(), b)


speed_pytorch = np.array(timeit.repeat("_func(xc, yc, sigmac, bc)", setup='gc.enable();', globals=globals(), repeat=100, number=1))
print(speed_pytorch)

And the output:

[0.00896002 0.00023558 0.00021449 0.000208   0.00020671 0.00020615
 0.00020571 0.00020591 0.00020429 0.0002054  0.00020259 0.00020241
 0.00020344 0.00021511 0.0002045  0.00020349 0.00020346 0.00020304
 0.00020419 0.00020382 0.00020363 0.00020238 0.0002032  0.00020386
 0.00020311 0.00020472 0.00020299 0.00020274 0.00020187 0.00020358
 0.00020355 0.00020294 0.00021372 0.00020253 0.00020109 0.0002006
 0.00020271 0.00020158 0.00020437 0.00020162 0.00020179 0.00020287
 0.0002021  0.00020064 0.00020138 0.00020067 0.00020173 0.00020194
 0.00020214 0.00020111 0.00020141 0.00021044 0.00020267 0.00020206
 0.00020099 0.00020164 0.00020256 0.00020333 0.00020216 0.00020247
 0.00020264 0.0002027  0.00020325 0.00020169 0.00020424 0.00020281
 0.00020306 0.00020255 0.00020205 0.00020288 0.00021285 0.00020318
 0.00020483 0.00020746 0.00020353 0.00020233 0.00020117 0.00020162
 0.00020206 0.02630114 0.04370697 0.04371016 0.0437286  0.04372212
 0.04371193 0.04371529 0.0437236  0.04370545 0.04372237 0.04372816
 0.04374273 0.04373613 0.04369926 0.04371182 0.04373186 0.04368425
 0.0436838  0.04373274 0.04373524 0.04371948]

Great thanks for anyone that give me some insight on what is happening.

Since CUDA operations are performed asynchronously, your time measurements are most likely a bit off.
You should call torch.cuda.synchronize() before starting and stopping your timer.

1 Like

Thanks !

So I changed the following line to include the torch.cuda.synchronize()before and after my function call:

speed_pytorch = np.array(timeit.repeat("torch.cuda.synchronize(); _func(xc, yc, sigmac, bc); torch.cuda.synchronize()", setup='gc.enable();', globals=globals(), repeat=100, number=1))

Here is the new output:

[0.09497609 0.0437831  0.04378436 0.04377997 0.04379042 0.04376097
 0.043768   0.04377071 0.04379098 0.04375806 0.0437383  0.04377674
 0.04379167 0.0437904  0.04375295 0.04374592 0.04381776 0.0440657
 0.04388161 0.04380486 0.04377719 0.04378537 0.04378086 0.04378364
 0.0437904  0.04376343 0.04379515 0.04380947 0.04377766 0.04377795
 0.04378235 0.0438005  0.04378827 0.0437455  0.04377471 0.04377941
 0.04375791 0.04373778 0.04379254 0.04377629 0.0437417  0.04372696
 0.04373221 0.04376795 0.04374417 0.04375209 0.04375083 0.04375326
 0.04374925 0.04375092 0.0438331  0.04380208 0.04376031 0.04374788
 0.04377659 0.04379657 0.04376828 0.04375811 0.0437649  0.04374111
 0.04378433 0.04379105 0.04379046 0.04375545 0.04374755 0.04376093
 0.04376668 0.04376569 0.04375325 0.04378189 0.04379228 0.04373931
 0.04377322 0.04377844 0.0437788  0.04377469 0.04376213 0.04377392
 0.04377192 0.04376317 0.04376967 0.04377655 0.04375631 0.04374702
 0.04378022 0.04380029 0.04377499 0.04374525 0.04377704 0.04376704
 0.0437587  0.04374709 0.04377755 0.04374888 0.04374979 0.04385973
 0.04406279 0.043799   0.04375602 0.04374366]

Mystery solved !
Thanks :smile:

1 Like