First inferences are faster than normal with a TorchScript model

GPUs: RTX 3080 and V100
OS: Ubuntu 18.04
Driver: 510.47.03 / 470.42.01 (V100)
CUDA: 11.6
PyTorch: 1.11.0+cu113

I have a model which I converted to TorchScript and saved:

traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("traced_resnet_model.pt")
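
For context, a minimal self-contained sketch of the tracing step; the torchvision ResNet and the 640x640 example input are assumptions, your actual model and example tensor will differ:

import torch
import torchvision

# Assumed stand-ins for the original model and example input
model = torchvision.models.resnet50(pretrained=True).eval().cuda()
example = torch.rand(1, 3, 640, 640, device="cuda")

# Trace the model with the example input and serialize it to disk
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("traced_resnet_model.pt")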

Then I load this model and run inference (it does not matter whether the input is random or a real image):

model = torch.jit.load(weights).eval().half()
for i in range(20):
    t1 = time.time()
    # pred = model(im)
    pred = model(torch.rand((1, 3, 640, 640), dtype=torch.half, device='cuda:0'))
    print(time.time() - t1)
Times:
0.13088488578796387
1.0498087406158447
0.0052030086517333984
0.0051364898681640625
0.0077664852142333984
0.010211706161499023
0.010109663009643555
0.010195255279541016
0.010144472122192383
0.010176897048950195
0.010180473327636719

As expected, the first and second iterations show warmup overhead. But after that, some iterations appear faster than others:
0.0052030086517333984
0.0051364898681640625
0.0077664852142333984

How can this be explained? Is there any way to avoid it?

CUDA operations are executed asynchronously, so you would need to synchronize the code manually via torch.cuda.synchronize() before starting and stopping the timers. Otherwise the timer only measures how long it takes to launch the kernels, not how long the GPU actually spends on the forward pass.
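
A sketch of what the corrected timing loop could look like, assuming the same weights path and input shape as above (the weights path is a placeholder):

import time
import torch

weights = "traced_resnet_model.pt"  # assumed path from the tracing step above

model = torch.jit.load(weights).eval().half()
im = torch.rand((1, 3, 640, 640), dtype=torch.half, device='cuda:0')

for i in range(20):
    torch.cuda.synchronize()  # make sure no prior GPU work skews the start time
    t1 = time.time()
    pred = model(im)
    torch.cuda.synchronize()  # wait for the forward pass to finish before stopping the timer
    print(time.time() - t1)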

Thanks a lot! The time is ~10 ms now.