First inferences are faster than normal with a TorchScript model

GPUs: RTX 3080 and V100
OS: Ubuntu 18.04
Driver: 510.47.03 / 470.42.01 (V100)
CUDA: 11.6
PyTorch: 1.11.0+cu113

I have a model which I converted to TorchScript and saved:

traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("traced_resnet_model.pt")
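
For context, a minimal self-contained sketch of the tracing step; the torchvision ResNet and the 640x640 example input are assumptions, your actual model and example tensor will differ:

import torch
import torchvision

# Assumed stand-ins for the original model and example input
model = torchvision.models.resnet50(pretrained=True).eval().cuda()
example = torch.rand(1, 3, 640, 640, device="cuda")

# Trace the model with the example input and serialize it to disk
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("traced_resnet_model.pt")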

Then I load this model and run inference (it does not matter whether the input is random or a real image):

model = torch.jit.load(weights).eval().half()
for i in range(20):
    t1 = time.time()
    # pred = model(im)
    pred = model(torch.rand((1, 3, 640, 640), dtype=torch.half, device='cuda:0'))
    print(time.time() - t1)
Times:
0.13088488578796387
1.0498087406158447
0.0052030086517333984
0.0051364898681640625
0.0077664852142333984
0.010211706161499023
0.010109663009643555
0.010195255279541016
0.010144472122192383
0.010176897048950195
0.010180473327636719

As expected, the first and second iterations show warmup overhead. But after that, some iterations appear faster than others:
0.0052030086517333984
0.0051364898681640625
0.0077664852142333984

How can this be explained? Is there any way to avoid it?

CUDA operations are executed asynchronously, so you would need to synchronize the code manually via torch.cuda.synchronize() before starting and stopping the timers. Otherwise the timer only measures how long it takes to launch the kernels, not how long the GPU actually spends on the forward pass.
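
A sketch of what the corrected timing loop could look like, assuming the same weights path and input shape as above (the weights path is a placeholder):

import time
import torch

weights = "traced_resnet_model.pt"  # assumed path from the tracing step above

model = torch.jit.load(weights).eval().half()
im = torch.rand((1, 3, 640, 640), dtype=torch.half, device='cuda:0')

for i in range(20):
    torch.cuda.synchronize()  # make sure no prior GPU work skews the start time
    t1 = time.time()
    pred = model(im)
    torch.cuda.synchronize()  # wait for the forward pass to finish before stopping the timer
    print(time.time() - t1)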

Thanks a lot! The time is ~10 ms now.