GPU: RTX 3080 and V100
OS: Ubuntu 18.04
Driver: 510.47.03 (3080) / 470.42.01 (V100)
CUDA: 11.6
torch: 1.11.0+cu113
I have a model which I converted to TorchScript and saved:
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("traced_resnet_model.pt")
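
For completeness, the full trace-and-save step looks roughly like this (a sketch with torchvision's resnet18 standing in for my actual model, the calls are the same):

import torch
import torchvision

# stand-in for my real model; traced the same way
model = torchvision.models.resnet18().eval()
example = torch.rand(1, 3, 640, 640)

traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("traced_resnet_model.pt")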
Then I load this model and try to run inference (it does not matter whether I use a random or a real image):
model = torch.jit.load(weights).eval().half()

for i in range(20):
    t1 = time.time()
    # pred = model(im)
    pred = model(torch.rand((1, 3, 640, 640), dtype=torch.half, device="cuda:0"))
    print(time.time() - t1)
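
Could this partly be an artifact of how I measure? Since CUDA calls are launched asynchronously, I also wondered whether I should time the loop with explicit synchronization, like below (just a sketch, `weights` is the path to the traced file above; I have not verified that this changes the numbers):

import time
import torch

weights = "traced_resnet_model.pt"   # hypothetical path, same file as above
model = torch.jit.load(weights, map_location="cuda:0").eval().half()
im = torch.rand((1, 3, 640, 640), dtype=torch.half, device="cuda:0")

for i in range(20):
    torch.cuda.synchronize()   # make sure earlier GPU work has finished
    t1 = time.time()
    pred = model(im)
    torch.cuda.synchronize()   # wait for this forward pass to complete before stopping the timer
    print(time.time() - t1)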
Times:
0.13088488578796387
1.0498087406158447
0.0052030086517333984
0.0051364898681640625
0.0077664852142333984
0.010211706161499023
0.010109663009643555
0.010195255279541016
0.010144472122192383
0.010176897048950195
0.010180473327636719
…
As expected, the first and second iterations are slow (warmup). But after that, some iterations are still noticeably faster than others:
0.0052030086517333984
0.0051364898681640625
0.0077664852142333984
How can this be explained? Is there any way to avoid it?