How to measure forward pass time

import torch
import time
from torchvision.models import vgg16

size = 512
num = 1000

net = vgg16().features.cuda()

print(net)

x = torch.zeros((1, 3, size, size)).cuda()

cost = 0

for i in range(num):
    t0 = time.time()
    y = net(x)
    t1 = time.time()
    cost += t1 - t0

cost = cost / num * 1000

print("input size is {}, test {} times, average spend time {:.2f}ms".format(size, num, cost))

My GPU is a GTX 1060.
With num=10, I got 1.58 ms.
With num=1000, I got 33.64 ms.
How should I interpret this difference?

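CUDA calls are asynchronous: net(x) returns as soon as the kernels have been queued, not when they have finished. With num=10 you are mostly timing the kernel launches. With num=1000 the launch queue fills up and the CPU has to wait for the GPU, so the average gets closer to the real cost. To measure the actual forward time, you have to synchronize before starting and before stopping the timer: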

torch.cuda.synchronize()
t0 = time.time()
y = net(x)
torch.cuda.synchronize()
t1 = time.time()

Could you add it and try it again?
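
If it helps, here is a minimal sketch of the same idea wrapped in a reusable helper. The names `average_forward_ms` and `demo_vgg16` are made up for illustration, and so are the warm-up iterations and the choice to synchronize once around the whole loop rather than in every iteration. Treat it as one reasonable way to do it under those assumptions, not the only one.

```python
import time


def average_forward_ms(forward, sync, num=100, warmup=10):
    """Average wall-clock time of forward() in milliseconds.

    forward and sync are hypothetical parameter names: sync() must block
    until all queued device work is done (on CUDA, torch.cuda.synchronize).
    """
    # Warm-up (an assumption, not in the original post): keeps one-time
    # costs such as CUDA context creation out of the measurement.
    for _ in range(warmup):
        forward()
    sync()  # drain pending work before starting the clock

    t0 = time.perf_counter()
    for _ in range(num):
        forward()
    sync()  # wait for the last kernel before stopping the clock
    t1 = time.perf_counter()
    return (t1 - t0) / num * 1000.0


def demo_vgg16(size=512, num=100):
    # Same setup as in the question; needs PyTorch, torchvision and a CUDA GPU.
    import torch
    from torchvision.models import vgg16

    net = vgg16().features.cuda().eval()
    x = torch.zeros((1, 3, size, size)).cuda()
    with torch.no_grad():  # inference only, no autograd bookkeeping
        return average_forward_ms(lambda: net(x), torch.cuda.synchronize, num)
```

Calling `demo_vgg16()` on a CUDA machine should return an average that stays roughly the same whether num is 10 or 1000, because the clock now only stops after the GPU has finished its work.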