Inference speed issue

I have built an FCN-8s network for semantic segmentation with PyTorch. When I evaluated the network's inference speed, I found something strange, as follows:

First, I define the network:
model = FCN32s(NUM_CLASSES=20)
model = model.cuda()
model.eval()

Next, I define an input tensor, run the network 200 times, and calculate the average inference time. Here is where something strange happens.

When I define the input tensor outside the loop, as follows, the average inference time is 0.017 s (i.e., 58.8 FPS).
batch = torch.FloatTensor(1, 3, 512, 1024)
batch = batch.cuda()
inputs = Variable(batch, volatile=True)
for i in range(1, 201):
    pre_time = time.time()
    outputs = model(inputs)
    time_used = time.time() - pre_time

However, when I define the input tensor inside the loop, as follows, the average inference time is 0.0024 s (i.e., 417.7 FPS).
for i in range(1, 201):
    batch = torch.FloatTensor(1, 3, 512, 1024)
    batch = batch.cuda()
    inputs = Variable(batch, volatile=True)
    pre_time = time.time()
    outputs = model(inputs)
    time_used = time.time() - pre_time

The results are very different! Why does this difference happen, and what is the true inference speed of the network? Thanks!

Since CUDA calls are asynchronous, you should synchronize all CUDA calls with torch.cuda.synchronize() before starting and before stopping the timer. Currently you are most likely measuring only the time it takes to launch the kernels, not their real execution time.
Also, Variables are deprecated, so if you are using a newer PyTorch version (> 0.3.1) you can simply remove the Variable wrappers.
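
As a rough sketch (not the exact code from the post), a corrected timing loop on a recent PyTorch version could look like this, assuming model is already defined and using torch.no_grad() in place of the volatile Variable; the warm-up count and iteration count are arbitrary choices:

import time
import torch

model = model.cuda().eval()
inputs = torch.randn(1, 3, 512, 1024, device='cuda')

with torch.no_grad():
    # warm-up iterations so CUDA context setup and cuDNN autotuning are not measured
    for _ in range(10):
        model(inputs)

    torch.cuda.synchronize()  # wait for all pending kernels before starting the timer
    start = time.time()
    for _ in range(200):
        model(inputs)
    torch.cuda.synchronize()  # wait for the last forward pass to actually finish
    elapsed = (time.time() - start) / 200
    print(f"average inference time: {elapsed:.4f}s ({1.0 / elapsed:.1f} FPS)")

With the synchronize calls in place, moving the tensor creation inside or outside the loop should no longer change the measured forward-pass time.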

This helps me a lot. You have helped me again! Thank you very much!