Inference speed issue

I have built an FCN-8s network for semantic segmentation with PyTorch. When I evaluated the network's inference speed, I found something strange, as follows:

First, I define the network:
model = FCN32s(NUM_CLASSES=20)
model = model.cuda()
model.eval()

Next, I define an input tensor, run the network 200 times, and calculate the average inference time. Here is where something strange happens.

When I define the input tensor outside the loop, as follows, the average inference time is 0.017 s (i.e., 58.8 FPS).
batch = torch.FloatTensor(1, 3, 512, 1024)
batch = batch.cuda()
inputs = Variable(batch, volatile=True)
for i in range(1, 201):
    pre_time = time.time()
    outputs = model(inputs)
    time_used = time.time() - pre_time

However, when I define the input tensor inside the loop, as follows, the average inference time is 0.0024 s (i.e., 417.7 FPS).
for i in range(1, 201):
    batch = torch.FloatTensor(1, 3, 512, 1024)
    batch = batch.cuda()
    inputs = Variable(batch, volatile=True)
    pre_time = time.time()
    outputs = model(inputs)
    time_used = time.time() - pre_time

The results are very different! Why does this difference happen, and what is the true inference speed of the network? Thanks!

Since CUDA calls are asynchronous, you should synchronize all CUDA calls with torch.cuda.synchronize() before starting and before stopping the timer. Currently you are most likely measuring only the time it takes to launch the kernels, not their real execution time.
Also, Variables are deprecated, so if you are using a newer PyTorch version (> 0.3.1) you can simply remove the Variable wrappers.
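
As a rough sketch (not the exact code from the post), a corrected timing loop on a recent PyTorch version could look like this, assuming model is already defined and using torch.no_grad() in place of the volatile Variable; the warm-up count and iteration count are arbitrary choices:

import time
import torch

model = model.cuda().eval()
inputs = torch.randn(1, 3, 512, 1024, device='cuda')

with torch.no_grad():
    # warm-up iterations so CUDA context setup and cuDNN autotuning are not measured
    for _ in range(10):
        model(inputs)

    torch.cuda.synchronize()  # wait for all pending kernels before starting the timer
    start = time.time()
    for _ in range(200):
        model(inputs)
    torch.cuda.synchronize()  # wait for the last forward pass to actually finish
    elapsed = (time.time() - start) / 200
    print(f"average inference time: {elapsed:.4f}s ({1.0 / elapsed:.1f} FPS)")

With the synchronize calls in place, moving the tensor creation inside or outside the loop should no longer change the measured forward-pass time.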

This helps me a lot. You have helped me again! Thank you very much!