How can I get stable inference times for networks, or how to calculate the FPS of a model


There are many real-time networks such as the MobileNet family, ShuffleNet family, ENet, ERFNet, EDANet and so on. These papers report a prediction-speed metric (maybe that wording is not accurate) called FPS, and what I want to do is reproduce it, i.e. figure out how to calculate FPS.
In my opinion, the first method is to feed a random tensor (with the same shape as the input image) to the network and calculate the average inference time.
The code snippet I tried before is as follows:

 def speed_testing(self):

    # cuDNN configurations
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = True

    name = self.config.model_name  # assuming the model name is stored in the config
    print("     + {} Speed testing... ...".format(name))
    model = self.model.to('cuda:{}'.format(self.config.device_id))
    random_input = torch.randn(1, 3, self.config.input_size, self.config.input_size).to('cuda:{}'.format(self.config.device_id))
    model.eval()

    time_list = []
    with torch.no_grad():
        for i in tqdm(range(10001)):
            torch.cuda.synchronize()
            tic = time.time()
            model(random_input)
            torch.cuda.synchronize()  # wait for the kernels to finish before stopping the clock
            time_list.append(time.time() - tic)
    # the first iteration costs much more time than the others, so exclude it
    time_list = time_list[1:]
    print("     + Done 10000 iterations of inference!")
    print("     + Total time cost: {}s".format(sum(time_list)))
    print("     + Average time cost: {}s".format(sum(time_list)/10000))
    print("     + Frames Per Second: {:.2f}".format(1/(sum(time_list)/10000)))

I found that the cuDNN configuration can affect the inference time, so I set both flags to True. Also, the first iteration costs much more time than the others, so I exclude it.

But when I run the same network again and again, its inference speed gradually grows, from 112 fps to 118 fps to 126 fps, though sometimes it drops to a bottom of around 100 fps.

My questions are:

  • Should I pass a real image to the network instead of a random tensor?
  • Is the cuDNN configuration above the right environment for testing a network's inference time?
  • How can I get a stable average inference time for each network?

Thanks in advance, any idea would be appreciated.
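
For reference, a common pattern for getting stable GPU timings is to synchronize before reading the clock, because CUDA kernels are launched asynchronously. Below is a minimal sketch; the model, input size, and iteration counts are placeholders (not from the posts above), and it falls back to the CPU when no GPU is available, so the absolute numbers are not comparable to GPU FPS:

```python
import time
import torch
import torch.nn as nn

def measure_fps(model, input_size=64, iterations=20, device=None):
    # Fall back to the CPU so the sketch also runs without a GPU.
    device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device).eval()
    x = torch.randn(1, 3, input_size, input_size, device=device)

    with torch.no_grad():
        for _ in range(10):  # warm-up: cudnn autotuning, allocator growth, caches
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        tic = time.time()
        for _ in range(iterations):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
        elapsed = time.time() - tic

    return iterations / elapsed  # frames per second

# Placeholder model just to make the sketch self-contained.
fps = measure_fps(nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()))
print("{:.1f} FPS".format(fps))
```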

I found that the cuDNN configuration has a great influence on the average inference time, so I tried different cuDNN configuration strategies. Drawn from the experiments:

  • cudnn.benchmark=True or cudnn.deterministic=True can improve the inference time, but the improvement is random.
  • When I set them both to False, the average inference time is more stable (the gap between the upper and lower bound is small, around 1 fps), but it is slower than the first condition.
  • If another task is running on the same GPU as you, it can also influence the measured inference time.
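
The comparison above can be scripted by toggling the flags before each timing run. A sketch with a stand-in one-layer network (the real experiment would use the full model on a GPU; the cudnn flags only affect CUDA convolutions, so on a CPU both configurations will time about the same):

```python
import time
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Conv2d(3, 16, 3).to(device).eval()   # stand-in for the real network
x = torch.randn(1, 3, 64, 64, device=device)

results = {}
for benchmark, deterministic in [(True, True), (False, False)]:
    torch.backends.cudnn.benchmark = benchmark
    torch.backends.cudnn.deterministic = deterministic
    with torch.no_grad():
        for _ in range(5):              # warm-up so benchmark mode can autotune
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        tic = time.time()
        for _ in range(20):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    results[(benchmark, deterministic)] = (time.time() - tic) / 20

for (b, d), avg in results.items():
    print("benchmark={}, deterministic={}: {:.6f}s per forward pass".format(b, d, avg))
```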

I am not sure if I am right or not; does anyone have ideas on this?

  • It seems that we should pass some random data through the network first, because it needs a "warm up".
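
That warm-up can be as simple as a few throwaway forward passes before the timed loop starts; a sketch with a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3).eval()   # placeholder for the real network
dummy = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    # Throwaway passes: cudnn autotuning, memory-allocator growth and other
    # one-time costs happen here instead of inside the timed loop.
    for _ in range(10):
        out = model(dummy)

print(out.shape)  # timed iterations would start after this point
```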

Same question here: when I test the FPS in PyTorch, it is always much lower than the original paper's result.
And it seems a lot of people encounter this problem.

Looking forward to others' replies!


Thanks for your link.

Does it seem that the speed issue is in depthwise_conv2d?
But I tried building two straight up-and-down models stacking 50 layers, one with normal conv2d and another with depthwise_conv2d, and the depthwise one's average inference time is a little lower than the normal one's.
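
For context, the two 50-layer stacks described could be built like this (the channel width and kernel size are my assumptions; a conv whose `groups` equals its channel count is a depthwise conv):

```python
import torch.nn as nn

def make_stack(depthwise, layers=50, channels=32):
    # groups == in_channels turns a regular conv into a depthwise conv
    groups = channels if depthwise else 1
    blocks = [nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
              for _ in range(layers)]
    return nn.Sequential(*blocks)

normal = make_stack(depthwise=False)
dw = make_stack(depthwise=True)

# Depthwise convs have far fewer parameters and FLOPs, but that does not
# automatically translate into proportionally faster GPU kernels.
n_normal = sum(p.numel() for p in normal.parameters())
n_dw = sum(p.numel() for p in dw.parameters())
print(n_normal, n_dw)
```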

What about the first question mentioned above: should we pass real images or a random tensor to the network for speed testing?

There's no need; just pass a random tensor like:
input = torch.randn(input_size)
