How can I get stable inference times for networks, or how to calculate the FPS of a model


There are many real-time networks such as the MobileNet family, ShuffleNet family, ENet, ERFNet, EDANet and so on. These papers report a prediction-speed metric (maybe that wording is not accurate) called FPS, and what I want to do is reproduce it, i.e. figure out how to calculate FPS.
In my opinion, the first method is to feed a random tensor (with the same shape as the input image) to the network and calculate the average inference time.
The code snippet I tried before is as follows:

 def speed_testing(self):

    # cuDNN configurations
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = True

    name = self.config.model_name  # assuming the model name is stored in the config
    print("     + {} Speed testing... ...".format(name))
    model = self.model.to('cuda:{}'.format(self.config.device_id))
    random_input = torch.randn(1, 3, self.config.input_size, self.config.input_size).to('cuda:{}'.format(self.config.device_id))
    model.eval()

    time_list = []
    with torch.no_grad():
        for i in tqdm(range(10001)):
            torch.cuda.synchronize()
            tic = time.time()
            model(random_input)
            torch.cuda.synchronize()  # wait for the kernels to finish before stopping the clock
            time_list.append(time.time() - tic)
    # the first iteration costs much more time than the others, so exclude it
    time_list = time_list[1:]
    print("     + Done 10000 iterations of inference!")
    print("     + Total time cost: {}s".format(sum(time_list)))
    print("     + Average time cost: {}s".format(sum(time_list)/10000))
    print("     + Frames Per Second: {:.2f}".format(1/(sum(time_list)/10000)))

I found that the cuDNN configuration can affect the inference time, so I set both flags to True. Also, the first iteration costs much more time than the others, so I exclude it.

But when I run the same network again and again, its inference speed gradually grows, from 112 fps to 118 fps to 126 fps, though sometimes it drops to a bottom of around 100 fps.

My questions are:

  • Should I pass a real image to the network instead of a random tensor?
  • Is the cuDNN configuration above the right environment for testing a network's inference time?
  • How can I get a stable average inference time for each network?

Thanks in advance, any idea would be appreciated.
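
For reference, a common pattern for getting stable GPU timings is to synchronize before reading the clock, because CUDA kernels are launched asynchronously. Below is a minimal sketch; the model, input size, and iteration counts are placeholders (not from the posts above), and it falls back to the CPU when no GPU is available, so the absolute numbers are not comparable to GPU FPS:

```python
import time
import torch
import torch.nn as nn

def measure_fps(model, input_size=64, iterations=20, device=None):
    # Fall back to the CPU so the sketch also runs without a GPU.
    device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device).eval()
    x = torch.randn(1, 3, input_size, input_size, device=device)

    with torch.no_grad():
        for _ in range(10):  # warm-up: cudnn autotuning, allocator growth, caches
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        tic = time.time()
        for _ in range(iterations):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
        elapsed = time.time() - tic

    return iterations / elapsed  # frames per second

# Placeholder model just to make the sketch self-contained.
fps = measure_fps(nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()))
print("{:.1f} FPS".format(fps))
```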

I found that the cuDNN configuration has a great influence on the average inference time, so I tried different cuDNN configuration strategies. Drawn from the experiments:

  • cudnn.benchmark=True or cudnn.deterministic=True can improve the inference time, but the improvement is random.
  • When I set them both to False, the average inference time is more stable (the gap between the upper and lower bound is small, around 1 fps), but it is slower than the first condition.
  • If another task is running on the same GPU as you, it can also influence the measured inference time.
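
The comparison above can be scripted by toggling the flags before each timing run. A sketch with a stand-in one-layer network (the real experiment would use the full model on a GPU; the cudnn flags only affect CUDA convolutions, so on a CPU both configurations will time about the same):

```python
import time
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Conv2d(3, 16, 3).to(device).eval()   # stand-in for the real network
x = torch.randn(1, 3, 64, 64, device=device)

results = {}
for benchmark, deterministic in [(True, True), (False, False)]:
    torch.backends.cudnn.benchmark = benchmark
    torch.backends.cudnn.deterministic = deterministic
    with torch.no_grad():
        for _ in range(5):              # warm-up so benchmark mode can autotune
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        tic = time.time()
        for _ in range(20):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    results[(benchmark, deterministic)] = (time.time() - tic) / 20

for (b, d), avg in results.items():
    print("benchmark={}, deterministic={}: {:.6f}s per forward pass".format(b, d, avg))
```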

I am not sure if I am right or not; does anyone have ideas on this?

  • It seems that we should pass some random data through the network first, because it needs a "warm up".
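
That warm-up can be as simple as a few throwaway forward passes before the timed loop starts; a sketch with a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3).eval()   # placeholder for the real network
dummy = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    # Throwaway passes: cudnn autotuning, memory-allocator growth and other
    # one-time costs happen here instead of inside the timed loop.
    for _ in range(10):
        out = model(dummy)

print(out.shape)  # timed iterations would start after this point
```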

Same question here: when I test the FPS in PyTorch, it is always much lower than the original paper's result.
And it seems a lot of people encounter this problem.

Looking forward to others' replies!


Thanks for your link.

Does it seem that the speed issue is in depthwise_conv2d?
But I tried building two straight up-and-down models stacking 50 layers, one with normal conv2d and another with depthwise_conv2d, and the depthwise one's average inference time is a little lower than the normal one's.
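
For context, the two 50-layer stacks described could be built like this (the channel width and kernel size are my assumptions; a conv whose `groups` equals its channel count is a depthwise conv):

```python
import torch.nn as nn

def make_stack(depthwise, layers=50, channels=32):
    # groups == in_channels turns a regular conv into a depthwise conv
    groups = channels if depthwise else 1
    blocks = [nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
              for _ in range(layers)]
    return nn.Sequential(*blocks)

normal = make_stack(depthwise=False)
dw = make_stack(depthwise=True)

# Depthwise convs have far fewer parameters and FLOPs, but that does not
# automatically translate into proportionally faster GPU kernels.
n_normal = sum(p.numel() for p in normal.parameters())
n_dw = sum(p.numel() for p in dw.parameters())
print(n_normal, n_dw)
```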

What about the first question mentioned above: should we pass real images or a random tensor to the network for speed testing?

There's no need; just pass a random tensor like:
input = torch.randn(input_size)
