What are the best ways to decrease the inference time of a CNN?

Alireza · January 16, 2022, 4:53am

Hi,

I have trained a fully convolutional neural network for monocular depth estimation, and its accuracy is quite satisfactory. To reduce the number of trainable parameters, I replaced all standard convolutions with depthwise separable convolutions. However, inference on a new image is relatively slow: the network processes about 7.4 frames per second on a GPU. Converting all weights to float16 increased this to around 17 FPS. Are there any other techniques that could speed up the network further?

Thanks.

ptrblck · January 17, 2022, 4:55am

Take a look at the Performance Guide, which explains, among other tips, that torch.backends.cudnn.benchmark = True can be used when the input shapes are static.
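
A minimal sketch of how that setting might be combined with float16 inference and a simple FPS measurement. The small model below is a hypothetical stand-in, not the depth estimation network from the question, and the input size, warm-up count, and iteration count are arbitrary assumptions. The script falls back to float32 on CPU when no GPU is available.

```python
import time

import torch
import torch.nn as nn

# Let cuDNN auto-tune convolution algorithms. This tends to help when the
# input shape stays the same between calls; it has no effect on CPU.
torch.backends.cudnn.benchmark = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Assumption: use float16 only on GPU, float32 otherwise.
dtype = torch.float16 if device.type == "cuda" else torch.float32

# Hypothetical stand-in model built around one depthwise separable convolution.
model = nn.Sequential(
    nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3),  # depthwise
    nn.Conv2d(3, 16, kernel_size=1),                      # pointwise
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),                      # one-channel output
).to(device=device, dtype=dtype).eval()

x = torch.randn(1, 3, 64, 64, device=device, dtype=dtype)


def measure_fps(model, x, warmup=3, iters=10):
    """Return (frames per second, last output) for repeated forward passes."""
    with torch.inference_mode():
        # Warm-up runs so auto-tuning and lazy initialization are not timed.
        for _ in range(warmup):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        start = time.perf_counter()
        for _ in range(iters):
            out = model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return iters / elapsed, out


fps, out = measure_fps(model, x)
print(f"{fps:.1f} FPS, output shape {tuple(out.shape)}")
```

The explicit torch.cuda.synchronize() calls matter because CUDA kernels run asynchronously; without them the timer may stop before the GPU has finished.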

Alireza · January 24, 2023, 3:49am

Thanks @ptrblck_de, some of the tips in the linked guide were helpful.

sardanian · January 24, 2023, 5:52am

I have found that converting the model to TensorRT via torch_tensorrt gives the fastest inference.
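
A rough sketch of what that conversion could look like, assuming torch_tensorrt, TensorRT, and a CUDA GPU are installed. The helper name compile_with_tensorrt and the default input shape are hypothetical, chosen only for illustration, and the float16 precision setting is an assumption carried over from the question.

```python
import torch


def compile_with_tensorrt(model, input_shape=(1, 3, 224, 224)):
    """Compile a model with Torch-TensorRT for float16 inference at a fixed shape.

    Hypothetical helper: requires torch_tensorrt, TensorRT and a CUDA GPU.
    """
    # Imported lazily so this file can be loaded on machines without TensorRT.
    import torch_tensorrt

    model = model.eval().half().cuda()
    return torch_tensorrt.compile(
        model,
        # Describe the expected input; a fixed shape is assumed here.
        inputs=[torch_tensorrt.Input(shape=input_shape, dtype=torch.half)],
        # Allow TensorRT to select float16 kernels.
        enabled_precisions={torch.half},
    )
```

The compiled module would then be called like the original one, for example trt_model(x) with x being a float16 CUDA tensor of the same shape.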