What are the best ways to decrease the inference time of a CNN?


I have trained a fully convolutional neural network for monocular depth estimation, and its accuracy is quite satisfactory. I replaced all standard convolutions with depthwise separable convolutions in order to reduce the number of trainable parameters. However, inference on a new image is still relatively slow: the network handles about 7.4 frames per second (FPS) on a GPU. Converting all weights to float16 sped it up to around 17 FPS. Are there any other techniques that could speed up my network further?
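To put the numbers above in perspective, here is a small back-of-the-envelope sketch. The layer sizes (64 input channels, 128 output channels, 3x3 kernel) are hypothetical and not taken from the actual network; only the two FPS figures come from the question. Biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution (bias ignored):
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def ms_per_frame(fps):
    """Convert a frame rate into per-image latency in milliseconds."""
    return 1000.0 / fps


# Hypothetical layer: 64 -> 128 channels with a 3x3 kernel.
standard = conv_params(64, 128, 3)
separable = separable_params(64, 128, 3)
param_ratio = standard / separable

# FPS figures reported in the question.
fp32_fps, fp16_fps = 7.4, 17.0
speedup = fp16_fps / fp32_fps

print(f"standard: {standard}, separable: {separable}, ratio: {param_ratio:.2f}x")
print(f"float32: {ms_per_frame(fp32_fps):.1f} ms/frame, "
      f"float16: {ms_per_frame(fp16_fps):.1f} ms/frame, speedup: {speedup:.2f}x")
```

For this hypothetical layer the separable version has roughly 8x fewer weights, yet a smaller parameter count does not by itself guarantee a proportionally shorter inference time, which is why measuring latency directly is worthwhile.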


Take a look at the Performance Guide, which explains, for example, that cudnn.benchmark=True can be used when the input shapes are static.
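As a rough illustration of that suggestion, the sketch below assumes the network is written in PyTorch, where the flag is spelled `torch.backends.cudnn.benchmark`. The tiny model and the 64x64 input are hypothetical stand-ins for the real depth network, and `measure_fps` is a helper made up for this example. It falls back to the CPU (in float32) when no GPU is available.

```python
import time

import torch
import torch.nn as nn

# Let cuDNN try several convolution algorithms and cache the fastest one.
# This only pays off when the input shape stays the same between calls.
torch.backends.cudnn.benchmark = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the depth network: one depthwise separable block.
model = nn.Sequential(
    nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3),  # depthwise
    nn.Conv2d(3, 16, kernel_size=1),                      # pointwise
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),                      # single depth channel
).to(device).eval()

x = torch.randn(1, 3, 64, 64, device=device)
if device.type == "cuda":
    # float16 weights and inputs, as already done in the question.
    model, x = model.half(), x.half()


def measure_fps(model, x, warmup=3, iters=10):
    """Time forward passes without autograd bookkeeping."""
    with torch.no_grad():
        for _ in range(warmup):  # warm-up lets cuDNN pick its algorithms
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)


fps = measure_fps(model, x)
print(f"{fps:.1f} FPS on {device.type}")
```

The warm-up iterations matter here: with the benchmark flag enabled, the first calls for a given input shape are slower because the algorithm search happens then, so they should be excluded from any FPS measurement.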