I have trained a fully convolutional neural network for monocular depth estimation, and the quality of its predictions is quite satisfactory. To reduce the number of trainable parameters, I replaced every standard convolution with a depthwise separable convolution. However, inference on a new image is still relatively slow: the network processes only about 7.4 frames per second on the GPU. Converting all weights to float16 raised this to around 17 FPS. Are there any other techniques that could speed up the network further?
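To put these numbers in perspective, here is a small back-of-the-envelope sketch in Python. It is only illustrative: the layer sizes are hypothetical, bias terms are ignored, and the helper names are made up for this example. It compares the weight count of a standard k×k convolution with that of a depthwise separable one (a k×k depthwise convolution followed by a 1×1 pointwise convolution), and converts the FPS figures above into per-frame latency.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def frame_time_ms(fps: float) -> float:
    """Average per-frame latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")

    # Throughput figures quoted in the question (float32 vs. float16 weights).
    for fps in (7.4, 17.0):
        print(f"{fps} FPS -> {frame_time_ms(fps):.1f} ms/frame")
```

The weight ratio works out to 1/C_out + 1/k², so for this hypothetical layer the separable version has roughly 12% of the weights. A smaller parameter count does not by itself guarantee a proportional drop in latency, which is why the measured frame time (about 135 ms at 7.4 FPS versus about 59 ms at 17 FPS) is the number worth tracking.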