I was experimenting with depthwise convolutions and noticed that I'm not seeing any speedup over standard convolutions. I tried a few different MobileNet architectures to investigate this, but for repeatability I'll reference this script, a basic implementation of a MobileNet model: https://github.com/marvis/pytorch-mobilenet/blob/master/benchmark.py
If I change the script to use groups=1 (i.e., standard convolutions instead of depthwise), my forward-pass runtime does not change at all: ~15 ms on GPU and ~250 ms on CPU either way.
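For reference, here's a minimal sketch of the kind of comparison I mean. This is not the linked script; the layer sizes are arbitrary assumptions, and I've added explicit CUDA synchronization around the timing so queued kernels don't skew the numbers:

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(1, 256, 56, 56, device=device)  # illustrative input shape

# groups=1 is a standard convolution; groups=in_channels is depthwise.
standard = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=1).to(device)
depthwise = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=256).to(device)

def bench(conv, n=100):
    with torch.no_grad():
        # Warm up before timing.
        for _ in range(10):
            conv(x)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued kernels
        start = time.time()
        for _ in range(n):
            conv(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
    return (time.time() - start) / n * 1000  # ms per forward pass

print(f"standard:  {bench(standard):.2f} ms")
print(f"depthwise: {bench(depthwise):.2f} ms")
```

Even with this stripped-down setup, I see essentially the same timings for both layers.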
OS: Windows 10
GPU: GTX 1080
PyTorch: 1.0.1 (previously had 1.0.0 but upgraded to see if it made a difference)
cuDNN: 7.4.1 (previously had 7.0.4 but upgraded to see if it made a difference)