I’m still new to PyTorch, and I was trying to implement MobileNets (Howard et al.) in PyTorch. The paper introduces the idea of a separable convolution. TensorFlow has tf.slim, which contains a separable convolution operation; I wanted to know whether a similar operation is available in PyTorch as well.
One way I see is to perform separable convolutions using 1xN and Nx1 convolutions. It might not be as efficient as a single kernel call, but it should be OK.
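A minimal sketch of that suggestion (class and parameter names here are hypothetical): a KxK kernel is replaced by a 1xK convolution followed by a Kx1 convolution. Note this factorizes the spatial dimensions of a single kernel, which is a different kind of separability than the per-channel (depthwise) factorization MobileNets use.

```python
import torch
import torch.nn as nn

class SpatialSeparableConv(nn.Module):
    """Approximate a KxK convolution with a 1xK then a Kx1 convolution."""

    def __init__(self, in_channels, out_channels, k):
        super().__init__()
        # 1xK convolution over the width dimension
        self.conv_w = nn.Conv2d(in_channels, out_channels,
                                kernel_size=(1, k), padding=(0, k // 2))
        # Kx1 convolution over the height dimension
        self.conv_h = nn.Conv2d(out_channels, out_channels,
                                kernel_size=(k, 1), padding=(k // 2, 0))

    def forward(self, x):
        return self.conv_h(self.conv_w(x))

x = torch.randn(1, 3, 32, 32)
y = SpatialSeparableConv(3, 16, 3)(x)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```

With the paddings above, spatial size is preserved, so the module is a drop-in replacement for a padded KxK convolution.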
You have to apply a different 2d filter to each of the M input channels I_m; let’s say you obtain M filtered maps F_m. Then each of the N output channels O_n is a different linear combination of these F_m.
I don’t see how to avoid a loop over the input channels, applying nn.Conv2d(1, 1) M times. The N linear combinations can then be done with a single matrix multiplication, which is equivalent to a 1x1 convolution.
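A literal sketch of the loop-based approach described above (names are hypothetical): each of the M input channels is filtered by its own single-channel Conv2d, and the N linear combinations of the resulting maps are expressed as a 1x1 convolution.

```python
import torch
import torch.nn as nn

class LoopDepthwiseSeparable(nn.Module):
    """Depthwise step as an explicit loop, pointwise step as a 1x1 conv."""

    def __init__(self, M, N, k=3):
        super().__init__()
        # one single-channel 2d filter per input channel
        self.depthwise = nn.ModuleList(
            [nn.Conv2d(1, 1, k, padding=k // 2) for _ in range(M)])
        # the N linear combinations of the M filtered maps
        self.pointwise = nn.Conv2d(M, N, kernel_size=1)

    def forward(self, x):
        # filter each channel independently, then re-stack along dim 1
        maps = [conv(x[:, m:m + 1]) for m, conv in enumerate(self.depthwise)]
        return self.pointwise(torch.cat(maps, dim=1))

x = torch.randn(2, 4, 16, 16)
y = LoopDepthwiseSeparable(4, 8)(x)
print(y.shape)  # torch.Size([2, 8, 16, 16])
```

This matches the description above but launches M separate small convolutions per forward pass, so it is mainly useful as a reference implementation.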
I’m not familiar with the concept of groups yet; however, in the paper they use batchnorm after both the depthwise and the pointwise convolutions. Alexis’ solution seems more plausible for now, but I’ll post what I find as soon as possible.
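For reference, the groups argument of nn.Conv2d avoids the explicit loop: with groups=M, each of the M input channels gets its own KxK filter, which is exactly the depthwise step. A hedged sketch of a MobileNet-style block under that assumption (class name is hypothetical), with BatchNorm and ReLU after both the depthwise and pointwise convolutions as the paper describes:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise (groups=M) conv + BN + ReLU, then 1x1 conv + BN + ReLU."""

    def __init__(self, M, N, k=3, stride=1):
        super().__init__()
        # groups=M: one KxK filter per input channel (depthwise step)
        self.depthwise = nn.Conv2d(M, M, k, stride=stride,
                                   padding=k // 2, groups=M, bias=False)
        self.bn1 = nn.BatchNorm2d(M)
        # 1x1 convolution mixes the channels (pointwise step)
        self.pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(N)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

x = torch.randn(2, 32, 28, 28)
y = DepthwiseSeparableBlock(32, 64, stride=2)(x)
print(y.shape)  # torch.Size([2, 64, 14, 14])
```

The convolutions use bias=False since each is immediately followed by a BatchNorm layer, whose affine shift makes the bias redundant.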