I am trying to decompose the NxN kernel in my custom ConvNet into a pair of 1xN and Nx1 kernels. Inference is only slightly slower than with the NxN kernel, but training slowed down a lot (roughly 2-2.5x slower). Is there any way to train a model with 1xN and Nx1 conv kernels faster in PyTorch? I am currently using PyTorch 1.10.0+cu113.
Any guidance/help would be really appreciated. Thank you
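For reference, here is a minimal sketch of the factorized setup being described, with hypothetical channel counts and input size (the actual model may differ):

```python
import torch
import torch.nn as nn

N = 5  # hypothetical kernel size

# Standard NxN convolution for comparison
full = nn.Conv2d(16, 32, kernel_size=N, padding=N // 2)

# Factorized version: a 1xN conv followed by an Nx1 conv,
# padded so the spatial output size matches the NxN conv
factorized = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=(1, N), padding=(0, N // 2)),
    nn.Conv2d(32, 32, kernel_size=(N, 1), padding=(N // 2, 0)),
)

x = torch.randn(1, 16, 28, 28)
print(full(x).shape)        # torch.Size([1, 32, 28, 28])
print(factorized(x).shape)  # torch.Size([1, 32, 28, 28])
```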
I wonder if the OP is trying to reduce the convolution to its first principal component, i.e. replacing the kernel with its nearest rank-1 approximation, which factors into a product of an N-by-1 and a 1-by-N matrix. If that is the case, I have no idea whether one should expect any speed-up from such a “simplification”.
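If that is indeed the goal, the nearest rank-1 factorization of a given kernel can be computed with an SVD (by the Eckart-Young theorem, truncating to the top singular value gives the best rank-1 approximation in Frobenius norm). A sketch, with a random 5x5 kernel standing in for a trained one:

```python
import torch

# Hypothetical 5x5 kernel; in practice this would come from a trained layer
K = torch.randn(5, 5)

# SVD-based rank-1 factorization: K ~= outer(col, row)
U, S, Vh = torch.linalg.svd(K)
col = U[:, 0] * S[0].sqrt()   # Nx1 factor
row = Vh[0, :] * S[0].sqrt()  # 1xN factor
K1 = torch.outer(col, row)    # best rank-1 approximation of K

# Residual error equals the norm of the discarded singular values
print(torch.linalg.norm(K - K1))
```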