# Speed up fully-separable 1D convolution?

Suppose I have a 1D convolutional layer with 2 input channels, 32 output channels, and length-9 kernels. My weight tensor has a very special structure: it can be expressed as an “outer product” of three tensors, as seen below, where I generate a dummy weight tensor of this form plus some dummy data and calculate the convolution using `conv1d`:

```python
import torch
import torch.nn.functional as F

in_channels = 2
out_channels = 32
kernel_size = 9
nsamples = 2**12
batch_size = 1
x = torch.randn(batch_size, in_channels, nsamples)

outspace = torch.randn(out_channels,1,1)
inspace = torch.randn(1,in_channels,1)
kernelspace = torch.randn(1,1,kernel_size)
w = outspace*inspace*kernelspace
```

The space of all weight tensors of the same shape as `w` requires `in_channels*out_channels*kernel_size` parameters to describe (576 here), but the restricted space of tensors that factor as above only requires `in_channels+out_channels+kernel_size` parameters (43 here). Correspondingly, there should be a sequence of simpler operations that calculates `F.conv1d(x,w)` with far fewer multiplies when `w` has this special structure. Something like

```python
y = op1(kernelspace, x)
y = op2(inspace, y)
y = op3(outspace, y)
```

for some `op1`, `op2`, and `op3`. I can write out the math for how to do it, but I'm wondering: what's the fast way to do this in PyTorch? I imagine grouped convolution and clever use of matmul would do it, but I'm not familiar enough with the API to formulate it.
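To make the target concrete, here is a sketch of the three-step decomposition I have in mind (the specific choice of steps is my own guess, checked numerically against `F.conv1d` on the dummy data): collapse the input channels with a weighted sum, run one single-channel convolution, then broadcast across output channels.

```python
import torch
import torch.nn.functional as F

in_channels, out_channels, kernel_size = 2, 32, 9
nsamples, batch_size = 2**12, 1

x = torch.randn(batch_size, in_channels, nsamples)
outspace = torch.randn(out_channels, 1, 1)
inspace = torch.randn(1, in_channels, 1)
kernelspace = torch.randn(1, 1, kernel_size)
w = outspace * inspace * kernelspace

# Reference: the full convolution with the (32, 2, 9) weight tensor.
y_ref = F.conv1d(x, w)

# Factored version, using w[o, i, k] = outspace[o] * inspace[i] * kernelspace[k]:
# 1) collapse input channels with a weighted sum: (B, C_in, T) -> (B, 1, T)
z = (inspace * x).sum(dim=1, keepdim=True)
# 2) one single-channel convolution with the length-9 kernel: (B, 1, T) -> (B, 1, T-8)
z = F.conv1d(z, kernelspace)
# 3) broadcast the single channel across output channels: (B, 1, T-8) -> (B, C_out, T-8)
y = outspace.view(1, out_channels, 1) * z

assert torch.allclose(y, y_ref, atol=1e-4, rtol=1e-4)
```

Per output time step, the full convolution does `out_channels*in_channels*kernel_size` = 576 multiplies, while this version does roughly `in_channels + kernel_size + out_channels` = 43, matching the parameter-count argument above; whether it is actually faster in wall-clock time presumably depends on how well each step is fused.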