Suppose I have a 1D convolutional layer with 2 input channels, 32 output channels, and length-9 kernels. My weight tensor has a very special structure: it can be expressed as an "outer product" of three tensors, as in the snippet below, where I generate a dummy weight tensor of this form plus some dummy data and compute the convolution using conv1d:

```
import torch
import torch.nn.functional as F
in_channels = 2
out_channels = 32
kernel_size = 9
nsamples = 2**12
batch_size = 1
padding = kernel_size//2
x = torch.randn(batch_size, in_channels, nsamples)
outspace = torch.randn(out_channels,1,1)
inspace = torch.randn(1,in_channels,1)
kernelspace = torch.randn(1,1,kernel_size)
w = outspace*inspace*kernelspace
y = F.conv1d(x,w,padding=padding)
```

The space of all weight tensors of the same shape as `w` requires `in_channels*out_channels*kernel_size` parameters to describe (576 for the sizes above), but the restricted space of tensors that factor as above only requires `in_channels+out_channels+kernel_size` parameters (43 here). Correspondingly, there should be a sequence of simpler operations that calculates `F.conv1d(x,w)` with far fewer multiplies when `w` has the special structure above. Something like

```
y = op1(kernelspace, x)
y = op2(inspace, y)
y = op3(outspace, y)
```

for some `op1`, `op2`, and `op3`. I can write out the math for how to do it, but I'm wondering what's the fast way to do this in PyTorch? I imagine group convolution and clever use of matmul would do it, but I'm not familiar enough with the API to formulate it.