Hi,

Can anybody tell me how to code a Spatial Separable Convolution layer?

Thanks.

This previous discussion may be relevant:

Hi,

This is for Depthwise Separable Conv.

I want for Spatial Separable Conv.

Thanks.

My bad, I misread your post.

While I’m still here, let me try to be a little helpful by working through the concept of spatial separable convolutions from here:

You need to be able to factor your initial kernel into two different kernels. Since the kernels are real-valued floating-point weights learnt from the data, not every square N×N kernel can be decomposed into an N×1 and a 1×N vector; that only works when the matrix has rank 1. (Ref)

Having an autograd engine enforce this between two layers would seem very complicated, even if it were possible. It seems like something that would only work on the forward pass for pre-determined filters.

But I am unaware whether there is a way to do so regardless; I just thought this might help.
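To make the factoring point concrete, here is a minimal sketch of how one might test whether a fixed kernel splits into a column and a row. It relies on the fact that a matrix is an outer product of two vectors exactly when it has rank 1, which the SVD exposes. The helper name `factor_kernel` and the tolerance are my own assumptions, not anything from PyTorch itself.

```python
import torch

def factor_kernel(kernel, tol=1e-6):
    """Hypothetical helper: return (col, row) with col @ row == kernel
    if the kernel is (numerically) rank 1, otherwise None."""
    k = kernel.double()  # work in float64 so the tolerance is meaningful
    U, S, Vh = torch.linalg.svd(k)
    # Separable iff every singular value after the first is negligible.
    if (S[1:] > tol * S[0]).any():
        return None
    col = U[:, :1] * S[0].sqrt()   # shape (N, 1)
    row = Vh[:1, :] * S[0].sqrt()  # shape (1, N)
    return col, row

# A Sobel filter is a classic separable kernel: [1, 2, 1]^T times [1, 0, -1].
sobel = torch.tensor([[1., 0., -1.],
                      [2., 0., -2.],
                      [1., 0., -1.]])
col, row = factor_kernel(sobel)

# The identity matrix has rank 3, so it cannot be split this way.
not_separable = factor_kernel(torch.eye(3))
```

For the Sobel kernel the returned `col @ row` reproduces the original matrix, while the identity kernel yields `None`. A kernel learnt freely as a full N×N matrix will generally fall into the second case.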

Hi Tejan!

Unless I misunderstand your question, you can use two convolutions in a row (without an intervening non-linear activation).

So if you want, say, a 5x5 separable convolution (with single channels):

```
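import torch
# 1x5 (row) and 5x1 (column) single-channel convolutions, no bias
conv15 = torch.nn.Conv2d (1, 1, (1, 5), bias = False)
conv51 = torch.nn.Conv2d (1, 1, (5, 1), bias = False)
x = torch.randn (1, 1, 32, 32)   # example input, shape (batch, channel, H, W) is an assumption
y = conv51 (conv15 (x))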
```

`y` is now `x` convolved with a separable 5x5 kernel. (And as `conv15` and `conv51` are trained, their product will, of course, stay separable.)
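As a sanity check, here is a small sketch comparing the two stacked convolutions against a single convolution whose 5x5 kernel is the outer product of the two 1-D weights. The input size, the seed, and the names `kernel55` and `y_full` are just illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
conv15 = torch.nn.Conv2d(1, 1, (1, 5), bias=False)
conv51 = torch.nn.Conv2d(1, 1, (5, 1), bias=False)

x = torch.randn(1, 1, 16, 16)
y = conv51(conv15(x))

# Effective 5x5 kernel: outer product of the (5, 1) column and the (1, 5) row.
col = conv51.weight[0, 0]   # shape (5, 1)
row = conv15.weight[0, 0]   # shape (1, 5)
kernel55 = (col @ row).reshape(1, 1, 5, 5)
y_full = F.conv2d(x, kernel55)

# The two results should agree up to floating-point error.
print(torch.allclose(y, y_full, atol=1e-5))

# Gradients land on the two 1-D factors; no 5x5 parameter is ever stored.
y.sum().backward()
print(conv15.weight.grad.shape, conv51.weight.grad.shape)
```

Because the only trainable parameters are the 1x5 and 5x1 weights, whatever the optimizer does to them, the effective 5x5 kernel is still their outer product.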

Best.

K. Frank

Hi,

This would not be correct, because in my understanding Spatial Separable Conv works as follows.

In the forward pass there is a 3x3 kernel, which is broken into two parts, say (3x1) and (1x3). The convolution then proceeds as usual, first with the (3x1) and then with the (1x3). In the backward pass, the model should give us a (3x3) kernel, our original kernel_size, which then **should** be breakable into a (3x1) and a (1x3) kernel.

This is my understanding. I might be wrong.

Thanks.