2D convolution with 3D kernel

I am trying to perform a convolution over the Height and Width dimensions of a batch of input tensor cubes, using kernels (which I have made myself) on every depth slice, without any movement of the kernel in the third dimension (in this case the depth).

So say I had a batch of 3 tensor cubes:

import torch 

batch = torch.rand((3,4,4,4))

I would like to convolve each cube with some 2D kernels (1 for each batch member):

weights = torch.rand((3,4,4))

but convolve only in the Height and Width dimensions somehow. I suspect it involves some variant of the following code block:

kernels = torch.nn.Parameter(weights, requires_grad=False)

output = torch.nn.functional.conv3d(batch, kernels, padding=1)

Is there a preferred/optimal way to do this? The only way I can see is to perform a 3D convolution but somehow set the stride to be [0,1,1] which I don’t think PyTorch will allow me to do.

Perhaps this is in fact a very simple problem and one can do this with 2D convolutions and I would love to know how, but the key points here are:

  • The output should have the same shape as the input (padding obviously helps with height and width, but you can't just run a 2D convolution over each batch member, or the depth axis will collapse to 1?)

  • Each batch member should be convolved with its own kernel.

Many thanks in advance!


I’ve been playing around and think I may have this working using the following code:

import torch 

batch = torch.rand((8, 1, 30, 30, 30))

weights = torch.rand((1, 1, 1, 3, 3))

conv = torch.nn.Conv3d(1, 1, (1, 3, 3), padding=(0, 1, 1), bias=False)

with torch.no_grad():
    conv.weight = torch.nn.Parameter(weights, requires_grad=False)

output = conv(batch)

Although I am still confused as to how passing a depth-1 kernel to a 3D convolution over a 3D tensor ends up performing a convolution over each depth slice.
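The depth-1 kernel is what makes it work: conv3d still slides the kernel along the depth axis with stride 1, but since the kernel only covers one slice at a time, each output slice depends on exactly one input slice. A quick sketch checking this against an explicit per-slice 2D convolution (same shapes as my code above):

```python
import torch
import torch.nn.functional as F

x = torch.rand(8, 1, 30, 30, 30)     # (N, C, D, H, W)
w3d = torch.rand(1, 1, 1, 3, 3)      # depth-1 3D kernel

# 3D conv: kernel slides over depth, but covers only one slice per position
out3d = F.conv3d(x, w3d, padding=(0, 1, 1))

# Equivalent: apply the same 2D kernel to every depth slice and stack
w2d = w3d.squeeze(2)                 # (1, 1, 3, 3)
slices = [F.conv2d(x[:, :, d], w2d, padding=1) for d in range(x.shape[2])]
out2d = torch.stack(slices, dim=2)

print(torch.allclose(out3d, out2d, atol=1e-6))   # True
```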

I think what you are looking for is depthwise separable convolution, where a 2D filter is applied to a 3D conv volume to reduce computation cost. One constraint is that the number of channels (the depth) of the input and output will be the same, i.e. one 2D filter for every channel in the input conv volume.
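For reference, a minimal sketch of the depthwise (grouped) case in PyTorch: with `groups` equal to the channel count, each input channel is convolved with its own 2D filter, and the channel count is preserved:

```python
import torch
import torch.nn.functional as F

x = torch.rand(2, 4, 8, 8)     # (N, C, H, W) with C = 4 channels
w = torch.rand(4, 1, 3, 3)     # one 3x3 filter per channel

# groups=C: channel i only sees filter i; no mixing across channels
out = F.conv2d(x, w, padding=1, groups=4)
print(out.shape)               # torch.Size([2, 4, 8, 8])
```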

Thanks for the reply, and sorry for my slow response! I’ll take a look into depthwise separable convolutions and get back to you :slight_smile:

Update: I’ve had a look into depthwise separable convolutions, and they appear to be the same as my edit above, except that there I’m explicitly stacking 2D kernels into one 3D kernel per object that I wish to convolve, whereas in a depthwise separable convolution you convolve each channel with its own 2D kernel and then stack the results into a 3D output. I think that’s how it works, anyway.
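Putting the two ideas together, the original goal (each cube convolved with its own 2D kernel, applied to every depth slice) can be sketched with a grouped, depth-1 conv3d by folding the batch into the channel dimension. Note I’m using 3×3 kernels here rather than the 4×4 ones from the original question, so that “same” padding works out:

```python
import torch
import torch.nn.functional as F

batch = torch.rand(3, 4, 4, 4)      # 3 cubes, each (D, H, W)
weights = torch.rand(3, 3, 3)       # one 3x3 kernel per cube (odd size for "same" padding)

x = batch.unsqueeze(0)              # (1, 3, 4, 4, 4): cubes become channels
w = weights.unsqueeze(1).unsqueeze(1)   # (3, 1, 1, 3, 3): depth-1 kernel per group

# groups=3: cube i is convolved only with kernel i;
# the depth-1 kernel means no mixing across depth slices
out = F.conv3d(x, w, padding=(0, 1, 1), groups=3).squeeze(0)
print(out.shape)                    # torch.Size([3, 4, 4, 4])
```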