2D convolution with different kernel per location

jhultman · April 18, 2019, 6:30pm

Math generally is less ambiguous than words so let me just show some equations of what I’m trying to implement. (1) is standard 2D convolution with unit batch size, 3x3 kernel size, and unit stride, and (2) is what I want to implement.

In equation (2), the kernel w is modulated via element-wise product by another tensor d whose values depend on both the spatial index (i, j) as well as the dummy “convolution index” (m, n). Note that the convolutional kernel w is learnable while the modulating tensor d is known. Another way to formulate this is to use d and w to form a new kernel of shape (3, 3, H, W, K_in, K_out) whose values depend not just on the convolution index (m, n) but also on the spatial index (i, j).
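Going only by the description above, the two equations can plausibly be written as follows. The zero-based offsets, the cross-correlation convention (as in conv2d), and a d that is shared across channels are assumptions here, not something the post states explicitly.

```latex
% (1) standard 2D convolution, 3x3 kernel, unit stride, unit batch size
y_{i,j,k'} = \sum_{m=0}^{2} \sum_{n=0}^{2} \sum_{k=1}^{K_{in}}
             w_{m,n,k,k'} \, x_{i+m,\, j+n,\, k}
\tag{1}

% (2) the same sum with the kernel modulated by the known tensor d
y_{i,j,k'} = \sum_{m=0}^{2} \sum_{n=0}^{2} \sum_{k=1}^{K_{in}}
             d_{i,j,m,n} \, w_{m,n,k,k'} \, x_{i+m,\, j+n,\, k}
\tag{2}

% equivalent location-dependent kernel of shape (3, 3, H, W, K_in, K_out)
\tilde{w}_{m,n,i,j,k,k'} = d_{i,j,m,n} \, w_{m,n,k,k'}
```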

Is there a simple way to express this computation without rewriting the CUDA kernels for conv2d?

michaelklachko · April 18, 2019, 6:43pm

Don’t know if it’s possible to use conv2d for this, but maybe play with the fold/unfold ops to avoid writing a CUDA kernel: https://pytorch.org/docs/stable/_modules/torch/nn/modules/fold.html

jhultman · April 18, 2019, 8:35pm

Thanks so much for this suggestion. Unfold + matmul + fold worked flawlessly.
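For readers after the concrete recipe, here is a minimal sketch of how an unfold-based version of the modulated convolution might look. It is not the code used in this thread: the function name modulated_conv2d, the layout of d as (H_out, W_out, kh, kw), and the lack of padding are all assumptions made for illustration. Because there is no padding, the output is already in image layout and a reshape stands in for the final fold.

```python
import torch
import torch.nn.functional as F


def modulated_conv2d(x, w, d):
    """Hypothetical helper: conv2d whose kernel is scaled per output location.

    x: (B, C_in, H, W) input
    w: (C_out, C_in, kh, kw) learnable kernel
    d: (H_out, W_out, kh, kw) known modulation (assumed shared across channels)
    No padding, unit stride: H_out = H - kh + 1, W_out = W - kw + 1.
    """
    B, C_in, H, W = x.shape
    C_out, _, kh, kw = w.shape
    H_out, W_out = H - kh + 1, W - kw + 1
    L = H_out * W_out

    # unfold gives (B, C_in*kh*kw, L); split channel and kernel-tap axes
    patches = F.unfold(x, (kh, kw)).reshape(B, C_in, kh * kw, L)

    # d as (kh*kw, L) so it broadcasts over batch and input channels
    d_flat = d.reshape(L, kh * kw).t()
    patches = patches * d_flat

    # all dot products at once: sum over input channels and kernel taps
    w_flat = w.reshape(C_out, C_in, kh * kw)
    out = torch.einsum("bckl,ock->bol", patches, w_flat)
    return out.reshape(B, C_out, H_out, W_out)
```

With d set to all ones this reduces to a plain F.conv2d(x, w), which makes for a quick sanity check. The modulation is applied to the patches rather than to the weights so that the (3, 3, H, W, K_in, K_out) kernel is never materialised.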

Oktai15 · May 29, 2020, 10:25pm

Unfold is the right solution, but it is super slow. Any ideas on how to speed it up?

michaelklachko · May 30, 2020, 12:36am

It should be about the same speed as the regular conv layer if you perform all dot products in parallel.

fschiffers · January 20, 2022, 3:55pm

Does anyone have some example code where such a “convolution” is implemented with the Fold operation?

ptrblck · January 20, 2022, 11:26pm

The nn.Unfold docs provide an example:

import torch
from torch import nn

unfold = nn.Unfold(kernel_size=(2, 3))
input = torch.randn(2, 5, 3, 4)
output = unfold(input)
# each patch contains 30 values (2x3=6 vectors, each of 5 channels)
# 4 blocks (2x3 kernels) in total in the 3x4 input
output.size()  # torch.Size([2, 30, 4])

# Convolution is equivalent to Unfold + Matrix Multiplication + Fold (or view to output shape)
inp = torch.randn(1, 3, 10, 12)
w = torch.randn(2, 3, 4, 5)
# one column per 4x5 patch: shape (1, 3*4*5, 7*8)
inp_unf = torch.nn.functional.unfold(inp, (4, 5))
# flatten w to (2, 60) and take all dot products in a single matmul: shape (1, 2, 56)
out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
# or equivalently (and avoiding a copy),
# out = out_unf.view(1, 2, 7, 8)
# maximum absolute difference from conv2d; should be floating-point error only
(torch.nn.functional.conv2d(inp, w) - out).abs().max()

fschiffers · January 21, 2022, 3:35am

Thanks!

I’ve been trying to implement a Gaussian-like filter where the spread of each Gaussian depends on the intensity of the actual image pixel. In other words: if the pixel intensity is close to 0, the pixel shouldn’t blur; if the value is large (close to 1), it should blur a lot.

Creating the Gaussian kernel for each image pixel is simple, but I haven’t managed to find a way to combine my kernels with the unfolded image in a similar way as it was done in the example above.
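One way the per-pixel kernels could be combined with the unfolded image is sketched below. Everything specific in it is an assumption for illustration: the function name adaptive_gaussian_blur, the linear mapping sigma = max_sigma * intensity, the 7x7 window, reflect padding, and the small eps that keeps sigma away from zero. The kernels are built directly in the same (kernel taps, locations) layout that unfold produces, so combining them is an element-wise product followed by a sum over the tap axis.

```python
import torch
import torch.nn.functional as F


def adaptive_gaussian_blur(img, max_sigma=2.0, ksize=7, eps=1e-3):
    """Hypothetical helper: Gaussian blur whose sigma follows pixel intensity.

    img: (B, C, H, W) with values in [0, 1]; ksize must be odd.
    """
    B, C, H, W = img.shape
    pad = ksize // 2

    # squared distance of every kernel tap from the kernel centre: (ksize*ksize,)
    r = torch.arange(ksize, dtype=img.dtype, device=img.device) - pad
    dist2 = (r[:, None] ** 2 + r[None, :] ** 2).reshape(-1)

    # one sigma per pixel (channel mean), clamped to avoid division by zero
    sigma = (max_sigma * img.mean(dim=1, keepdim=True)).clamp_min(eps)
    sigma = sigma.reshape(B, 1, H * W)

    # one normalised Gaussian per pixel: (B, ksize*ksize, H*W)
    kernels = torch.exp(-dist2[None, :, None] / (2 * sigma ** 2))
    kernels = kernels / kernels.sum(dim=1, keepdim=True)

    # patches in the matching layout: (B, C, ksize*ksize, H*W)
    padded = F.pad(img, (pad, pad, pad, pad), mode="reflect")
    patches = F.unfold(padded, ksize).reshape(B, C, ksize * ksize, H * W)

    # weight each patch by its own kernel and sum over the taps
    out = (patches * kernels[:, None]).sum(dim=2)
    return out.reshape(B, C, H, W)
```

In this sketch each output pixel gathers from its neighbourhood using the kernel chosen by its own intensity. If instead bright pixels should spread their own value outwards, the weighting would have to be applied before a fold rather than after an unfold.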

mikonvergence · August 26, 2022, 5:16pm

I don’t know if this is what you were looking for, but I created this tiny piece of code to perform something similar.