Math is generally less ambiguous than words, so let me just show some equations for what I’m trying to implement. Equation (1) is a standard 2D convolution with unit batch size, a 3x3 kernel, and unit stride, and (2) is what I want to implement.
In equation (2), the kernel w is modulated via element-wise product by another tensor d whose values depend on both the spatial index (i, j) and the dummy “convolution index” (m, n). Note that the convolutional kernel w is learnable while the modulating tensor d is known. Another way to formulate this is to use w to form a new kernel of shape (3, 3, H, W, K_in, K_out) whose values depend not just on the convolution index (m, n) but also on the spatial index (i, j).
Is there a simple way to express this computation without rewriting the CUDA kernels for conv2d?
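For reference, the two computations described above can be written out as follows. This is a reconstruction from the description, and the notation is an assumption: x is the input, y the output, k and k' index the input and output channels, (m, n) runs over the 3x3 window, and d is assumed not to depend on the channel indices.

```latex
\begin{align}
y_{i,j,k'} &= \sum_{m=-1}^{1} \sum_{n=-1}^{1} \sum_{k=1}^{K_{\mathrm{in}}}
              w_{m,n,k,k'} \, x_{i+m,\,j+n,\,k} \tag{1} \\
y_{i,j,k'} &= \sum_{m=-1}^{1} \sum_{n=-1}^{1} \sum_{k=1}^{K_{\mathrm{in}}}
              d_{i,j,m,n} \, w_{m,n,k,k'} \, x_{i+m,\,j+n,\,k} \tag{2}
\end{align}
```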
I don’t know whether it’s possible to use conv2d for this, but you could try the fold/unfold ops to avoid writing a CUDA kernel: https://pytorch.org/docs/stable/_modules/torch/nn/modules/fold.html
Thanks so much for this suggestion. Unfold + matmul + fold worked flawlessly.
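A minimal sketch of how the unfold-based approach might look for this kind of modulated convolution. The function name `modulated_conv2d` and the tensor layouts are assumptions for illustration (w in the usual conv2d layout `(K_out, K_in, 3, 3)`, d of shape `(H, W, 3, 3)`, zero padding so the output keeps the input size); it is not the code used in this thread. Because the output is read directly as one value per spatial position, a final fold is replaced by a reshape.

```python
import torch
import torch.nn.functional as F

def modulated_conv2d(x, w, d):
    """Spatially modulated convolution via unfold (illustrative sketch).

    x: (B, K_in, H, W) input
    w: (K_out, K_in, kh, kw) learnable kernel
    d: (H, W, kh, kw) known modulation, assumed shared across channels
    """
    B, K_in, H, W = x.shape
    K_out, _, kh, kw = w.shape
    # Extract the kh x kw patch around every pixel: (B, K_in*kh*kw, H*W)
    patches = F.unfold(x, kernel_size=(kh, kw), padding=(kh // 2, kw // 2))
    patches = patches.reshape(B, K_in, kh * kw, H * W)
    # Bring d into the same (kernel tap, spatial position) layout and
    # broadcast it over the batch and input-channel dimensions
    d_flat = d.reshape(H * W, kh * kw).t()  # (kh*kw, H*W)
    patches = patches * d_flat
    # Contract over input channels (k) and kernel taps (p)
    w_flat = w.reshape(K_out, K_in, kh * kw)
    out = torch.einsum('bkpl,okp->bol', patches, w_flat)
    return out.reshape(B, K_out, H, W)

# Usage: with d set to all ones this reduces to an ordinary padded conv2d
x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3, requires_grad=True)
d = torch.ones(8, 8, 3, 3)
y = modulated_conv2d(x, w, d)
print(y.shape)
```

Applying d to the patches rather than to w avoids materialising the full `(3, 3, H, W, K_in, K_out)` kernel, although the unfolded input still uses roughly nine times the memory of the original.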
Unfold is the right solution, but it is very slow. Does anyone have an idea how to speed it up?
It should be about the same speed as the regular conv layer if you perform all dot products in parallel.
Does anyone have some example code where such a “convolution” is implemented with the Fold operation?
The nn.Unfold docs provide an example:
import torch
import torch.nn as nn
unfold = nn.Unfold(kernel_size=(2, 3))
input = torch.randn(2, 5, 3, 4)
output = unfold(input)
# each patch contains 30 values (2x3=6 vectors, each of 5 channels)
# 4 blocks (2x3 kernels) in total in the 3x4 input
print(output.size())  # torch.Size([2, 30, 4])
# Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
inp = torch.randn(1, 3, 10, 12)
w = torch.randn(2, 3, 4, 5)
inp_unf = torch.nn.functional.unfold(inp, (4, 5))
out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
# or equivalently (and avoiding a copy),
# out = out_unf.view(1, 2, 7, 8)
# maximum absolute difference from conv2d; should be on the order of floating-point error
print((torch.nn.functional.conv2d(inp, w) - out).abs().max())
I’ve been trying to implement a Gaussian-like filter where the spread of each Gaussian depends on the intensity of the actual image pixel. In other words, if the pixel intensity is close to 0, the pixel shouldn’t blur; if the value is large (close to 1), it should blur a lot.
Creating the Gaussian kernel for each image pixel is simple, but I haven’t managed to find a way to combine my kernels with the unfolded image in the way it was done in the example above.
I don’t know if this is what you were looking for, but I created this tiny piece of code to perform something similar.
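A minimal sketch of how such an intensity-dependent blur could be built on unfold; this is an illustration, not necessarily the code referred to above. The function name `adaptive_gaussian_blur`, the linear mapping from intensity to sigma, the parameters `ksize`, `max_sigma` and `eps`, and the replicate padding are all assumptions. It uses a gather formulation for single-channel images: each output pixel is a weighted average of its neighbourhood, using a Gaussian whose width is set by that pixel's own intensity.

```python
import torch
import torch.nn.functional as F

def adaptive_gaussian_blur(img, ksize=5, max_sigma=2.0, eps=1e-3):
    """Blur each pixel with its own Gaussian (illustrative sketch).

    img: (B, 1, H, W) grayscale image with values in [0, 1].
    Assumed mapping: sigma = intensity * max_sigma + eps.
    """
    B, C, H, W = img.shape
    pad = ksize // 2
    # Per-pixel standard deviation, flattened over spatial positions
    sigma = (img.clamp(0, 1) * max_sigma + eps).reshape(B, 1, H * W)
    # Squared distance of every kernel tap from the kernel centre
    ax = torch.arange(ksize, dtype=img.dtype, device=img.device) - pad
    dist2 = (ax[:, None] ** 2 + ax[None, :] ** 2).reshape(1, ksize * ksize, 1)
    # One normalised Gaussian per pixel: (B, ksize*ksize, H*W)
    kernels = torch.exp(-dist2 / (2 * sigma ** 2))
    kernels = kernels / kernels.sum(dim=1, keepdim=True)
    # Neighbourhood of every pixel, in the same tap ordering: (B, ksize*ksize, H*W)
    padded = F.pad(img, (pad, pad, pad, pad), mode='replicate')
    patches = F.unfold(padded, kernel_size=ksize)
    # Weighted sum over the kernel taps gives one output value per pixel
    out = (patches * kernels).sum(dim=1)
    return out.reshape(B, 1, H, W)

# Usage on a random grayscale image
img = torch.rand(1, 1, 16, 16)
blurred = adaptive_gaussian_blur(img)
print(blurred.shape)
```

With a very small sigma the Gaussian collapses to a single tap at the centre, so pixels with intensity near 0 pass through essentially unchanged.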