Composition of convolution kernels with unfold and fold


I’m working on a project that involves composing two convolution kernels, where the first kernel is pixel-dependent (say, given by an MLP). This is related to this post, but not exactly the same.

To implement this, I’m thinking I’ll need nn.functional’s fold and unfold functions. To wrap my head around these functions, and as an intermediate step to coding the above, I would like to know: is it possible to write the composition of two kernels explicitly with unfold-matmul-fold?
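For reference, here is a minimal sketch of how a single F.conv2d call can be reproduced with unfold, matmul and fold, following the pattern shown in the PyTorch Unfold documentation. The tensors x and w and their shapes are illustrative assumptions, not part of the project.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(10, 2, 8, 8)  # illustrative input (N, C_in, H, W)
w = torch.randn(5, 2, 3, 3)   # illustrative kernel (C_out, C_in, kH, kW)

# unfold: extract every 3x3 patch -> (N, C_in*kH*kW, L), with L = 8*8 positions
cols = F.unfold(x, kernel_size=3, padding=1)       # (10, 18, 64)

# matmul: apply the flattened kernel to every patch column
out_cols = w.view(w.size(0), -1) @ cols            # (5, 18) @ (10, 18, 64) -> (10, 5, 64)

# fold with a 1x1 kernel only lays the columns back onto the 8x8 grid
y_unfold = F.fold(out_cols, output_size=(8, 8), kernel_size=1)  # (10, 5, 8, 8)

# Compare against the built-in convolution
y_ref = F.conv2d(x, w, padding=1)
err = (y_unfold - y_ref).abs().max().item()
print('Error: %.3g' % err)
```

With a 1x1 kernel, fold is equivalent to a reshape; it only does real work (overlap-add) when the blocks overlap.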

For example:

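import torch
import torch.nn.functional as F

k1 = torch.randn(5, 2, 3, 3)  # first kernel:  2 -> 5 channels
k2 = torch.randn(4, 5, 3, 3)  # second kernel: 5 -> 4 channels

# How to compute the composed kernel K = k2*k1 (k1 applied first, then k2)?
# K should have shape (4,2,5,5)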

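Supposing we did have K explicitly, then we could check that it is correct (away from the image border) with

x = torch.randn(10, 2, 8, 8)
ycorrect = F.conv2d(F.conv2d(x, k1, padding=1), k2, padding=1)
ypredicted = F.conv2d(x, K, padding=2)
# Compare interior pixels only: zero-padding the intermediate result makes the
# two-step output differ from the single 5x5 convolution along the border.
err = (ycorrect - ypredicted)[..., 1:-1, 1:-1].abs().max()
print('Error: %.3g' % err)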

There is a horribly slow way to compute the composed kernel K using linear indexing, but I’d rather avoid that approach. Maybe there is another way to get K using only F.conv2d, but I haven’t figured it out…
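A sketch of two possible routes, offered as a starting point rather than a definitive answer. Since F.conv2d computes a cross-correlation, the composed kernel should satisfy K[o,i,d] = sum over c and a+b=d of k2[o,c,b] * k1[c,i,a]. The helper names compose_conv2d and compose_fold are made up for this example. The first uses only F.conv2d (k1 is treated as a batch of images and k2 is flipped spatially); the second uses a matmul over the middle channels followed by fold as an overlap-add.

```python
import torch
import torch.nn.functional as F

def compose_conv2d(k1, k2):
    # Hypothetical helper. k1: (C_mid, C_in, h1, w1), k2: (C_out, C_mid, h2, w2).
    # View k1 as C_in "images" with C_mid channels and correlate them with the
    # spatially flipped k2, using full padding so the result is (h1+h2-1, w1+w2-1).
    pad = (k2.shape[-2] - 1, k2.shape[-1] - 1)
    K = F.conv2d(k1.permute(1, 0, 2, 3), torch.flip(k2, dims=(-2, -1)), padding=pad)
    return K.permute(1, 0, 2, 3)  # (C_out, C_in, h1+h2-1, w1+w2-1)

def compose_fold(k1, k2):
    # Hypothetical helper. Same result via matmul + fold.
    c_mid, c_in, h1, w1 = k1.shape
    c_out, _, h2, w2 = k2.shape
    # matmul over the middle channel: cols[i, o, b, a] = sum_c k2[o,c,b] * k1[c,i,a]
    cols = torch.einsum('ocb,cia->ioba',
                        k2.reshape(c_out, c_mid, h2 * w2),
                        k1.reshape(c_mid, c_in, h1 * w1))
    cols = cols.reshape(c_in, c_out * h2 * w2, h1 * w1)
    # fold: place one k2-sized block at each tap of k1 and sum the overlaps
    K = F.fold(cols, output_size=(h1 + h2 - 1, w1 + w2 - 1), kernel_size=(h2, w2))
    return K.permute(1, 0, 2, 3)

torch.manual_seed(0)
k1 = torch.randn(5, 2, 3, 3)
k2 = torch.randn(4, 5, 3, 3)
K = compose_conv2d(k1, k2)
K_fold = compose_fold(k1, k2)
kernel_err = (K - K_fold).abs().max().item()

x = torch.randn(10, 2, 8, 8)
y_two = F.conv2d(F.conv2d(x, k1, padding=1), k2, padding=1)
y_one = F.conv2d(x, K, padding=2)
# Interior only: the zero-padded intermediate result changes the border pixels
err = (y_two - y_one)[..., 1:-1, 1:-1].abs().max().item()

# Padding once by 2 up front and not at all afterwards matches everywhere
y_exact = F.conv2d(F.conv2d(x, k1, padding=2), k2)
err_full = (y_exact - y_one).abs().max().item()
print('Kernel mismatch: %.3g, interior error: %.3g, full error: %.3g'
      % (kernel_err, err, err_full))
```

Both helpers stay differentiable with respect to k1 and k2, so either could sit inside a training loop. For a pixel-dependent first kernel, the composed kernel would itself vary per pixel, so this only covers the fixed-kernel intermediate step.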

Any help would be greatly appreciated!