Block squeezing: simplify sequence of convolutional layers

Hi guys,

I’m trying to reparametrize a sequence of convolutional layers into a single convolutional layer, but I’m running into some problems.

Let me explain the goal: given a random tensor x and two convolutional layers (conv1, conv2), I want to reparametrize them into a single convolutional layer (conv_rep) such that: conv2(conv1(x)) == conv_rep(x)
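From what I can gather from the paper, the reason this is possible is that convolution is linear and shift-invariant: composing two convolutions is itself a convolution, whose kernel is the full 2-D convolution of the two kernels, summed over the intermediate channels (my reading of Figure 4a, so please correct me if I’m wrong):

k3[o, i] = \sum_c k2[o, c] * k1[c, i]

where * is a true convolution. Since torch.conv2d actually computes cross-correlation, one of the kernels has to be flipped, which is what the function below does.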

The function that reparametrizes the convolutional kernels:

import torch
from torch import Tensor

def reparametrize_conv_kernels(k1: Tensor, k2: Tensor) -> Tensor:
    """
    Reparametrize two convolutional kernels into a single one.

    Reference: https://arxiv.org/pdf/2204.00826.pdf (Figure 4a)

    Parameters
    ----------
    k1
        tensor of shape (ch_out1, ch_in1, ks1, ks1)
    k2
        tensor of shape (ch_out2, ch_out1, ks2, ks2)

    Returns
    -------
        Tensor of shape (ch_out2, ch_in1, ks1 + ks2 - 1, ks1 + ks2 - 1)
    """
    # Full convolution: pad k1 so that every overlap between k1 and k2
    # contributes one output position.
    padding = k2.shape[-1] - 1
    # Treat k1 as a batch of ch_in1 "images" with ch_out1 channels each.
    k1 = k1.permute(1, 0, 2, 3)
    # torch.conv2d computes cross-correlation; flip k2 so the result is
    # a true convolution of the two kernels.
    k2 = k2.flip(-1, -2)
    # Convolve and permute back to (ch_out2, ch_in1, ks1+ks2-1, ks1+ks2-1).
    return torch.conv2d(k1, k2, padding=padding).permute(1, 0, 2, 3)
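As a quick sanity check (my own sketch, not from the paper): with no padding anywhere, the merge should be exact up to float tolerance, since no zero-padding of the intermediate activation is involved.

# Unpadded composition vs. unpadded merged conv.
x = torch.randn(1, 5, 9, 9)
k1 = torch.randn(3, 5, 3, 3)  # (ch_out1, ch_in1, ks1, ks1)
k2 = torch.randn(1, 3, 3, 3)  # (ch_out2, ch_out1, ks2, ks2)

out = torch.conv2d(torch.conv2d(x, k1), k2)                    # 9x9 -> 7x7 -> 5x5
out_rep = torch.conv2d(x, reparametrize_conv_kernels(k1, k2))  # 5x5 kernel -> 5x5
torch.testing.assert_allclose(out, out_rep)  # should pass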

And an example with padding:

ch_out1, ch_in1, ks1 = (3, 5, 3)
ch_out2, ch_in2, ks2 = (1, 3, 3)  # ch_in2 must match ch_out1

k1 = torch.randn(ch_out1, ch_in1, ks1, ks1)
k2 = torch.randn(ch_out2, ch_in2, ks2, ks2)

batch_size, image_height, image_width = 1, 6, 6
x = torch.randn(batch_size, ch_in1, image_height, image_width)

padding_conv1 = 1
padding_conv2 = 1
padding_rep_conv = padding_conv1 + padding_conv2

out1 = torch.conv2d(x, k1, padding=padding_conv1)
out2 = torch.conv2d(out1, k2, padding=padding_conv2)

k3 = reparametrize_conv_kernels(k1, k2)
out12 = torch.conv2d(x, k3, padding=padding_rep_conv)

torch.testing.assert_allclose(out2, out12)

The test fails, raising the following error:

AssertionError: Tensor-likes are not close!

Mismatched elements: 20 / 36 (55.6%)
Greatest absolute difference: 30.521246433258057 at index (0, 0, 1, 5) (up to 1e-05 allowed)
Greatest relative difference: 3.333448777518062 at index (0, 0, 0, 1) (up to 0.0001 allowed)

I’m not fully sure about the math behind the whole process (any useful references are really appreciated), and in general I haven’t yet understood how to generalize this procedure to other settings such as kernel sizes, padding, groups, etc.

Does conv_rep need to have the kernel size indicated above (ks1 + ks2 - 1), or are there alternatives?

Any suggestions?

Thanks in advance.

I think the mismatch is caused by the padding in conv2. If you remove the padding or use:

padding_conv1 = 2
padding_conv2 = 0

you should get (approximately) the same results. The merged convolution effectively applies all of the padding before conv1, so its intermediate activation has nonzero values at the border; in your setup, conv2 instead sees out1 padded with zeros. This different “border” of the intermediate result (out1) is what creates the mismatches at the border of the output.
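Reusing x, k1, k2, and k3 from your snippet, a quick check of this (sketch):

# Move all of the padding into conv1; conv2 then needs none.
out1 = torch.conv2d(x, k1, padding=2)
out2 = torch.conv2d(out1, k2, padding=0)

# The merged conv uses the summed padding, as before.
out12 = torch.conv2d(x, k3, padding=2)
torch.testing.assert_allclose(out2, out12)  # should pass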

Hi @ptrblck,

thank you for the reply.

Yeah, the reparametrization works with the padding values you set, but I’m looking for something that is potentially applicable to any convolutional layer: the reparametrization should be independent of the layers’ settings, such as padding, stride, groups, etc.

Supporting only some convolutional layer parameters is a major limitation: for instance, most of the convolutional layers used in CNN architectures have padding = 1, so a block of two sequential conv3x3 layers cannot be reparametrized.
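To make the limitation concrete: as far as I can tell, with padding = 1 in both layers the mismatch is confined to a one-pixel border, while the interior still matches (a sketch reusing x, k1, k2, k3 from my first example):

out1 = torch.conv2d(x, k1, padding=1)
out2 = torch.conv2d(out1, k2, padding=1)
out12 = torch.conv2d(x, k3, padding=2)

# Only output positions whose receptive field touches the zero-padded
# ring around out1 differ; the central 4x4 region should agree.
torch.testing.assert_allclose(out2[..., 1:-1, 1:-1], out12[..., 1:-1, 1:-1])

That 4x4 interior corresponds to the 16 / 36 elements that did match in the failing assert above.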