Hi guys,
I’m trying to reparametrize a sequence of convolutional layers into a single convolutional layer, but I’m running into some problems.
Let me explain the goal more precisely: given a random tensor `x` and two convolutional layers (`conv1`, `conv2`), I want to reparametrize them into a single convolutional layer (`conv_rep`) such that `conv2(conv1(x)) == conv_rep(x)`.

Here is the kernel-reparametrization function:
```python
import torch
from torch import Tensor


def reparametrize_conv_kernels(k1: Tensor, k2: Tensor) -> Tensor:
    """
    Reparametrize two convolutional kernels into a single one.

    Reference: https://arxiv.org/pdf/2204.00826.pdf (Figure 4a)

    Parameters
    ----------
    k1
        tensor of shape (ch_out1, ch_in1, ks1, ks1)
    k2
        tensor of shape (ch_out2, ch_out1, ks2, ks2)

    Returns
    -------
    Tensor of shape (ch_out2, ch_in1, ks1+ks2-1, ks1+ks2-1)
    """
    padding = k2.shape[-1] - 1
    k1 = k1.permute(1, 0, 2, 3)  # swap to (ch_in1, ch_out1, ks1, ks1)
    k2 = k2.flip(-1, -2)         # flip k2 spatially
    return torch.conv2d(k1, k2, padding=padding).permute(1, 0, 2, 3)
```
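For what it's worth, here is a sanity check I ran (my own snippet, not from the paper): with no zero padding anywhere, the sequence and the merged conv agree, so the kernel merge itself seems correct. I used double precision just to keep float noise well below the comparison tolerance:

```python
import torch
from torch import Tensor


def reparametrize_conv_kernels(k1: Tensor, k2: Tensor) -> Tensor:
    # same function as above
    padding = k2.shape[-1] - 1
    k1 = k1.permute(1, 0, 2, 3)
    k2 = k2.flip(-1, -2)
    return torch.conv2d(k1, k2, padding=padding).permute(1, 0, 2, 3)


k1 = torch.randn(3, 5, 3, 3, dtype=torch.float64)  # (ch_out1, ch_in1, ks1, ks1)
k2 = torch.randn(1, 3, 3, 3, dtype=torch.float64)  # (ch_out2, ch_out1, ks2, ks2)
x = torch.randn(1, 5, 6, 6, dtype=torch.float64)

# padding=0 everywhere: both paths produce a (1, 1, 2, 2) output
out_seq = torch.conv2d(torch.conv2d(x, k1), k2)
out_rep = torch.conv2d(x, reparametrize_conv_kernels(k1, k2))

torch.testing.assert_close(out_seq, out_rep)  # passes
```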
An example:

```python
ch_out1, ch_in1, ks1 = 3, 5, 3
ch_out2, ch_in2, ks2 = 1, 3, 3

k1 = torch.randn(ch_out1, ch_in1, ks1, ks1)
k2 = torch.randn(ch_out2, ch_in2, ks2, ks2)

batch_size, image_height, image_width = 1, 6, 6
x = torch.randn(batch_size, ch_in1, image_height, image_width)

padding_conv1 = 1
padding_conv2 = 1
padding_rep_conv = padding_conv1 + padding_conv2

out1 = torch.conv2d(x, k1, padding=padding_conv1)
out2 = torch.conv2d(out1, k2, padding=padding_conv2)

k3 = reparametrize_conv_kernels(k1, k2)
out12 = torch.conv2d(x, k3, padding=padding_rep_conv)

torch.testing.assert_allclose(out2, out12)
```
The test fails with the following error:

```
AssertionError: Tensor-likes are not close!

Mismatched elements: 20 / 36 (55.6%)
Greatest absolute difference: 30.521246433258057 at index (0, 0, 1, 5) (up to 1e-05 allowed)
Greatest relative difference: 3.333448777518062 at index (0, 0, 0, 1) (up to 0.0001 allowed)
```
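One thing I noticed (again my own experiment, not from the paper): if I remove the zero padding from the second convolution and pad the merged convolution by `padding_conv1` only, the test passes. So the zero padding inserted between the two convolutions seems to be what breaks the equivalence:

```python
import torch
from torch import Tensor


def reparametrize_conv_kernels(k1: Tensor, k2: Tensor) -> Tensor:
    # same function as above
    padding = k2.shape[-1] - 1
    k1 = k1.permute(1, 0, 2, 3)
    k2 = k2.flip(-1, -2)
    return torch.conv2d(k1, k2, padding=padding).permute(1, 0, 2, 3)


k1 = torch.randn(3, 5, 3, 3, dtype=torch.float64)
k2 = torch.randn(1, 3, 3, 3, dtype=torch.float64)
x = torch.randn(1, 5, 6, 6, dtype=torch.float64)

padding_conv1 = 1

# pad only the first conv; the merged conv then uses that same padding
out1 = torch.conv2d(x, k1, padding=padding_conv1)
out2 = torch.conv2d(out1, k2)  # padding=0

k3 = reparametrize_conv_kernels(k1, k2)
out12 = torch.conv2d(x, k3, padding=padding_conv1)

torch.testing.assert_close(out2, out12)  # passes
```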
I'm not sure about the math behind the whole process (any useful references are really appreciated), and more generally I haven't understood yet how to extend this procedure to other settings: different kernel sizes, padding, groups, etc.

Does `conv_rep` need to have the `kernel_size` indicated above (`ks1 + ks2 - 1`), or are there alternatives?

Any suggestions?
Thanks in advance.