Conv2d and Linear layer for 1x1 image

Hi. Today I was trying to convert a model's weights to my own implementation, and I ran into a problem converting a Squeeze-and-Excitation block, which looks like this:

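import torch.nn as nn
import torch.nn.functional as F

# `round_by` and `hard_sigmoid` are helper functions not shown here.

class SEBlock(nn.Module):

    def __init__(self, in_channels, out_channels, reduction_ratio=4):
        super().__init__()

        red_channels = round_by(in_channels / reduction_ratio, 8)
        # 1x1 convolutions act as fully connected layers on the pooled (N, C, 1, 1) tensor
        self.conv1 = nn.Conv2d(in_channels, red_channels, 1)
        self.activation = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(red_channels, out_channels, 1)

    def forward(self, input):
        # Squeeze: global average pooling down to (N, C, 1, 1)
        x = F.adaptive_avg_pool2d(input, 1)
        x = self.conv1(x)
        x = self.activation(x)
        x = self.conv2(x)
        # Excite: rescale each channel of the input by its gate value
        return input * hard_sigmoid(x)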
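For comparison, here is a minimal sketch of the same block written with nn.Linear layers instead of 1x1 convolutions. The pooled (N, C, 1, 1) tensor is flattened to (N, C) before the linear layers, and the gate is reshaped back to (N, C, 1, 1) so it broadcasts over the spatial dimensions. `round_by` and `hard_sigmoid` are not defined in the post, so the versions below are hypothetical stand-ins (round to a multiple of 8, and the common `relu6(x + 3) / 6` hard sigmoid); the real helpers may differ. `LinearSEBlock` is likewise a hypothetical name.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def round_by(value, divisor):
    # Hypothetical helper: round to the nearest multiple of `divisor` (never below `divisor`).
    return max(divisor, int(value + divisor / 2) // divisor * divisor)


def hard_sigmoid(x):
    # Hypothetical helper: piecewise-linear approximation of the sigmoid.
    return F.relu6(x + 3.0) / 6.0


class LinearSEBlock(nn.Module):
    """SE block whose pointwise layers are nn.Linear instead of 1x1 nn.Conv2d."""

    def __init__(self, in_channels, out_channels, reduction_ratio=4):
        super().__init__()
        red_channels = round_by(in_channels / reduction_ratio, 8)
        self.fc1 = nn.Linear(in_channels, red_channels)
        self.activation = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(red_channels, out_channels)

    def forward(self, input):
        # (N, C, H, W) -> (N, C, 1, 1) -> (N, C): drop the singleton spatial dims
        x = F.adaptive_avg_pool2d(input, 1).flatten(1)
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        # (N, C_out) -> (N, C_out, 1, 1) so the gate broadcasts over H and W
        return input * hard_sigmoid(x)[:, :, None, None]


block = LinearSEBlock(16, 16)
out = block(torch.randn(2, 16, 7, 7))
print(out.shape)  # torch.Size([2, 16, 7, 7])
```

Note that `out_channels` must equal `in_channels` here, since the gate multiplies the input channel by channel.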

However, in the original model the pointwise convolutions are nn.Linear layers. Since their input is a 1x1 feature map, I thought they could easily be converted like this:

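import torch
import torch.nn as nn

linear = nn.Linear(16, 8)
conv = nn.Conv2d(16, 8, 1)

# In-place copies into leaf parameters must run without autograd tracking
with torch.no_grad():
    conv.weight.copy_(linear.weight.reshape(8, 16, 1, 1))
    conv.bias.copy_(linear.bias)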

Unfortunately, this does not reproduce the same result:

>>> x = torch.randn((3, 16, 1, 1))
>>> torch.norm(conv(x) - linear(x.view(3, 16)))
tensor(16.5712)

What am I missing here?

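Hi, the issue is the way you are computing the norm. The two outputs do not have the same shape (conv(x) is (3, 8, 1, 1), while linear(x.view(3, 16)) is (3, 8)), so PyTorch broadcasts them against each other and the norm is taken over a (3, 8, 3, 8) tensor of mismatched differences. You need to reshape one output to match the other before comparing them.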
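A quick way to confirm this is to print the shapes. The sketch below uses random weights with the layer sizes from the question; shapes are aligned from the right, so (3, 8, 1, 1) and (3, 8) broadcast to (3, 8, 3, 8). Flattening the conv output first turns the comparison back into an element-wise one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(16, 8)
conv = nn.Conv2d(16, 8, 1)
with torch.no_grad():
    conv.weight.copy_(linear.weight.reshape(8, 16, 1, 1))
    conv.bias.copy_(linear.bias)

x = torch.randn(3, 16, 1, 1)
co = conv(x)                # (3, 8, 1, 1)
lo = linear(x.view(3, 16))  # (3, 8)

# (3, 8, 1, 1) vs (1, 1, 3, 8) -> (3, 8, 3, 8): every output paired with every other
diff = co - lo
print(co.shape, lo.shape, diff.shape)
# torch.Size([3, 8, 1, 1]) torch.Size([3, 8]) torch.Size([3, 8, 3, 8])

# Flattening (3, 8, 1, 1) -> (3, 8) compares matching elements only
print(torch.allclose(co.flatten(1), lo, atol=1e-6))  # True
```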

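import torch
import torch.nn as nn

linear = nn.Linear(16, 8)
conv = nn.Conv2d(16, 8, 1)

# Reuse the linear weights: (8, 16) -> (8, 16, 1, 1)
conv.weight = nn.Parameter(linear.weight.reshape(8, 16, 1, 1))
conv.bias = nn.Parameter(linear.bias)

batch = 5
x = torch.randn(batch, 16, 1, 1)
co = conv(x)                       # (batch, 8, 1, 1)
lo = linear(x.view(batch, 1, 16))  # (batch, 1, 8)
co = co.view(lo.shape)             # match shapes so no broadcasting happens
torch.norm(co - lo)  # small value around 2.0e-07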
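If you need this conversion repeatedly, it can be wrapped in a small helper. This is a sketch, not a PyTorch API: `linear_to_conv1x1` is a hypothetical name. It copies the weights under `torch.no_grad()` so the new conv owns its parameters, whereas wrapping `linear.weight.reshape(...)` in `nn.Parameter` may leave the two layers sharing memory. The check flattens both sides to (N, C) so no broadcasting can sneak in.

```python
import torch
import torch.nn as nn


def linear_to_conv1x1(linear: nn.Linear) -> nn.Conv2d:
    """Hypothetical helper: build a 1x1 Conv2d that computes the same map as `linear`."""
    conv = nn.Conv2d(linear.in_features, linear.out_features, kernel_size=1,
                     bias=linear.bias is not None)
    with torch.no_grad():
        # Linear weight is (out, in); Conv2d weight is (out, in, kH, kW) = (out, in, 1, 1)
        conv.weight.copy_(linear.weight.view(linear.out_features, linear.in_features, 1, 1))
        if linear.bias is not None:
            # Both layers store the bias as a 1-D tensor of length out_features
            conv.bias.copy_(linear.bias)
    return conv


linear = nn.Linear(16, 8)
conv = linear_to_conv1x1(linear)

x = torch.randn(5, 16, 1, 1)
co = conv(x).flatten(1)    # (5, 8, 1, 1) -> (5, 8)
lo = linear(x.flatten(1))  # (5, 16) -> (5, 8)
print(torch.allclose(co, lo, atol=1e-6))
```

When converting a whole checkpoint, the same idea applies per tensor: each Linear weight `w` in the state_dict can be reshaped with `w.reshape(*w.shape, 1, 1)` before `load_state_dict`, while bias tensors keep their shape. The key names (e.g. `fc1` vs `conv1`) may also need renaming to match your module.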

Bests

Yeah, you are absolutely correct :smile:. Gotta check PyTorch's broadcasting rules again.