How to collapse one convolution and a dense layer into just one linear layer?

Hello,

I would like to collapse a 2d convolution and a dense (fully-connected) layer into a single linear layer.

For example:

Given the network:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 5, 2, 1)
        self.conv2 = nn.Conv2d(5, 50, 5, (2, 2), 0)
        self.fc1 = nn.Linear(1250, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        # note: no nonlinearity between conv2 and fc1
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

how would I generate the network:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 5, 2, 1)
        self.fcnew = nn.Linear(845, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x) 
        x = torch.flatten(x, 1)
        x = self.fcnew(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

where the fcnew layer consists of the conv2 and fc1 layers collapsed into one. How do I calculate the weights for the fcnew layer?

Thanks 🙂

Based strictly on what you have above, you would need to know the image input size.

https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

The formula there gives you the output size of the Conv2d layer:

H_out = floor((H_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

But if you want something more flexible, you can insert an nn.AdaptiveAvgPool2d before flattening, so that the flattened size is fixed regardless of the input image size (with an output size of 1, it is simply equal to the number of out_channels).
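
For example, a minimal sketch (the output size of 1 and the tensor shapes here are only illustrative, not taken from your code):

import torch

pool = torch.nn.AdaptiveAvgPool2d(1)     # pools each channel down to 1 x 1
x = torch.randn(2, 50, 13, 13)           # any spatial size works here
y = torch.flatten(pool(x), 1)
print(y.shape)                           # torch.Size([2, 50]) -- (batch, out_channels)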

Based on your new edits, it’s still not possible to determine the fcnew size because you haven’t specified the image input size.

Hi Bernardo!

Two comments to start:

First, for context, yes, because there is no intervening nonlinearity between
conv2 and fc1, the two may be replaced by (collapsed into) a single Linear.
This is because a Linear is the most general linear (technically speaking,
affine) transformation that has the given numbers of input and output
variables. So whatever net linear (technically affine) transformation conv2
and fc1 generate when applied sequentially (without an intervening
nonlinearity), this transformation may be exactly reproduced (up to some
numerical round-off error) by a single Linear.
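
Spelled out (treating conv2 as a matrix C with bias c acting on the flattened input, and fc1 as a matrix W with bias b -- these symbols are just for illustration):

fc1(conv2(x)) = W (C x + c) + b = (W C) x + (W c + b)

so the collapsed layer has weight W C and bias W c + b.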

Second, why do you even want to start with a separate conv2 and fc1?
Your desired fcnew contains fewer individual parameters than do conv2
and fc1 together (largely because of the relatively large number of
out_channels in conv2 that then just get combined back together by fc1).
My intuition is that the single fcnew layer will train more efficiently than
conv2 and fc1, so why not just train fcnew from scratch, rather than build
fcnew from the weights of some pre-existing conv2 and fc1?
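
For the concrete sizes in your post (taking fcnew to be the Linear(845, 100) you wrote), the count works out to:

conv2:    50 * 5 * 5 * 5 +  50 =   6,300 parameters
fc1:      1250 * 100     + 100 = 125,100 parameters
together:                        131,400 parameters
fcnew:    845 * 100      + 100 =  84,600 parameters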

Having said that, probably the easiest way to collapse conv2 and fc1
together – that is, to compute the weights of your fcnew layer – will be
to pass a series of “single-pixel” images through conv2 and fc1. The
values of the pixels of the “output” images so obtained will be, roughly
speaking, the individual values in fcnew’s .weight tensor.

Consider:

>>> import torch
>>> torch.__version__
'1.13.0'
>>>
>>> _ = torch.manual_seed (2023)
>>>
>>> # shape of intermediate "input" image
>>> in_channels = 5
>>> h = 13
>>> w = h
>>>
>>> # convolution parameters
>>> out_channels = 50
>>> kernel = 5
>>> stride = 2
>>>
>>> # fully-connected parameters
>>> in_features = int (out_channels * (((h - kernel) / stride) + 1)**2)   # 50 * 5 * 5 = 1250
>>> out_features = 100
>>>
>>> # create layers to collapse
>>> conv2 = torch.nn.Conv2d (in_channels, out_channels, kernel, stride)
>>> fc1 = torch.nn.Linear (in_features, out_features)
>>>
>>> # create collapsed bias from conv2 and fc1
>>> bias = fc1 (torch.flatten (conv2 (torch.zeros (1, in_channels, h, w))))
>>>
>>> # create collapsed weight from conv2 and fc1 (and bias)
>>> # batch of images, each with only a single pixel turned on
>>> n_pixels = in_channels * h * w   # number of pixels (including channels) in input image
>>> pixel_batch = torch.eye (n_pixels).reshape (n_pixels, in_channels, h, w)
>>> weight = (fc1 (torch.flatten (conv2 (pixel_batch), 1)) - bias).T
>>>
>>> # create collapsed Linear
>>> fcnew = torch.nn.Linear (n_pixels, out_features)   # Linear of correct shape
>>> # copy in collapsed weight and bias
>>> with torch.no_grad():
...     _ = fcnew.weight.copy_ (weight)
...     _ = fcnew.bias.copy_ (bias)
...
>>> # check on example batch of images
>>> input = torch.randn (5, in_channels, h, w)
>>> out_two_layer = fc1 (torch.flatten (conv2 (input), 1))
>>> out_collapsed = fcnew (torch.flatten (input, 1))
>>> torch.allclose (out_collapsed, out_two_layer, atol = 1.e-6)
True
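
To wire this into the collapsed Net you sketched above, you could then do something like the following (just a sketch -- old_net here is an assumed instance of your original Net, the intermediate feature map is assumed to be 5 x 13 x 13 so that fcnew really is Linear (845, 100), and weight and bias are assumed to have been computed from old_net.conv2 and old_net.fc1 with the recipe above):

new_net = Net()                                   # the collapsed Net with self.fcnew = nn.Linear(845, 100)
with torch.no_grad():
    new_net.conv1.weight.copy_ (old_net.conv1.weight)
    new_net.conv1.bias.copy_ (old_net.conv1.bias)
    new_net.fcnew.weight.copy_ (weight)           # collapsed weight computed above
    new_net.fcnew.bias.copy_ (bias)               # collapsed bias computed above
    new_net.fc2.weight.copy_ (old_net.fc2.weight)
    new_net.fc2.bias.copy_ (old_net.fc2.bias)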

Best.

K. Frank
