# How to collapse one convolution and a dense layer into just one linear layer?

Hello,

I would like to collapse a 2d convolution and a dense layer into a single linear layer.

For example:

Given the network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 5, 2, 1)
        self.conv2 = nn.Conv2d(5, 50, 5, (2, 2), 0)
        self.fc1 = nn.Linear(1250, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x
```

how to generate a network:

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 5, 2, 1)
        self.fcnew = nn.Linear(845, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = torch.flatten(x, 1)
        x = self.fcnew(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x
```

where the `fcnew` layer consists of the `conv2` and `fc1` layers collapsed together. How do I calculate the weights for the `fcnew` layer?

Thanks

Based strictly on what you have above, you would need to know the input image size.

https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

This formula would give you the spatial output size of the Conv2d layer:

`H_out = floor((H_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)`
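As a sketch, the formula can be checked against an actual `Conv2d`. The helper name `conv2d_out_size` is mine, and the 28x28 input is an assumption (it is the input size, e.g. MNIST, that makes the question's numbers 1250 and 845 work out):

```python
import math

import torch


def conv2d_out_size(h_in, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a Conv2d, per the formula in the docs."""
    return math.floor(
        (h_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )


# conv1 from the question: Conv2d(1, 5, 5, 2, 1), assuming a 28x28 input
h1 = conv2d_out_size(28, kernel_size=5, stride=2, padding=1)   # 13
# conv2 from the question: Conv2d(5, 50, 5, (2, 2), 0)
h2 = conv2d_out_size(h1, kernel_size=5, stride=2)              # 5

print(50 * h2 * h2)   # 1250, the in_features of fc1
print(5 * h1 * h1)    # 845, the in_features of fcnew

# cross-check the formula against PyTorch itself
y = torch.nn.Conv2d(1, 5, 5, 2, 1)(torch.zeros(1, 1, 28, 28))
print(y.shape)        # torch.Size([1, 5, 13, 13])
```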
But if you want something more flexible, you can use an `nn.AdaptiveAvgPool2d` before flattening, so that the flattened size depends only on the number of `out_channels` and not on the input image size.

Based on your new edits, it’s still not possible to determine the `fcnew` size because you haven’t specified the image input size.

Hi Bernardo!

First, for context, yes, because there is no intervening nonlinearity between
`conv2` and `fc`, the two may be replaced by (collapsed into) a single `Linear`.
This is because a `Linear` is the most general linear (technically speaking,
affine) transformation that has the given numbers of input and output
variables. So whatever net linear (technically affine) transformation `conv2`
and `fc1` generate when applied sequentially (without an intervening
nonlinearity), this transformation may be exactly reproduced (up to some
numerical round-off error) by a single `Linear`.
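This affinity is easy to check numerically: treating `conv2` followed by `fc1` as one map `f`, subtracting off `f(0)` (the net bias term) should leave an exactly linear map. A quick sketch, using the shapes from the question (randomly initialized layers, since only linearity is being tested):

```python
import torch

torch.manual_seed(0)

conv2 = torch.nn.Conv2d(5, 50, 5, 2)
fc1 = torch.nn.Linear(1250, 100)


def f(x):
    """conv2 followed by fc1, with no nonlinearity in between."""
    return fc1(torch.flatten(conv2(x), 1))


x = torch.randn(1, 5, 13, 13)
y = torch.randn(1, 5, 13, 13)
b = f(torch.zeros(1, 5, 13, 13))   # the composed affine map's bias term

# linearity check: f(x + y) - b == (f(x) - b) + (f(y) - b)
lhs = f(x + y) - b
rhs = (f(x) - b) + (f(y) - b)
print(torch.allclose(lhs, rhs, atol=1.e-5))   # True
```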

Second, why do you even want to start with a separate `conv2` and `fc1`?
Your desired `fcnew` contains fewer individual parameters than do `conv2`
and `fc1` together (largely because of the relatively large number of
`out_channels` in `conv2` that then just get combined back together by `fc1`).
My intuition is that the single `fcnew` layer will train more efficiently than
`conv2` and `fc1`, so why not just train `fcnew` from scratch, rather than build
`fcnew` from the weights of some pre-existing `conv2` and `fc1`?
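The parameter counts behind this claim can be verified directly, a quick sketch using the layer shapes from the question:

```python
import torch

conv2 = torch.nn.Conv2d(5, 50, 5, 2)    # 50*5*5*5 + 50    =   6,300 params
fc1 = torch.nn.Linear(1250, 100)        # 1250*100 + 100   = 125,100 params
fcnew = torch.nn.Linear(845, 100)       # 845*100 + 100    =  84,600 params


def count(m):
    return sum(p.numel() for p in m.parameters())


print(count(conv2) + count(fc1))   # 131400
print(count(fcnew))                # 84600
```

So the collapsed layer has roughly 36% fewer parameters than the two layers it replaces.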

Having said that, probably the easiest way to collapse `conv2` and `fc1`
together – that is, to compute the weights of your `fcnew` layer – will be
to pass a series of “single-pixel” images through `conv2` and `fc1`. The
values of the pixels of the “output” images so obtained will be, roughly
speaking, the individual values in `fcnew`’s `.weight` tensor.

Consider:

```python
>>> import torch
>>> torch.__version__
'1.13.0'
>>>
>>> _ = torch.manual_seed (2023)
>>>
>>> # shape of intermediate "input" image
>>> in_channels = 5
>>> h = 13
>>> w = h
>>>
>>> # convolution parameters
>>> out_channels = 50
>>> kernel = 5
>>> stride = 2
>>>
>>> # fully-connected parameters
>>> in_features = int (out_channels * (((h - kernel) / stride) + 1)**2)
>>> out_features = 100
>>>
>>> # create layers to collapse
>>> conv2 = torch.nn.Conv2d (in_channels, out_channels, kernel, stride)
>>> fc1 = torch.nn.Linear (in_features, out_features)
>>>
>>> # create collapsed bias from conv2 and fc1
>>> bias = fc1 (torch.flatten (conv2 (torch.zeros (1, in_channels, h, w))))
>>>
>>> # create collapsed weight from conv2 and fc1 (and bias)
>>> # batch of images, each with only a single pixel turned on
>>> n_pixels = in_channels * h * w   # number of pixels (including channels) in input image
>>> pixel_batch = torch.eye (n_pixels).reshape (n_pixels, in_channels, h, w)
>>> weight = (fc1 (torch.flatten (conv2 (pixel_batch), 1)) - bias).T
>>>
>>> # create collapsed Linear
>>> fcnew = torch.nn.Linear (n_pixels, out_features)   # Linear of correct shape
>>> # copy in collapsed weight and bias
...     _ = fcnew.weight.copy_ (weight)
...     _ = fcnew.bias.copy_ (bias)
...
>>> # check on example batch of images
>>> input = torch.randn (5, in_channels, h, w)
>>> out_two_layer = fc1 (torch.flatten (conv2 (input), 1))
>>> out_collapsed = fcnew (torch.flatten (input, 1))
>>> torch.allclose (out_collapsed, out_two_layer, atol = 1.e-6)
True
```

Best.

K. Frank
