Correct dimension of PyTorch layers

Hi,
I’m trying to change the number of channels used in layer4 of ResNet-18, so I wrote something like the following piece of code. But when I run it, it complains about a size mismatch at the fc layer, and I don’t know how to figure out which size I should use.

import torch.nn as nn

class Inception(nn.Module):

    def __init__(self, in_channels=2048):
        super(Inception, self).__init__()

        self.paral_0 = nn.Sequential(
            nn.Conv2d(256, 64, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),

            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),

            nn.AdaptiveAvgPool2d(output_size=(1, 1))
        )
        self.fc0 = nn.Linear(in_features=256, out_features=83, bias=True)
        ....

    def forward(self, x):
        y0 = self.paral_0(x)
        y0 = y0.view(y0.size(0), -1)
        y0 = self.fc0(y0)
        ....

and this is the error:

Traceback (most recent call last):
  File " generated_train/0.py", line 471, in <module>
    summary(model, (3, 224, 224))
  File " lib/python3.6/site-packages/torchsummary/torchsummary.py", line 72, in summary
    model(*x)
  File " lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File " generated_train/model/net.py", line 533, in forward
    x = self.layer4(x)
  File " lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File " generated_train/model/net.py", line 303, in forward
    y0 = self.fc0(y0)
  File " lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File " lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File " lib/python3.6/site-packages/torch/nn/functional.py", line 1352, in linear
    ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: size mismatch, m1: [2 x 64], m2: [256 x 83] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:266

If calculating the shapes is too cumbersome, you could just print the shape of y0 before passing it to the linear layer:

print(y0.shape)
y0 = self.fc0(y0)

After the next crash, you’ll get the shape of your tensor and can adapt the number of input features in your layer accordingly.

Well, it prints the size as torch.Size([2, 64]). What should I change the input features to? I tried different combinations and they all give a similar error. Should I use reshape?

in_features should be set to 64 to match the input.
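
In your snippet that would look something like this (keeping the 83 output classes from your code):

self.fc0 = nn.Linear(in_features=64, out_features=83, bias=True)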

Yes, apparently 64 was its magic number, thanks!

I got a little confused: why do the input features of the Linear layer depend on the previous layers? Isn’t it the case that each neuron in an FC layer connects to all neurons of the previous layer? So why would the size of the FC layer’s input matter?


That’s basically right. Since PyTorch is a dynamic framework, you have to specify the in and out features of the linear layer, so that the weight matrix can be properly initialized.

I understand the part that we should specify the input feature size, but I don’t understand why it has to be that specific value, i.e. in my case why it should be 64. Isn’t that against the definition of FC layers, as I mentioned earlier? Did I miss something here?

Static frameworks can infer the input size based on the computation graph.
I.e. if we know which activation will be passed to the linear layer in advance, we can set the number of input features based on the activation shape.
However, as the forward pass is not known in advance in PyTorch and the computation graph is dynamically created in each forward pass, we cannot pre-compute the input shape without losing the flexibility of a dynamic approach.

Anyway, a linear layer is basically a matrix multiplication and an addition. The matrix size has to be set before the actual forward pass takes place.
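
Here is a minimal sketch of those shapes, using the numbers from your error message (a flattened [2 x 64] activation and 83 output classes):

import torch
import torch.nn as nn

fc = nn.Linear(in_features=64, out_features=83)  # weight is stored as [83, 64]
x = torch.randn(2, 64)                           # flattened activation: [batch_size, 64]
out = fc(x)                                      # computes x @ weight.t() + bias
print(out.shape)                                 # torch.Size([2, 83])

# With in_features=256 the weight would be [83, 256], so the multiplication would
# pair [2 x 64] with [256 x 83] -- exactly the size mismatch from your traceback.

Since the weight matrix has to be created with the correct in_features before the first forward pass, 64 is the only value that works for this activation.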