Question about nn.Linear(16 * 5 * 5, 120)

Hello everyone,
I am new to PyTorch, and I am confused about the 16×5×5 in the code below. Can anyone tell me why we use 16×5×5 here? Thank you :)

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

The number of input features to your linear layer is defined by the shape of the activation coming from the previous layer.
In your case this activation has the shape [batch_size, channels=16, height=5, width=5]. To pass it to nn.Linear, you flatten the tensor to [batch_size, 16*5*5].
The 16 is defined by the number of out_channels (i.e. the number of filter kernels) in the previous conv layer, while 5x5 is the spatial size determined by the conv and pooling operations applied to your input data.
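If you want to verify this yourself, here is a minimal sketch (reusing the same conv layers as your model) that prints the activation shape after each conv + pool step:

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)
x = F.max_pool2d(F.relu(conv1(x)), 2)
print(x.shape)  # torch.Size([1, 6, 14, 14])
x = F.max_pool2d(F.relu(conv2(x)), 2)
print(x.shape)  # torch.Size([1, 16, 5, 5]) -> 16 * 5 * 5 = 400 input features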

Let me know if you need more information!


Thank you! Is the meaning of 5×5 the same as the 28×28 in the MNIST example? If so, why is it 5×5 here?

Got it! Because the input is 1×32×32, after the conv and pooling operations, the size becomes 16×5×5. Thank you very much for your help!
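For reference, here is that arithmetic spelled out as a quick sketch (assuming stride 1 and no padding, which are the Conv2d defaults used here):

# conv with a 5x5 kernel, stride 1, no padding: out = in - 5 + 1
# max pool over a 2x2 window:                   out = in // 2
h = 32
h = h - 5 + 1   # conv1 -> 28
h = h // 2      # pool  -> 14
h = h - 5 + 1   # conv2 -> 10
h = h // 2      # pool  ->  5
print(h)        # 5, hence nn.Linear(16 * 5 * 5, 120)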

So is it out_channels * kernel_size[width] * kernel_size[height] from the previous layer?

Yes, the out_channels are defined by the previous layer. No, the kernel size of the previous layer isn't used directly; what matters is the spatial size of that layer's output activation.
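As a side note, one common way to avoid computing the in_features by hand is to run a dummy input through the conv stack and read the flattened size off the activation (a sketch, not part of the original code):

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

with torch.no_grad():
    x = torch.randn(1, 1, 32, 32)          # dummy input with the expected shape
    x = F.max_pool2d(F.relu(conv1(x)), 2)
    x = F.max_pool2d(F.relu(conv2(x)), 2)

in_features = x.flatten(1).size(1)          # 16 * 5 * 5 = 400
fc1 = nn.Linear(in_features, 120)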
