How do you calculate the channel or input to the first linear layer?

xsquared · February 18, 2021, 10:13pm

Here is a network. Could you please explain how the 128 * 1 * 1 shape is calculated? I would appreciate it very much.

I am aware of this formula, (W - F + 2P) / S + 1, but I am having trouble calculating 128 * 1 * 1.

In this formula:
W = Input Width
F = Kernel size
P = Padding
S = Stride

The size of the input is (1, 28, 28), i.e. the MNIST dataset from torchvision.
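For reference, the usual form of the output-size formula is floor((W - F + 2P) / S) + 1. The sketch below applies it to a 28-wide input. The helper name `conv_output_size` is made up for illustration, and the defaults P = 0 and S = 1 are chosen to match the defaults of nn.Conv2d.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width for input width w, kernel size f, padding p, stride s."""
    # Integer division rounds down, matching the floor in the formula.
    return (w - f + 2 * p) // s + 1


# 3x3 convolution, no padding, stride 1, on a 28-wide input
print(conv_output_size(28, 3))       # 26
# 2x2 max pool with stride 2 on the 26-wide result
print(conv_output_size(26, 2, s=2))  # 13
```

The same helper works for pooling layers, since they use the same sliding-window arithmetic with F as the pool size.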

So as you can see, I have looked into this problem, but I cannot work out the 128 * 1 * 1 input to
self.f1 = nn.Linear(128 * 1 * 1, 1000) in the network below. If you could answer this question using a formula, I would appreciate it very much.

import torch.nn as nn

class Net(nn.Module):
    """A representation of a convolutional neural network comprised of VGG blocks."""
    def __init__(self, n_channels):
        super(Net, self).__init__()
        # VGG block 1
        self.conv1 = nn.Conv2d(n_channels, 64, (3,3))
        self.act1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d((2,2), stride=(2,2))
        # VGG block 2
        self.conv2 = nn.Conv2d(64, 64, (3,3))
        self.act2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d((2,2), stride=(2,2))
        # VGG block 3
        self.conv3 = nn.Conv2d(64, 128, (3,3))
        self.act3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d((2,2), stride=(2,2))
        # Fully connected layer
        self.f1 = nn.Linear(128 * 1 * 1, 1000)
        self.act4 = nn.ReLU()
        # Output layer
        self.f2 = nn.Linear(1000, 10)
        self.act5 = nn.Softmax(dim=1)

    def forward(self, X):
        """This function forward propagates the input."""
        # VGG block 1
        X = self.conv1(X)
        X = self.act1(X)
        X = self.pool1(X)
        # VGG block 2
        X = self.conv2(X)
        X = self.act2(X)
        X = self.pool2(X)
        # VGG block 3
        X = self.conv3(X)
        X = self.act3(X)
        X = self.pool3(X)
        # Flatten
        X = X.view(-1, 128)
        # Fully connected layer
        X = self.f1(X)
        X = self.act4(X)
        # Output layer
        X = self.f2(X)
        X = self.act5(X)

        return X

thanks!

nitaifingerhut · February 18, 2021, 10:20pm

You need to provide the size of the input for that.

xsquared · February 18, 2021, 10:34pm

I edited my question. The size is (1, 28, 28).

nitaifingerhut · February 18, 2021, 10:54pm

so (excluding the batch dimension):
input: 1X28X28
after conv2d: 64X28X28
after maxpool: 64X13X13
after conv2d: 64X11X11
after maxpool: 64X5X5
after conv2d: 128X3X3
after maxpool: 128X1X1

Now you have a tensor of size (N, 128, 1, 1), which you view as (N, 128) by flattening it.
That's why the number of input features to the first linear layer is 128.

xsquared · February 18, 2021, 11:19pm

Thank you so much!

Just to make sure, I would like to ask this: after the first pool you actually get 13.5, right? But you rounded down to 13? Is this correct?

nitaifingerhut · February 18, 2021, 11:23pm

Yes, look here:
https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html
You need to round down.
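As a rough illustration of the rounding, the sketch below computes the unrounded and rounded-down pool output for a 2x2 pool with stride 2. It assumes the pooling inputs in this network are 26, 11, and 3 wide, since each unpadded 3x3 convolution trims 2 from the width. The helper name `pool_raw_and_floored` is hypothetical.

```python
import math


def pool_raw_and_floored(w, f=2, s=2):
    """Return the unrounded and the floored output width of a pooling layer."""
    raw = (w - f) / s + 1
    return raw, math.floor(raw)


for w in (26, 11, 3):
    print(w, pool_raw_and_floored(w))
# 26 (13.0, 13)
# 11 (5.5, 5)
# 3 (1.5, 1)
```

Under that assumption, the fractional values show up at the second and third pools, and rounding down drops the last row and column that do not fill a complete window.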

xsquared · February 18, 2021, 11:27pm

Thank you so much!

I appreciate it!

BRAHMA_REDDY_AKUMALL · November 16, 2023, 7:17am

How can it be 64X28X28 when we have a kernel size of (3,3)? It will be 64X26X26. Or am I wrong…
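Tracing the network layer by layer with floor((W - F + 2P) / S) + 1 supports the 26 reading: nn.Conv2d defaults to padding 0 and stride 1, so a 3x3 kernel shrinks 28 to 26. The sketch below is plain Python rather than PyTorch. The `LAYERS` table and the `trace_shapes` helper are illustrative names that mirror the Net class from the question.

```python
def out_size(w, f, p=0, s=1):
    """Output width for input width w, kernel f, padding p, stride s."""
    return (w - f + 2 * p) // s + 1


# (name, output channels or None to keep them, kernel size, stride)
LAYERS = [
    ("conv1", 64, 3, 1),
    ("pool1", None, 2, 2),
    ("conv2", 64, 3, 1),
    ("pool2", None, 2, 2),
    ("conv3", 128, 3, 1),
    ("pool3", None, 2, 2),
]


def trace_shapes(channels=1, size=28):
    """Return (name, channels, width) after each layer for a square input."""
    shapes = []
    for name, out_channels, kernel, stride in LAYERS:
        if out_channels is not None:
            channels = out_channels  # pooling keeps the channel count
        size = out_size(size, kernel, s=stride)
        shapes.append((name, channels, size))
    return shapes


for name, c, w in trace_shapes():
    print(f"after {name}: {c}X{w}X{w}")
```

With a (1, 28, 28) input this gives 64X26X26, 64X13X13, 64X11X11, 64X5X5, 128X3X3, and 128X1X1. The first convolution output is 26 wide, and the final shape is still 128 * 1 * 1.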