How do you calculate the channel or input to the first linear layer?

Here is a network. I would appreciate it very much if you could explain how the 128 * 1 * 1 shape is calculated.

I am aware of the formula ((W - F + 2P) / S) + 1, but I am having trouble calculating 128 * 1 * 1.

In this formula:
W = Input Width
F = Kernel size
P = Padding
S = Stride
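As a rough sketch of how that formula is usually applied: the kernel size is subtracted from the width, and the division is rounded down. The helper name conv_output_size below is made up for illustration and assumes dilation 1 and square inputs and kernels.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width for input width w, kernel size f, padding p, stride s.

    Hypothetical helper: assumes dilation 1 and uses integer (floor) division.
    """
    return (w - f + 2 * p) // s + 1


# 3x3 convolution on a 28-wide input, no padding, stride 1
conv_out = conv_output_size(28, 3)
# 2x2 max pooling with stride 2 on the result
pool_out = conv_output_size(conv_out, 2, s=2)
print(conv_out, pool_out)
```

The same function covers both the convolution and the pooling layers, since pooling is just a window with its own kernel size and stride.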

The size of the input is (1, 28, 28), i.e. the MNIST dataset from torchvision.

So as you can see, I have looked into this problem, but I cannot calculate the 128 * 1 * 1 input to
self.f1 = nn.Linear(128 * 1 * 1, 1000) in the network below. If you could answer this question using a formula, I would appreciate it very much.

import torch.nn as nn

class Net(nn.Module):
    """A representation of a convolutional neural network comprised of VGG blocks."""
    def __init__(self, n_channels):
        super(Net, self).__init__()
        # VGG block 1
        self.conv1 = nn.Conv2d(n_channels, 64, (3,3))
        self.act1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d((2,2), stride=(2,2))
        # VGG block 2
        self.conv2 = nn.Conv2d(64, 64, (3,3))
        self.act2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d((2,2), stride=(2,2))
        # VGG block 3
        self.conv3 = nn.Conv2d(64, 128, (3,3))
        self.act3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d((2,2), stride=(2,2))
        # Fully connected layer
        self.f1 = nn.Linear(128 * 1 * 1, 1000)
        self.act4 = nn.ReLU()
        # Output layer
        self.f2 = nn.Linear(1000, 10)
        self.act5 = nn.Softmax(dim=1)

    def forward(self, X):
        """This function forward propagates the input."""
        # VGG block 1
        X = self.conv1(X)
        X = self.act1(X)
        X = self.pool1(X)
        # VGG block 2
        X = self.conv2(X)
        X = self.act2(X)
        X = self.pool2(X)
        # VGG block 3
        X = self.conv3(X)
        X = self.act3(X)
        X = self.pool3(X)
        # Flatten
        X = X.view(-1, 128)
        # Fully connected layer
        X = self.f1(X)
        X = self.act4(X)
        # Output layer
        X = self.f2(X)
        X = self.act5(X)

        return X
        
        

thanks!

you need to provide the size of the input for that

I edited my question. The size is (1, 28, 28).

so (excluding the batch dimension):
input: 1X28X28
after conv2d: 64X28X28
after maxpool: 64X13X13
after conv2d: 64X11X11
after maxpool: 64X5X5
after conv2d: 128X3X3
after maxpool: 128X1X1

Now you have a tensor of size (N, 128, 1, 1), which you view as (N, 128) by flattening it.
That's why the number of input features to the first linear layer is 128 * 1 * 1 = 128.

thank you so much!

Just to make sure, I would like to ask this: after the first pool you actually get 13.5, right, but you rounded down to 13? Is this correct?

yes
look here:
https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html
you need to round down
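To illustrate the rounding, here is a small sketch assuming MaxPool2d's default ceil_mode=False, a 2x2 window and stride 2. The helper pool_output_size is hypothetical, and the widths 11 and 3 are the odd feature-map sizes listed earlier in this thread.

```python
import math


def pool_output_size(w, kernel=2, stride=2):
    """Return the unrounded and floored output width of a pooling layer.

    Hypothetical helper: assumes no padding and dilation 1.
    """
    exact = (w - kernel) / stride + 1
    return exact, math.floor(exact)


results = {}
for w in (11, 3):
    exact, rounded = pool_output_size(w)
    results[w] = (exact, rounded)
    print(f"width {w}: exact {exact}, rounded down to {rounded}")
```

With an odd width, the last row and column do not fill a complete 2x2 window, so they are dropped.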

thank you so much!

I appreciate it!

How can it be 64X28X28 when we have kernel size (3,3)? It should be 64X26X26. Or am I wrong…
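A quick check with the formula suggests 26 is right for the first convolution, since nn.Conv2d defaults to padding=0 and stride=1. The sizes after that are unaffected, so the final result is still 128 * 1 * 1. Below is a minimal sketch that traces the spatial size through the network from the question; out_size and the layers list are written just for this illustration.

```python
def out_size(w, f, p=0, s=1):
    """Floor((w - f + 2p) / s) + 1, assuming dilation 1 (illustrative helper)."""
    return (w - f + 2 * p) // s + 1


# (name, kernel size, stride) for each layer of Net, all with padding 0
layers = [
    ("conv1", 3, 1), ("pool1", 2, 2),
    ("conv2", 3, 1), ("pool2", 2, 2),
    ("conv3", 3, 1), ("pool3", 2, 2),
]

size = 28  # MNIST images are 28x28
sizes = {}
for name, kernel, stride in layers:
    size = out_size(size, kernel, s=stride)
    sizes[name] = size
    print(f"after {name}: {size}x{size}")

channels = 128  # output channels of conv3
flat_features = channels * size * size
print("features into f1:", flat_features)
```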