Input size of linear layer

I’m starting with a CNN on the MNIST dataset and I have a question: why must the input size be 128 in self.fc1 = nn.Linear(128, 4096)? We have 32 filters in conv3, followed by max_pool2d with a 3x3 kernel, so shouldn’t it be 32 x 3 x 3 = 288?

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=(1, 1))
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=(1, 1))
        self.conv3 = nn.Conv2d(64, 32, kernel_size=3, stride=(1, 1), padding=1)
        self.fc1 = nn.Linear(128, 4096)
        self.fc2 = nn.Linear(4096, 10)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 3)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 3)
        x = x.view(-1, 128)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

The kernel size of your pooling layers is odd, which sometimes makes the manual calculation a bit tedious.
You could just print the shape after each layer:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=(1, 1))
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=(1, 1))
        self.conv3 = nn.Conv2d(64, 32, kernel_size=3, stride=(1, 1), padding=1)
        self.fc1 = nn.Linear(128, 4096)
        self.fc2 = nn.Linear(4096, 10)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))  # 26
        print(x.shape)
        x = F.relu(self.conv2(x))  # 24
        print(x.shape)
        x = F.max_pool2d(x, 3)  # 8
        print(x.shape)
        x = F.relu(self.conv3(x))  # 8
        print(x.shape)
        x = F.max_pool2d(x, 3)  # 2
        print(x.shape)
        x = x.view(-1, 128)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

As you can see, you’ll get an activation of 32*2*2 = 128, which fits the number of input features of fc1.
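
For reference, here is a minimal sketch of the same shape trace with a dummy tensor (assuming the standard 28x28 MNIST input; F.max_pool2d uses a stride equal to its kernel size by default):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 1, 28, 28)                # one grayscale MNIST-sized image
x = nn.Conv2d(1, 32, 3)(x)                   # -> [1, 32, 26, 26]
x = nn.Conv2d(32, 64, 3)(x)                  # -> [1, 64, 24, 24]
x = F.max_pool2d(x, 3)                       # -> [1, 64, 8, 8]
x = nn.Conv2d(64, 32, 3, padding=1)(x)       # -> [1, 32, 8, 8]
x = F.max_pool2d(x, 3)                       # -> [1, 32, 2, 2]
print(x.shape)                               # torch.Size([1, 32, 2, 2]), i.e. 32*2*2 = 128 features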


Hi Ptrblck

I hope you are well. I just want to double check the input to my 3-dimensional CNN layer (cnn3d): my patch size is 11x11x7 and I use a batch size of 64 with one input channel, so the input size for training is (64, 1, 11, 11, 7). Is that correct?
And for the fully connected layer without any CNN layer before it, the input for a batch size of 64 and a patch size of 11x11x7 is (64, 874). Indeed, I used data.view(-1, 11*11*7) and then passed the result to model(input) as the input of the linear layer.

Is it correct to say that if the input size is not correct for training, PyTorch will definitely show an error?

Yes, PyTorch will raise an exception if you encounter a shape mismatch, so if the code runs fine, the shapes are correct. 🙂
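
For example, a minimal (hypothetical) snippet to see such a shape mismatch error:

import torch
import torch.nn as nn

fc = nn.Linear(847, 10)      # expects 847 input features
x = torch.randn(64, 874)     # wrong number of features
try:
    fc(x)
except RuntimeError as e:
    print(e)                 # e.g. "mat1 and mat2 shapes cannot be multiplied (64x874 and 847x10)"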

Your calculation is also correct, although I prefer to write x = x.view(x.size(0), -1) just to make sure the batch size is always correct, and so that I get a shape mismatch in case I use the wrong number of input features.
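
A quick sketch illustrating the difference (with hypothetical shapes):

import torch

x = torch.randn(64, 32, 2, 2)        # 64 samples with 32*2*2 = 128 features each

# A hard-coded (wrong) feature count silently changes the batch dimension:
print(x.view(-1, 64).shape)          # torch.Size([128, 64])

# Keeping the batch dimension fixed flattens the rest, so a wrong feature
# count surfaces as a clear shape mismatch in the following linear layer:
print(x.view(x.size(0), -1).shape)   # torch.Size([64, 128])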

Yes, correct. And for testing, since I test each patch individually, the input size for the linear layer should be (1, 864) and for the CNN layer should be [1, 1, 11, 11, 7], the same as what I used for training but now with a batch size of 1?

Yes, the batch dimension should always be there, even if you use a single sample.
You are currently changing the number of features: 874 in the previous post, now 864, while 11*11*7 = 847. Besides that, it should work. 😉
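
A quick sketch of those shapes with the corrected feature count (using a placeholder Conv3d configuration as an assumption, since the actual layer setup isn’t shown in the thread):

import torch
import torch.nn as nn

patch = torch.randn(1, 11, 11, 7)       # a single one-channel 11x11x7 patch
x = patch.unsqueeze(0)                  # add the batch dimension -> [1, 1, 11, 11, 7]

conv = nn.Conv3d(1, 8, kernel_size=3)   # placeholder 3D conv layer
print(conv(x).shape)                    # torch.Size([1, 8, 9, 9, 5])

fc = nn.Linear(11 * 11 * 7, 10)         # 11*11*7 = 847 input features
flat = patch.view(1, -1)                # -> [1, 847]
print(fc(flat).shape)                   # torch.Size([1, 10])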

Yes, I really appreciate it; it was a typo.
You helped me a lot.