How to calculate dimensions of first linear layer of a CNN

Currently, I am working with a CNN that has a fully connected layer attached to it, and my input is a 3-channel image of size 32x32. I am wondering if there is a consistent formula I can use to calculate the input dimensions of the first linear layer from the output of the last conv/maxpooling layer. I want to be able to calculate the dimensions of the first linear layer given only information about the last conv2d layer and maxpool layer. In other words, I would like to be able to calculate that value without having to trace through all the previous layers (so I don't have to manually work out the shapes through a very deep network).

I also want to understand how the acceptable input dimension is calculated, i.e. what is the reasoning behind those calculations?

For some reason these calculations work, and PyTorch accepted these dimensions:

val = int((32*32)/4)
self.fc1 = nn.Linear(val, 200)

and this also worked

self.fc1 = nn.Linear(64*4*4, 200)

Why do those values work, and are there limitations to those ways of calculating them? I feel like this would break if I were to change the stride or kernel size, for example.

Here is the general model architecture I was working with:

import torch
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)  


        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
        self.pool2 = nn.MaxPool2d(2, 2)

        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.pool3 = nn.MaxPool2d(2, 2)
        
        self.dropout = nn.Dropout(0.25)

        # H*W/4
        val = int((32*32)/4)
        #self.fc1 = nn.Linear(64*4*4, 200)
        ################################################
        self.fc1 = nn.Linear(val, 200)  # dimensions of the layer I wish to calculate
        ###############################################
        self.fc2 = nn.Linear(200, 100)
        self.fc3 = nn.Linear(100, 10)


    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        #print(x.shape)
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)

        return x

# create a complete CNN
model = Net()
print(model)

Can anyone tell me how to calculate the dimensions of the first linear layer and explain the reasoning?

In general, the documentation of each operation, e.g. convolution:
https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
and max pooling:
https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html
contains the formulas for the output shape calculation. However, using these still requires some manual effort unless you write your own shape inference code.
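
For example, here is a minimal sketch (my own helper, not part of PyTorch) that applies the per-dimension formula from those docs, H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1), and traces the spatial size through the architecture in the question:

import math

def conv_out(size, kernel_size, stride=1, padding=0, dilation=1):
    # per-dimension output size formula from the Conv2d / MaxPool2d docs
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

h = 32                                      # input is 32x32
h = conv_out(h, kernel_size=3, padding=1)   # conv1 -> 32
h = conv_out(h, kernel_size=2, stride=2)    # pool  -> 16
h = conv_out(h, kernel_size=3)              # conv2 -> 14
h = conv_out(h, kernel_size=2, stride=2)    # pool2 -> 7
h = conv_out(h, kernel_size=3)              # conv3 -> 5
h = conv_out(h, kernel_size=2, stride=2)    # pool3 -> 2

flat_features = 64 * h * h  # 64 output channels from conv3 -> 64 * 2 * 2 = 256
print(flat_features)

which is also why int((32*32)/4) == 256 happens to line up for this particular stack of layers.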

In practice a simple hack that many use here is to simply print the shape right before the linear layer runs, and to use that printed shape when defining the linear layer. It's a bit clunky, but as far as I can tell that is the simplest method at the moment. There may be better solutions in the future due to the move towards structured definitions that allow shape inference without actually running the code (see RFC-0005-structured-kernel-definitions.md in the pytorch/rfcs repository on GitHub).
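
One way to package that hack, assuming a fixed input size such as the 3x32x32 images in the question, is to push a dummy tensor through the conv/pool stack once at the end of __init__ and build fc1 from the resulting size (a sketch, not an official API):

        # at the end of __init__, after the conv/pool layers are defined
        with torch.no_grad():
            dummy = torch.zeros(1, 3, 32, 32)
            dummy = self.pool(F.relu(self.conv1(dummy)))
            dummy = self.pool2(F.relu(self.conv2(dummy)))
            dummy = self.pool3(F.relu(self.conv3(dummy)))
        n_features = torch.flatten(dummy, 1).shape[1]  # 256 for this architecture
        self.fc1 = nn.Linear(n_features, 200)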

As for both shapes working in your model: could you check that the model actually runs with both dimensions? I don't think a dimension mismatch would be caught until the model is actually run for the first time.
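
For instance, a quick dummy forward pass like this (assuming a 32x32 input) would surface any mismatch between the flattened size and fc1 right away:

model = Net()
out = model(torch.randn(2, 3, 32, 32))  # dummy batch of two 3x32x32 images
print(out.shape)                        # raises a shape error in fc1 if in_features is wrong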
