I am currently working with a CNN that has fully connected layers attached to it, on 3-channel images of size 32x32. Is there a consistent formula I can use to calculate the input dimension of the first linear layer from the output of the last conv/max-pooling layer? Ideally, I would like to compute that value using only information about the last Conv2d and MaxPool2d layers, without tracing every earlier layer (so I don't have to manually calculate weight dimensions for a very deep network).
I also want to understand how the acceptable dimensions are determined, and what the reasoning behind that calculation is.
For some reason the following calculation works, and PyTorch accepted these dimensions:
val = int((32*32)/4)
self.fc1 = nn.Linear(val, 200)
and this also worked:
self.fc1 = nn.Linear(64*4*4, 200)
Why do those values work, and are there limitations to those methods? I suspect they would break if I changed the stride or kernel size, for example.
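For reference, here is my current understanding of the per-layer output-size formula from the PyTorch Conv2d/MaxPool2d docs, applied by hand to the architecture below (assuming I am reading the formula correctly):

```python
import math

def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a square Conv2d/MaxPool2d layer,
    per the formula in the PyTorch docs."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# Trace a 32x32 input through each conv/pool layer of the model below
s = 32
s = conv2d_out(s, 3, padding=1)   # conv1 -> 32
s = conv2d_out(s, 2, stride=2)    # pool  -> 16
s = conv2d_out(s, 3)              # conv2 -> 14
s = conv2d_out(s, 2, stride=2)    # pool2 -> 7
s = conv2d_out(s, 3)              # conv3 -> 5
s = conv2d_out(s, 2, stride=2)    # pool3 -> 2 (floor division)
flat = 64 * s * s                 # 64 channels from conv3
print(flat)                       # 256 by my calculation
```

If this is right, the flattened size depends on every layer's kernel, stride, and padding, which is exactly why I am asking whether a shortcut exists.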
Here is the general model architecture I was working with:
import torch
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional and max pooling layers
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        self.pool3 = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.25)
        # H*W/4
        val = int((32 * 32) / 4)
        # self.fc1 = nn.Linear(64*4*4, 200)
        ################################################
        self.fc1 = nn.Linear(val, 200)  # dimensions of the layer I wish to calculate
        ################################################
        self.fc2 = nn.Linear(200, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        # sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        # print(x.shape)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x

# create a complete CNN
model = Net()
print(model)
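One workaround I came across (I am not sure it is the idiomatic approach) is to infer the flattened size empirically with a dummy forward pass through just the feature-extractor part. This sketch rebuilds my conv/pool stack as a standalone nn.Sequential to show what I mean:

```python
import torch
import torch.nn as nn

# The same conv/pool stack as in my model, rebuilt as a Sequential
# so the flattened size can be measured directly.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2, 2),
)

# Push a dummy batch of one 3x32x32 image through and read off the size.
with torch.no_grad():
    n_flat = features(torch.zeros(1, 3, 32, 32)).flatten(1).shape[1]
print(n_flat)
```

I also noticed torch.nn.LazyLinear, which apparently infers in_features from the first batch, but I would still like to understand the manual calculation.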
Can anyone tell me how to calculate the dimensions of the first linear layer and explain the reasoning?