Hello. I am working on a project to develop a CNN model for image classification. After the convolutional layers, the feature maps have to be flattened so they can be passed to the fully connected layers. I am using the following formula to calculate the required input size of the first fully connected layer:

O = floor((W - F + 2P) / S) + 1

where O is the output width, W is the input width (the output width of the previous layer), F is the kernel/filter size, P is the padding, and S is the stride. My original image size is 119x119, and my convolutional layers are as follows:
- Conv Layer 1 (W=119, F=5, P=2, S=1): output size 119
- Max Pooling Layer (F=2, S=2): output size 59 (correct me if I'm wrong here)
- Conv Layer 2 (W=59, F=5, P=2, S=1): output size 59
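A minimal sketch of the size calculation in plain Python, assuming PyTorch's defaults of dilation=1 for nn.Conv2d and ceil_mode=False for nn.MaxPool2d (so the division is floored). The helper name conv_output_size is made up for illustration; the loop simply traces the three layers in the list above.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output width of a conv/pool layer: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


# (name, F, P, S) for the layers listed above
layers = [
    ("conv1", 5, 2, 1),
    ("pool", 2, 0, 2),
    ("conv2", 5, 2, 1),
]

size = 119  # original image width (and height)
for name, f, p, s in layers:
    size = conv_output_size(size, f, p, s)
    print(f"{name}: {size}")
# conv1: 119
# pool: 59
# conv2: 59
```

For the pooling step this gives floor(117 / 2) + 1 = 58 + 1 = 59, so the 59 in the list checks out under these assumptions.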
Following the formula above, the spatial size after all of these layers should be 59x59. The second conv layer has 24 output channels, so the input vector to the fully connected layer should have 24x59x59 = 83544 elements. However, according to the error, the actual size is 24x29x29 = 20184. Here are my layer definitions:
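```python
import torch.nn as nn

class Net(nn.Module):  # class name and constructor arguments assumed; only the layers were posted
    def __init__(self, params, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, stride=1, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=5, stride=1, padding=2)
        self.drop1 = nn.Dropout(p=0.5)
        # in_features must equal the number of elements per sample after flattening
        self.fc1 = nn.Linear(24 * 59 * 59, params['n_unit'])
        self.drop2 = nn.Dropout(p=0.1)
        self.fc2 = nn.Linear(params['n_unit'], num_classes)
```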
Where did I go wrong in the calculations?
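One way to check the flattened size empirically is to push a dummy tensor through the convolutional layers and read off the shape. The sketch below assumes, since the forward method isn't shown, that self.pool is applied after both conv1 and conv2; activation functions are omitted because they don't change the shape. Under that assumption the output is 24x29x29, which matches the error, whereas without the second pooling step it would stay at 24x59x59.

```python
import torch
import torch.nn as nn

# Assumed layer order: conv1 -> pool -> conv2 -> pool (forward() not shown in the post).
# Activations such as ReLU are left out because they don't change tensor shapes.
features = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=5, stride=1, padding=2),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(12, 24, kernel_size=5, stride=1, padding=2),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 119, 119)  # one 3-channel 119x119 image
    out = features(dummy)

n_flat = out.flatten(start_dim=1).shape[1]  # elements per sample after flattening
print(tuple(out.shape))  # (1, 24, 29, 29)
print(n_flat)            # 20184
```

The printed n_flat is the value nn.Linear expects as in_features. Alternatively, nn.LazyLinear(params['n_unit']) infers in_features on its first forward call, which avoids computing it by hand.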