Size confusion in this PyTorch Tutorial

I am following the Pytorch tutorial on

Inside the init for Net class, we have the first fully connected layer specified as
self.fc1 = nn.Linear(16 * 5 * 5, 120)

I am confused where the 5*5 comes from.

If we start from the beginning with a CIFAR-10 image, which is 32x32, we can use the equation
width_out = (width_in - kernel_size + 2*padding)/stride + 1
Since the image and the filter are both square, the height_out equation is identical.

Applying the first convolution with a 5x5 filter gives 28x28 because (32-5)+1 = 28.
Applying the 2x2 pooling with stride 2 gives 14x14 because (28-2)/2+1 = 14.
Applying the second convolution with a 5x5 filter gives 10x10 because (14-5)+1 = 10.

This is the part where I am getting lost. To go from 10x10 to 5x5, you’d need to apply another 2x2 pooling with stride 2, but I don’t see another pooling layer defined in init.

Could someone explain to me what is going on here?

If we look at the documentation:

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

Pay attention to the first 3 parameters. If we follow the tutorial:

input size:   32x32x3
After conv1   28x28x6
After pooling 14x14x6
After conv2   10x10x16
After pooling 5x5x16
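
The sizes above can be reproduced with the output-size formula from the question. Here is a minimal sketch in plain Python, assuming stride 1 and no padding for the convolutions (the nn.Conv2d defaults). The helper name out_size is made up for illustration and is not part of PyTorch:

```python
def out_size(size_in, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    # Same formula as in the question, with integer (floor) division.
    return (size_in + 2 * padding - kernel_size) // stride + 1


size = 32                      # CIFAR-10 images are 32x32
trace = [size]

size = out_size(size, kernel_size=5)             # conv1: 5x5 filter
trace.append(size)
size = out_size(size, kernel_size=2, stride=2)   # 2x2 pooling, stride 2
trace.append(size)
size = out_size(size, kernel_size=5)             # conv2: 5x5 filter
trace.append(size)
size = out_size(size, kernel_size=2, stride=2)   # 2x2 pooling again
trace.append(size)

flattened = 16 * size * size   # 16 channels after conv2
print(trace, flattened)
```

The last entry of trace is the 5 in 16 * 5 * 5, and it only appears because pooling is applied a second time after conv2.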

After the convolutional layers, we have a tensor of size 5x5x16 instead of 32x32x3. This means the input to the fully connected (fc) layers is a vector with 5*5*16 elements. If we flatten the tensor, we get a (16*5*5 x 1) matrix, i.e. 400 x 1.

There is no learning in pooling layers, so you can reuse the same pooling layer in different parts of the network. self.pool is created only once, but it is used twice.

Yes, I understood most of this, but I don’t understand what you mean by the last part “There is just 1 creation self.pool, but it is used twice.” Right below the “self.conv1” line, they have the “self.pool” line. Are you saying this line defines a pooling layer after every convolution layer?

Sorry for the late reply. No, it is defined once, in __init__(self). But both convolutional layers are followed by a pooling layer with the same kernel size. If you wanted, for example, 2x2 pooling after the first one and 3x3 after the second, you would have to define another layer, such as self.pool2 = nn.MaxPool2d(3, 3). So there is no reason to create self.pool = nn.MaxPool2d(2, 2) twice. Pooling has no learnable parameters and only reduces the dimensions, so define it once and, if your pooling layers have the same size, use that layer more than once.

self.conv1 = nn.Conv2d(3, 6, 5)
self.pool1 = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.pool2 = nn.MaxPool2d(2, 2)

Here pool1 and pool2 are the same.

In the forward pass, self.pool is used twice to obtain the 16x5x5 feature map. __init__ is the constructor, where the layers are defined, including the self.pool pooling layer.
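
To make the reuse concrete, here is a rough sketch of a network that defines one pooling layer and calls it twice. The forward method is my own guess at the structure, since the tutorial's forward is not quoted in this thread, and the later fc layers are left out to keep it short:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)   # defined once, no learnable parameters
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)

    def features(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # first use of self.pool
        x = self.pool(F.relu(self.conv2(x)))   # second use of the same layer
        return x

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)                # keep the batch dimension
        return self.fc1(x)


net = Net()
dummy = torch.zeros(1, 3, 32, 32)              # one CIFAR-10 sized image
print(net.features(dummy).shape)               # feature map fed to fc1
print(net(dummy).shape)
```

If the second self.pool call is removed, the feature map stays at 16x10x10 and fc1 would raise a shape mismatch error, which is the quickest way to see where the 5*5 comes from.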

Sorry all. Huge mistake on my part. I got 2 source codes mixed up. I was actually looking at an older version of the source code than the one I linked to you guys. In the one I was looking at the constructor had it as

def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(3, 6, 5)
    self.pool1 = nn.MaxPool2d(2, 2)
    self.conv2 = nn.Conv2d(6, 16, 5)
    self.fc1 = nn.Linear(16 * 5 * 5, 120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

the “1” on the self.pool1 confused me, but I was looking at an outdated version.