Understanding Layer Sizes/Number of Channels

Hi everyone!

I’m new to PyTorch, and I’m having some trouble understanding how layer sizes and the number of channels are computed.

I’m currently looking at this code from a neural net for the Fashion-MNIST dataset (it trains on Fashion-MNIST in batches of 64, using SGD, for 10 epochs).

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

My confusion is where the number “320” comes from, both in x.view(-1, 320) and as the first parameter of the first fully connected layer. How does one arrive at this number? What values contribute to it being 320?

Thank you! I look forward to exploring PyTorch much more over the coming months and years.

This number is calculated using your layer parameters and the input size.
Let’s walk through the shapes after each layer.

Your input starts at [1, 28, 28].
Both convolution layers use a kernel size of 5, a stride of 1 and no padding.
This means the spatial size of the input activation will be reduced by 2 pixels on each side (so by 4 in width and height).
Also, you are using max pooling with a kernel size and stride of 2, which halves the spatial size.
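
If it helps, the same arithmetic can be written as the standard output-size formula. Here is a minimal sketch (the helper name conv_out_size is just for illustration, not a PyTorch function):

def conv_out_size(in_size, kernel_size, stride=1, padding=0):
    # standard formula: floor((in + 2 * padding - kernel) / stride) + 1
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_out_size(28, 5))            # conv1: 28 -> 24
print(conv_out_size(24, 2, stride=2))  # pool:  24 -> 12
print(conv_out_size(12, 5))            # conv2: 12 -> 8
print(conv_out_size(8, 2, stride=2))   # pool:  8 -> 4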

I’ll use your forward to calculate the shapes:

# x has an initial size of [1, 28, 28]
x = self.conv1(x)
# this conv layer reduces the spatial size by 4, and the channels are set to the number of filters (out_channels), so x is [10, 24, 24]
x = F.relu(F.max_pool2d(x, 2))
# pooling halves the spatial size with this setup, so x is [10, 12, 12]
x = self.conv2(x)
# again -4 in w and h, x is [20, 8, 8]
x = F.relu(F.max_pool2d(x, 2))
# x is now [20, 4, 4]
x = x.view(-1, 320)
# flattens x to [-1, 20*4*4=320]
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
x = self.fc2(x)
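
If you want to double-check these shapes yourself, a minimal sketch (assuming the Net class and the imports from your post) is to push a dummy batch through the layers and print the sizes:

import torch

model = Net()
x = torch.randn(64, 1, 28, 28)  # dummy Fashion-MNIST batch of 64 samples
x = F.relu(F.max_pool2d(model.conv1(x), 2))
print(x.shape)  # torch.Size([64, 10, 12, 12])
x = F.relu(F.max_pool2d(model.conv2(x), 2))
print(x.shape)  # torch.Size([64, 20, 4, 4])
x = x.view(-1, 320)
print(x.shape)  # torch.Size([64, 320])

Note that the first dimension stays the batch size (64), which is why x.view(-1, 320) works for any batch size.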

You can find a good introduction to conv layers and the shape calculation at Stanford’s CS231n.

That makes so much sense - thank you so much @ptrblck!
