Understanding Layer Sizes/Number of Channels

Hi everyone!

I’m new to PyTorch, and I’m having some trouble understanding how layer sizes and the number of channels are computed.

I’m currently looking at this code from a neural net for the Fashion-MNIST dataset (it trains on Fashion-MNIST in batches of 64, using SGD, for 10 epochs).

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

My confusion is where the number “320” comes from, both in x.view(-1, 320) and as the first parameter of the first fully connected layer. How does one arrive at this number? What values contribute to it being 320?

Thank you! I look forward to exploring PyTorch much more over the coming months and years.

This number is calculated using your layer parameters and the input size.
Let’s walk through the shapes after each layer.

Your input starts at [1, 28, 28].
Both convolution layers use a kernel size of 5, a stride of 1 and no padding.
This means the spatial size of the input activation will be reduced by 2 pixels on each side (so by 4 in width and height).
Also, you are using max pooling with a kernel size and stride of 2, which halves the spatial size.
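
If it helps, the same arithmetic can be written as the standard output-size formula. Here is a minimal sketch (the helper name conv_out_size is just for illustration, not a PyTorch function):

def conv_out_size(in_size, kernel_size, stride=1, padding=0):
    # standard formula: floor((in + 2 * padding - kernel) / stride) + 1
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_out_size(28, 5))            # conv1: 28 -> 24
print(conv_out_size(24, 2, stride=2))  # pool:  24 -> 12
print(conv_out_size(12, 5))            # conv2: 12 -> 8
print(conv_out_size(8, 2, stride=2))   # pool:  8 -> 4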

I’ll use your forward to calculate the shapes:

# x has an initial size of [1, 28, 28]
x = self.conv1(x)
# this conv layer reduces the spatial size by 4, and the channels are set to the number of filters (out_channels), so x is [10, 24, 24]
x = F.relu(F.max_pool2d(x, 2))
# pooling halves the spatial size with this setup, so x is [10, 12, 12]
x = self.conv2(x)
# again -4 in w and h, x is [20, 8, 8]
x = F.relu(F.max_pool2d(x, 2))
# x is now [20, 4, 4]
x = x.view(-1, 320)
# flattens x to [-1, 20*4*4=320]
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
x = self.fc2(x)
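
If you want to double-check these shapes yourself, a minimal sketch (assuming the Net class and the imports from your post) is to push a dummy batch through the layers and print the sizes:

import torch

model = Net()
x = torch.randn(64, 1, 28, 28)  # dummy Fashion-MNIST batch of 64 samples
x = F.relu(F.max_pool2d(model.conv1(x), 2))
print(x.shape)  # torch.Size([64, 10, 12, 12])
x = F.relu(F.max_pool2d(model.conv2(x), 2))
print(x.shape)  # torch.Size([64, 20, 4, 4])
x = x.view(-1, 320)
print(x.shape)  # torch.Size([64, 320])

Note that the first dimension stays the batch size (64), which is why x.view(-1, 320) works for any batch size.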

You can find a good introduction to conv layers and the shape calculation at Stanford’s CS231n.

That makes so much sense - thank you so much @ptrblck!
