PyTorch neural network parameters and tensor shapes

I am following this tutorial: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

It’s quite nicely done; however, I do not see how you can tell the expected image input size for the small network they have defined. They say that the images must be of size 32x32.

I know there is a formula (given in Stanford CS231n) that says the output size is ((N - K) / stride) + 1, where N is the (padded) input size and K is the 2D convolution kernel size (e.g. 3x3).
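For example, applying it forward through the first conv layer (stride 1, no padding) is easy enough:

N, K, stride = 32, 3, 1       # (padded) input size, kernel size, stride
print((N - K) // stride + 1)  # 30, i.e. conv1 maps a 32x32 image to 30x30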

But that does not seem to help me deduce what I want.

How can I know the expected input size? I would be happy if someone wants to go into more detail for the example of this simple PyTorch conv net.

Please note that for sizes significantly different from 32x32 the net outputs an error message. Also, I don’t understand why it accepts inputs like 30x30 and 31x31; e.g. this is accepted:

input = torch.randn(1, 1, 31, 31)
out = net(input)
print(out)

The code is:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

It is also not explained where the shape (16 * 6 * 6, 120) for the linear layers comes from, e.g. why 120?

Based on the description in CS231n, we know that a conv layer with a kernel size of 3 and no padding will reduce the spatial size by one pixel on each side.
Max pooling with a kernel size and stride of 2 will halve the spatial size.

Let’s have a look at the model and split the layers to calculate the shapes based on these rules.
See the comments for the output shape after each step:

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # assuming input is [N, 1, 32, 32]
        x = F.relu(self.conv1(x)) # outputs [N, 6, 30, 30]
        x = F.max_pool2d(x, (2, 2)) # outputs [N, 6, 15, 15]
        x = F.relu(self.conv2(x)) # outputs [N, 16, 13, 13]
        x = F.max_pool2d(x, 2) # outputs [N, 16, 6, 6]
        x = x.view(-1, self.num_flat_features(x)) # flattens x to [N, 16*6*6]
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

For an input of 31x31 and 30x30, we’ll get the following shapes:

# For x = [N, 1, 31, 31]:
[N, 6, 29, 29]
[N, 6, 14, 14]
[N, 16, 12, 12]
[N, 16, 6, 6]

# For x = [N, 1, 30, 30]:
[N, 6, 28, 28]
[N, 6, 14, 14]
[N, 16, 12, 12]
[N, 16, 6, 6]
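You can also verify these shapes empirically; here is a minimal sketch that runs the same conv/pool stack on random inputs of each size and prints the final shape:

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

for size in (32, 31, 30):
    x = torch.randn(1, 1, size, size)
    x = F.max_pool2d(F.relu(conv1(x)), 2)  # conv -> relu -> 2x2 pool
    x = F.max_pool2d(F.relu(conv2(x)), 2)  # conv -> relu -> 2x2 pool
    print(size, tuple(x.shape))            # all three print (1, 16, 6, 6)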

So we got lucky that the final result is the same. 🙂
As you can see, the reason for the equal output shapes is that the pooling layer uses floor for odd input shapes, as described in the docs.
You can change this by setting ceil_mode=True, if necessary.
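Here is a quick check of the floor vs. ceil behavior on the 13x13 activation from above (a minimal sketch; the shapes in the comments follow from the pooling formula):

import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 13, 13)
print(F.max_pool2d(x, 2).shape)                  # floor: [1, 16, 6, 6]
print(F.max_pool2d(x, 2, ceil_mode=True).shape)  # ceil:  [1, 16, 7, 7]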


Great answer, thanks!

May I just also ask :

net = Net()
input = torch.randn(1, 1, 32, 32)
out = net(input)

Why is the method forward() not explicitly called? I mean, how does just calling net(input) end up calling forward()? (which is what happens, as far as I understand)

By the way, I don’t understand what this line means:

super(Net, self).__init__()

I can imagine super() is calling the constructor of a parent class, but …?

Yes, you are right. If you call the nn.Module directly, its internal __call__ method will be invoked, which makes sure to run hooks etc. and finally calls the forward method.
You should stick to this approach and not call model.forward manually.
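To see the difference, here is a small sketch using the Net from above: a forward hook fires when the module is called, but not when forward is invoked directly:

net = Net()
inp = torch.randn(1, 1, 32, 32)

def hook(module, inputs, output):
    print("forward hook fired")

net.register_forward_hook(hook)
out1 = net(inp)          # goes through __call__, so the hook fires
out2 = net.forward(inp)  # bypasses __call__, so the hook does NOT fire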

That’s again right. This call initializes the parent class (nn.Module.__init__), which then sets up all the necessary dicts etc. for the parameters, buffers, …
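As a minimal sketch of what goes wrong without that call (using the imports from above; a recent PyTorch version is assumed), assigning a submodule fails because nn.Module’s bookkeeping was never set up:

class Broken(nn.Module):
    def __init__(self):
        # super().__init__() is deliberately missing here
        self.conv = nn.Conv2d(1, 6, 3)

Broken()  # raises AttributeError: nn.Module's internal dicts
          # (_parameters, _modules, ...) were never created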