I am following this tutorial: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
it’s quite nicely done, however I do not understand/see where you can know the expected image input size for the small network they have defined. They say that the images must be of size 32x32.
I know there is a formula (given in stanford CS231n ) that states that ((N-K)/strideLen) +1 = , where N is the (padded) input size and K is the 2D convolution kernel size (e.g. 3x3).
But that does not seem to help me deduce what I want.
How can I know the expect input size ? I would be happy if someone whats to go in more details for the example of this simple PyTorch conv nn.
(Please note that for sizes significanlty different that 32x32 the net outputs an error message. Also, I don’t understand why it accepts inputs like 30x30 and 31x31, e.g. this is accepted:
input = torch.randn(1, 1, 31, 31) out = net(input) print(out)
The code is:
import torch import torch.nn as nn import torch.nn.functional as F class Net(nn.Module): def __init__(self): super(Net, self).__init__() # 1 input image channel, 6 output channels, 3x3 square convolution # kernel self.conv1 = nn.Conv2d(1, 6, 3) self.conv2 = nn.Conv2d(6, 16, 3) # an affine operation: y = Wx + b self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension self.fc2 = nn.Linear(120, 84) self.fc3 = nn.Linear(84, 10) def forward(self, x): # Max pooling over a (2, 2) window x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) # If the size is a square you can only specify a single number x = F.max_pool2d(F.relu(self.conv2(x)), 2) x = x.view(-1, self.num_flat_features(x)) x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return x def num_flat_features(self, x): size = x.size()[1:] # all dimensions except the batch dimension num_features = 1 for s in size: num_features *= s return num_features net = Net() print(net)
It is not explained where from the shape (16 * 6 * 6, 120) comes from, for the linear layers, e.g. why 120 ?