Following the TRAINING A CLASSIFIER tutorial, the dimensions of the images being trained are
height x width x channel
I am unable to find where in this neural network class the height and width are specified; I am confused.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
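One way to see where the spatial size is (implicitly) fixed is to trace a dummy input through the conv/pool layers and print the shapes. A minimal sketch, assuming a CIFAR-10-style 32x32 RGB input as in the tutorial:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, 5)   # 5x5 kernel, no padding: 32x32 -> 28x28
pool = nn.MaxPool2d(2, 2)    # halves height and width
conv2 = nn.Conv2d(6, 16, 5)  # 14x14 -> 10x10

x = torch.randn(1, 3, 32, 32)  # batch of one 32x32 RGB image (NCHW)
x = pool(conv1(x))
print(x.shape)  # torch.Size([1, 6, 14, 14])
x = pool(conv2(x))
print(x.shape)  # torch.Size([1, 16, 5, 5]) -- matches fc1's 16 * 5 * 5 input
```

Feeding a differently sized image would make the `x.view(-1, 16 * 5 * 5)` step fail, which is how the 32x32 assumption is baked in.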
As you are using convolutional filters, you don't have to specify the input height and width. Yet, since there are linear layers at the end of your classifier, you can actually retrieve the input size from the first linear layer's input size, which here is 16x5x5.
Here is the reasoning I use:
5x5 convolutional filters without padding and with stride=1 change the size from HxWxC (layer input size) to (H-4)x(W-4)xC' (layer output size)
2x2 max pooling layers divide both spatial dimensions by 2, so we go from HxWxC (layer input size) to (H/2)x(W/2)xC (layer output size)
Hence we just have to propagate this information back from the input size of the first linear unit.
Since the last conv layer has 16 filters, you know the output of the last conv layer has size 5x5x16, so inverting the transformations listed above: 5x5 (after the second pool) came from 10x10 (after conv2), which came from 14x14 (after the first pool), which came from 28x28 (after conv1), which came from a 32x32 input.
There you go.
One last thing to notice is that I used HxWxC notation while the PyTorch convention is CxHxW, but the reasoning remains unchanged.
To conclude, the height and width information is implicit in this code, and if you use a fully convolutional network, it is not even necessary.
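The back-propagation of sizes described above can be sketched in a few lines of plain Python (the layer list mirrors the network in the question):

```python
# Invert each layer's size transform, starting from the 5x5 feature map
# that feeds fc1 (16 * 5 * 5).
size = 5
for layer in reversed(["conv 5x5", "pool 2x2", "conv 5x5", "pool 2x2"]):
    if layer.startswith("pool"):
        size *= 2  # undo the 2x2 max pooling
    else:
        size += 4  # undo a 5x5 conv (stride 1, no padding)
print(size)  # 32 -- the expected input height/width
```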
@el_samou_samou Yes, you are correct.
Here is what I could deduce: the architecture has to be understood before programming or training in PyTorch.
Input image dimension:
n x n x c
Filter dimension (with cf filters):
f x f x cf
Output dimension = ((n + 2p - f)/s + 1) x ((n + 2p - f)/s + 1) x cf
where p = padding and s = stride
Here in this case:
32x32x3 is the input image; convolution with a
5x5x6 filter gives an output of 28x28x6.
Max pooling uses the same equation for the output dimension:
28x28x6 is the input; pooling with a
2x2 filter gives an output of
14x14x6. IMPORTANT: IN MAX POOLING THE STRIDE EQUALS THE FILTER/KERNEL SIZE (so s = 2 here).
Same for the other layers:
Convolution O/P = 10x10x16
Maxpooling O/P = 5x5x16
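The two formulas above can be wrapped in small helper functions to check the whole chain. This is a sketch; `conv_out` and `pool_out` are hypothetical helper names, not PyTorch APIs:

```python
def conv_out(n, f, p=0, s=1):
    """Spatial output size of a convolution: (n + 2p - f) // s + 1."""
    return (n + 2 * p - f) // s + 1

def pool_out(n, f):
    """Max pooling output size, with stride equal to the kernel size f."""
    return (n - f) // f + 1

n = 32              # 32x32x3 input
n = conv_out(n, 5)  # conv1: 32 -> 28
n = pool_out(n, 2)  # pool:  28 -> 14
n = conv_out(n, 5)  # conv2: 14 -> 10
n = pool_out(n, 2)  # pool:  10 -> 5
print(n)  # 5 -- matches the 16 * 5 * 5 input of fc1
```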
Hope it helps someone.
It seems correct. Good luck.