# Determine input dimension of image?

Following the TRAINING A CLASSIFIER tutorial.

The dimensions of the images being trained on are `32x32x3` = `height x width x channels`.

I am unable to find where in this neural network class the height and width are specified; I am confused.

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
```

Hi,

Since you are using convolutional filters, you don't have to specify the input height and width. However, because there are linear layers at the end of your classifier, you can actually retrieve the input size from the first linear layer's input size, which here is 16x5x5.

Here is the reasoning I use:
5x5 convolutional filters without padding and with stride=1 change the size from HxWxC (layer input size) to (H-4)x(W-4)xC' (layer output size)
2x2 max pooling layers divide both spatial dimensions by 2, so we go from HxWxC (layer input size) to (H/2)x(W/2)xC (layer output size)
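These two rules can be checked directly in PyTorch. The sketch below uses the same layer parameters as the code above and a dummy 32x32 input (PyTorch's NCHW layout):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # batch, channels, height, width (NCHW)
conv = nn.Conv2d(3, 6, 5)       # 5x5 kernel, no padding, stride 1
pool = nn.MaxPool2d(2, 2)       # 2x2 window, stride 2

y = conv(x)
print(y.shape)                  # torch.Size([1, 6, 28, 28]) -> H-4, W-4
z = pool(y)
print(z.shape)                  # torch.Size([1, 6, 14, 14]) -> both halved
```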

Hence we just have to propagate this information back from the first linear layer's input size.

Since the last conv layer has 16 filters, you know its output has size 5x5x16, so inverting the transformations listed above gives:
32x32x3
28x28x6
14x14x6
10x10x16
5x5x16
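The chain above can be verified by running a dummy input through the same layers as the `Net` class and printing the shape after each step (a sketch, not the tutorial's code):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 3, 32, 32)   # 32x32x3 image (PyTorch order: NCHW)
x = conv1(x); print(x.shape)    # [1, 6, 28, 28]
x = pool(x);  print(x.shape)    # [1, 6, 14, 14]
x = conv2(x); print(x.shape)    # [1, 16, 10, 10]
x = pool(x);  print(x.shape)    # [1, 16, 5, 5] -> flattens to 16*5*5, matching fc1
```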

There you go.

One last thing to notice is that I used the HxWxC notation while the PyTorch convention is CxHxW, but the reasoning remains unchanged.

To conclude, the height and width information are implicit in this code, and if you use a fully convolutional network, they are not even necessary.


Here is what I could deduce; the architecture has to be understood before programming or training in PyTorch.

Input image dimensions `HxWxC` = `n x n x c`
Filter: `f x f`, with `cf` filters
Output spatial dimension = `((n + 2p - f)/s) + 1`
`p` = padding
`s` = stride

Here `p = 0` and `s = 1` (defaults).
In this case:
`32x32x3` input image convolved with 6 filters of size `5x5` gives an output of `28x28x6`.
Max pooling uses the same equation for the output dimension:
`28x28x6` input, pooled with a `2x2` window, gives an output of `14x14x6`. IMPORTANT: IN MAX POOLING THE STRIDE DEFAULTS TO THE FILTER/KERNEL SIZE.
The same applies to the second `convolution` and `max pooling`:
Convolution output = `10x10x16`
Max pooling output = `5x5x16`
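The formula above can be wrapped in a small helper to trace the whole network. This is a sketch; `conv_out` is a hypothetical name, not part of PyTorch:

```python
def conv_out(n, f, p=0, s=1):
    """Output spatial size: ((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

h = conv_out(32, 5)      # conv1 (5x5, p=0, s=1): 28
h = conv_out(h, 2, s=2)  # max pool (2x2, stride = kernel size): 14
h = conv_out(h, 5)       # conv2 (5x5): 10
h = conv_out(h, 2, s=2)  # max pool: 5
print(h)                 # 5 -> matches the 16*5*5 input of fc1
```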

Hope it helps someone!

It seems correct. Good luck.