# Determine input dimension of image?

Following the TRAINING A CLASSIFIER tutorial.

The dimensions of the images being trained on are `32x32x3` = `height x width x channels`.

I am unable to find where in this neural network class the height and width are specified; I am confused.

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
```

Hi,

Since you are using convolutional filters, you don't have to specify the input height and width. However, because there are linear layers at the end of your classifier, you can actually retrieve the input size from the first linear layer's input size, which here is 16x5x5.

Here is the reasoning I use:
5x5 convolutional filters without padding and with stride=1 change the size from HxWxC (layer input size) to (H-4)x(W-4)xC' (layer output size)
2x2 max pooling layers divide both spatial dimensions by 2, so we go from HxWxC (layer input size) to (H/2)x(W/2)xC (layer output size)
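These two rules can be checked directly in PyTorch. The sketch below uses the same layer parameters as the code above and a dummy 32x32 input (PyTorch's NCHW layout):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # batch, channels, height, width (NCHW)
conv = nn.Conv2d(3, 6, 5)       # 5x5 kernel, no padding, stride 1
pool = nn.MaxPool2d(2, 2)       # 2x2 window, stride 2

y = conv(x)
print(y.shape)                  # torch.Size([1, 6, 28, 28]) -> H-4, W-4
z = pool(y)
print(z.shape)                  # torch.Size([1, 6, 14, 14]) -> both halved
```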

Hence we just have to propagate this information back from the first linear layer's input size.

Since the last conv layer has 16 filters, you know its output has size 5x5x16, so inverting the transformations listed above gives:
32x32x3
28x28x6
14x14x6
10x10x16
5x5x16
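The chain above can be verified by running a dummy input through the same layers as the `Net` class and printing the shape after each step (a sketch, not the tutorial's code):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 3, 32, 32)   # 32x32x3 image (PyTorch order: NCHW)
x = conv1(x); print(x.shape)    # [1, 6, 28, 28]
x = pool(x);  print(x.shape)    # [1, 6, 14, 14]
x = conv2(x); print(x.shape)    # [1, 16, 10, 10]
x = pool(x);  print(x.shape)    # [1, 16, 5, 5] -> flattens to 16*5*5, matching fc1
```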

There you go.

One last thing to notice is that I used the HxWxC notation while the PyTorch convention is CxHxW, but the reasoning remains unchanged.

To conclude, the height and width information are implicit in this code, and if you use a fully convolutional network, they are not even necessary.


Here is what I could deduce; the architecture has to be understood before programming or training in PyTorch.

Input image dimensions `HxWxC` = `n x n x c`
Filter: `f x f`, with `cf` filters
Output spatial dimension = `((n + 2p - f)/s) + 1`
`p` = padding
`s` = stride

Here `p = 0` and `s = 1` (defaults).
In this case:
`32x32x3` input image convolved with 6 filters of size `5x5` gives an output of `28x28x6`.
Max pooling uses the same equation for the output dimension:
`28x28x6` input, pooled with a `2x2` window, gives an output of `14x14x6`. IMPORTANT: IN MAX POOLING THE STRIDE DEFAULTS TO THE FILTER/KERNEL SIZE.
The same applies to the second `convolution` and `max pooling`:
Convolution output = `10x10x16`
Max pooling output = `5x5x16`
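The formula above can be wrapped in a small helper to trace the whole network. This is a sketch; `conv_out` is a hypothetical name, not part of PyTorch:

```python
def conv_out(n, f, p=0, s=1):
    """Output spatial size: ((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

h = conv_out(32, 5)      # conv1 (5x5, p=0, s=1): 28
h = conv_out(h, 2, s=2)  # max pool (2x2, stride = kernel size): 14
h = conv_out(h, 5)       # conv2 (5x5): 10
h = conv_out(h, 2, s=2)  # max pool: 5
print(h)                 # 5 -> matches the 16*5*5 input of fc1
```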

Hope it helps someone!

It seems correct. Good luck.