in_features, out_features in nn.Linear

I have a question regarding nn.Linear.

In the documentation it says:

in_features – size of each input sample
out_features – size of each output sample

As an example we have the following model:

model_cnn = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
                          nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                          nn.Flatten(),
                          nn.Linear(7*7*64, 100), nn.ReLU(),
                          nn.Linear(100, 10)).to(device)

Where do the values (7*7*64, 100) in the first nn.Linear and (100, 10) in the second nn.Linear come from?
I can’t figure out how these values are calculated.

It is the shape of the features coming out of the final Conv2d layer: for each sample, a tensor of shape (64, 7, 7). The Flatten() layer just flattens these features into a 1-dimensional tensor of shape (64*7*7,), which is what gets fed into the first Linear layer.


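The output of your last Conv2d would be of shape (N, 64, 7, 7), where N stands for the batch size, 64 for the number of channels, and 7x7 for the height and width of the feature map.
So Flatten() will convert this into shape (N, 64 x 7 x 7). When that goes through the first Linear, the output will be (N, 100), and after the second Linear, (N, 10). The 100 is a hidden size you are free to choose; the only constraint is that the in_features of the second Linear matches it. The 10 is the size of the final output, typically the number of classes.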
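If it helps, here is a small sketch of just that part of the model. The batch size of 8 and the random tensor standing in for the last Conv2d output are made up for illustration, they are not from your code.

```python
import torch
import torch.nn as nn

N = 8  # hypothetical batch size, chosen only for this illustration

# Random stand-in for the output of the last Conv2d: (N, channels, height, width)
features = torch.randn(N, 64, 7, 7)

flat = nn.Flatten()(features)              # keeps dim 0, flattens the rest
hidden = nn.Linear(7 * 7 * 64, 100)(flat)  # in_features must equal 64*7*7 = 3136
logits = nn.Linear(100, 10)(hidden)        # in_features must equal the previous out_features

print(flat.shape)    # torch.Size([8, 3136])
print(hidden.shape)  # torch.Size([8, 100])
print(logits.shape)  # torch.Size([8, 10])
```

Note that nn.Linear only looks at the last dimension, so the batch dimension N passes through unchanged.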

Please do ask if you are still confused…

Thank you for the explanation.
But how do we know that the size of the image is 7x7 in the final conv2d?

Oh! See, I use a trick. Before adding the Flatten and Linear layers, run just the convolutional part of the model on a dummy input, say torch.randn(32, 3, 60, 60), where 32 is the batch size, 3 is the number of input channels and 60x60 is the size of the images. (For your model the input would need 1 channel, since the first layer is nn.Conv2d(1, 32, ...).) The output will have shape (N, out_channels, height, width), and out_channels * height * width is the in_features for the first Linear. That is how you can get the output shape of the last Conv2d layer; otherwise you have to calculate the shape manually.
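Here is a rough sketch of both approaches for the conv layers in the question. I am assuming 1x28x28 inputs (MNIST-sized); the thread never states the input size, but it is one that gives 7x7 at the end. The helper conv2d_out_size is a name I made up; it implements the output-size formula from the Conv2d documentation.

```python
import torch
import torch.nn as nn

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output height (or width) of a Conv2d for a given input height (or width)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Only the convolutional part of the model from the question
conv_part = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
)

# Approach 1: dummy forward pass.
# Assumption: inputs are 1x28x28; the batch size of 32 is arbitrary.
dummy = torch.randn(32, 1, 28, 28)
with torch.no_grad():
    out = conv_part(dummy)
print(out.shape)  # torch.Size([32, 64, 7, 7])

# Approach 2: manual calculation, one conv layer at a time (28 -> 28 -> 14 -> 14 -> 7).
size = 28
for stride in (1, 2, 1, 2):  # strides of the four Conv2d layers
    size = conv2d_out_size(size, kernel_size=3, stride=stride, padding=1)
print(size)  # 7

# in_features for the first Linear layer
print(out.shape[1] * out.shape[2] * out.shape[3])  # 3136
```

With a different input size the 7x7 changes, so the in_features of the first Linear has to be recomputed.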

OK, thank you for the explanation.