Where do the values (7*7*64, 100) in the first nn.Linear and (100, 10) in the second nn.Linear come from?
I can’t figure out how these values are calculated.
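The 7*7*64 is the shape of the features coming out of the final Conv2d layer: for a single sample, that is a tensor of shape (64, 7, 7). The Flatten() layer just flattens these features into a 1-dimensional tensor of shape (64*7*7,), i.e. 3136 values, which is what gets fed into the first Linear layer. The 100 is not calculated; it is the width you chose for the hidden layer, so the second Linear has to take 100 inputs, and its 10 is the number of outputs the network should produce.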
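If you want to see how a 7x7 feature map can arise, here is a minimal sketch of the manual calculation. The helper name conv2d_out and the architecture (28x28 input, two blocks of a 3x3 conv with padding 1 followed by 2x2 max pooling) are assumptions for illustration, not taken from the original model; the formula is the output-size formula given in the PyTorch Conv2d and MaxPool2d docs.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Hypothetical architecture (an assumption, not from the original post):
# 28x28 input, two blocks of Conv2d(kernel_size=3, padding=1) + MaxPool2d(2).
size = 28
for _ in range(2):
    size = conv2d_out(size, kernel_size=3, padding=1)  # conv keeps the size
    size = conv2d_out(size, kernel_size=2, stride=2)   # pooling halves it

out_channels = 64  # channels of the last conv layer, as in the thread
in_features = out_channels * size * size
print(size, in_features)  # 7 3136
```

With a different input size, kernel, stride, or padding the numbers change, so plug in your own layer settings.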
The output of your last Conv2d has shape (N, 64, 7, 7), where N is the batch_size, 64 is the number of channels, and 7x7 is the height and width of the feature map.
Flatten() then converts this into shape (N, 64 x 7 x 7). After it goes through the first Linear, the output will be (N, 100), and after the second Linear, (N, 10).
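The shape flow above can be checked with a small sketch. The random tensor stands in for the output of the last Conv2d, the batch size of 32 is an arbitrary choice, and any activation between the two Linear layers is left out because it does not change the shapes.

```python
import torch
import torch.nn as nn

# Dummy stand-in for the output of the last Conv2d: (N, 64, 7, 7) with N = 32.
features = torch.randn(32, 64, 7, 7)

flatten = nn.Flatten()             # flattens everything except the batch dimension
fc1 = nn.Linear(7 * 7 * 64, 100)   # in_features must equal 64 * 7 * 7 = 3136
fc2 = nn.Linear(100, 10)           # in_features must equal fc1's out_features

flat = flatten(features)           # (32, 3136)
hidden = fc1(flat)                 # (32, 100)
logits = fc2(hidden)               # (32, 10)
print(flat.shape, hidden.shape, logits.shape)
```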
Oh! I use a trick for this. Before adding the Flatten or Linear layers, run the convolutional part of the model on a dummy sample, say torch.randn(32, 3, 60, 60), where 32 is the batch_size, 3 is the number of input channels, and 60x60 is the size of the images. The output will have shape (N, out_channels, height, width), and out_channels * height * width is the in_features of the first Linear. That is how you can get the output shape of the last Conv2d layer; otherwise you have to calculate it manually.
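A minimal sketch of the dummy-sample trick follows. The conv stack here (conv_part) is hypothetical and only for illustration; substitute your own layers and input size.

```python
import torch
import torch.nn as nn

# Hypothetical conv stack (an assumption for illustration); use your own layers.
conv_part = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

dummy = torch.randn(32, 3, 60, 60)  # (batch_size, channels, height, width)
with torch.no_grad():               # no gradients needed for a shape check
    out = conv_part(dummy)

print(out.shape)                    # torch.Size([32, 64, 15, 15])

# Number of features per sample: out_channels * height * width.
in_features = out[0].numel()        # 64 * 15 * 15 = 14400
fc1 = nn.Linear(in_features, 100)
```

The batch size in the dummy tensor does not matter for this; only the channel count and image size need to match your real data.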