The input to a PyTorch convolutional network has shape [BATCH_SIZE] × [CHANNEL_NUMBER] × [HEIGHT] × [WIDTH].
Example: let's assume your image has dimension 1×3×32×32, meaning that you have 1 image with 3 channels (RGB), height 32 and width 32. The output size of a convolution is given by ((W - F + 2P) / S) + 1 for the width and ((H - F + 2P) / S) + 1 for the height, rounded down when the division is not exact.
NOTE: Delete the 5th line of your code, because you already have pooling in the 12th line.
W = WIDTH
H = HEIGHT
F = FILTER_SIZE (kernel size)
P = PADDING
S = STRIDE
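A minimal Python sketch of the formula above, assuming square filters and no dilation. The function name `conv_output_size` is only an illustrative helper, not part of PyTorch. Integer division (`//`) mirrors the rounding down that PyTorch applies when (W - F + 2P) is not divisible by S.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Output width (or height) of a conv/pool layer: floor((W - F + 2P) / S) + 1."""
    return (size - kernel_size + 2 * padding) // stride + 1


# 32x32 input, 5x5 filter, no padding, stride 1
print(conv_output_size(32, kernel_size=5))  # 28
# a (2,2) pooling window with stride 2 fits the same formula
print(conv_output_size(28, kernel_size=2, stride=2))  # 14
```

The same helper works for the height, since the formula is identical with H in place of W.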
With our 1×3×32×32 input and a 5×5 filter (no padding, stride 1), conv1 gives W = (32 - 5 + 0)/1 + 1 = 28 and likewise H = 28, with 6 feature maps. Applying (2,2) pooling then halves the width and height, so after the first conv2d and pooling we end up with a tensor of dimension 1×6×14×14. Similarly, the second conv2d produces 16 feature maps of size (14 - 5)/1 + 1 = 10, i.e. 10×10, and pooling reduces them to 1×16×5×5. Finally, since the first fc layer expects a flat vector of features for each sample, we flatten (unroll) the 16×5×5 feature maps into 16×5×5 = 400 values, so the first fc layer needs 400 input features.
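The same arithmetic can be traced layer by layer. This sketch assumes the layer settings implied by the numbers above (5×5 convolutions with 6 and 16 output channels, no padding, stride 1, each followed by (2,2) max pooling); the names `out_size` and `trace_shapes` are hypothetical helpers, and the code only tracks shapes, so it does not need PyTorch.

```python
def out_size(size, kernel_size, padding=0, stride=1):
    # floor((W - F + 2P) / S) + 1, applied to a square feature map
    return (size - kernel_size + 2 * padding) // stride + 1


def trace_shapes(batch=1, channels=3, size=32):
    shapes = [(batch, channels, size, size)]
    # (out_channels, kernel_size) for conv1 and conv2, as assumed above
    for out_channels, kernel in [(6, 5), (16, 5)]:
        size = out_size(size, kernel)                   # conv2d, stride 1, no padding
        shapes.append((batch, out_channels, size, size))
        size = out_size(size, kernel_size=2, stride=2)  # (2,2) max pooling
        shapes.append((batch, out_channels, size, size))
        channels = out_channels
    flat_features = channels * size * size              # input size of the first fc layer
    return shapes, flat_features


shapes, flat = trace_shapes()
for s in shapes:
    print(s)
print("fc1 in_features:", flat)
```

This prints (1, 3, 32, 32), (1, 6, 28, 28), (1, 6, 14, 14), (1, 16, 10, 10), (1, 16, 5, 5) and then 400. In the PyTorch model that number is the first argument of the first `nn.Linear` layer, with the conv output flattened beforehand, for example with `torch.flatten(x, 1)`, which keeps the batch dimension.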
NOTE: Refer to this post, which answers a similar question:
[Linear layer input neurons number calculation after conv2d]