The last `nn.Softmax` layer could be wrong if you are using `nn.CrossEntropyLoss` or `nn.NLLLoss`. The former expects raw logits, so remove this activation, while the latter expects log probabilities, so use `nn.LogSoftmax` instead.
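For reference, a minimal sketch of the two correct pairings (the tensor shapes here are placeholders):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)           # raw model outputs: 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))  # class indices

# nn.CrossEntropyLoss applies log_softmax internally, so pass raw logits
ce_loss = nn.CrossEntropyLoss()(logits, targets)

# nn.NLLLoss expects log probabilities, so apply nn.LogSoftmax first
log_probs = nn.LogSoftmax(dim=1)(logits)
nll_loss = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce_loss, nll_loss))  # True: the two pipelines are equivalent
```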
The difference in the spatial size could come from a different padding setup in your conv layers.
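As a quick check (with made-up layer parameters), the padding directly changes the output spatial size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 24, 24)

# without padding, a 3x3 kernel with stride 1 shrinks the spatial size by 2
print(nn.Conv2d(3, 16, kernel_size=3, padding=0)(x).shape)  # [1, 16, 22, 22]

# padding=1 keeps the spatial size unchanged for a 3x3 kernel with stride 1
print(nn.Conv2d(3, 16, kernel_size=3, padding=1)(x).shape)  # [1, 16, 24, 24]
```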
Also, the quoted section doesn’t mention anything about the stride of the pooling layers, which would also influence the activation shape.
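Likewise, the pooling stride (which defaults to the kernel size for `nn.MaxPool2d`) changes the activation shape; a small example with an assumed input shape:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 24, 24)

# stride defaults to kernel_size, halving the spatial size here
print(nn.MaxPool2d(kernel_size=2)(x).shape)             # [1, 16, 12, 12]

# an explicit stride of 1 only shrinks the spatial size by 1
print(nn.MaxPool2d(kernel_size=2, stride=1)(x).shape)   # [1, 16, 23, 23]
```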