PyTorch's official MNIST example: loss on the test set is way lower than on the training set! (so something must be wrong)

Hi!

So I simply ran this official example unchanged:


that’s the output from the last epoch:

Train Epoch: 10 [55040/60000 (92%)]	Loss: 0.185965
Train Epoch: 10 [55680/60000 (93%)]	Loss: 0.099982
Train Epoch: 10 [56320/60000 (94%)]	Loss: 0.271109
Train Epoch: 10 [56960/60000 (95%)]	Loss: 0.049256
Train Epoch: 10 [57600/60000 (96%)]	Loss: 0.384411
Train Epoch: 10 [58240/60000 (97%)]	Loss: 0.182649
Train Epoch: 10 [58880/60000 (98%)]	Loss: 0.374920
Train Epoch: 10 [59520/60000 (99%)]	Loss: 0.307496

Test set: Average loss: 0.0486, Accuracy: 9838/10000 (98%)

So as you can see, the average loss on the test set is several times lower than the loss on the training set.
That seems impossible and should be the other way around (the model should either overfit, or the two losses should be roughly the same).

What's going on, and where is the bug? (I literally ran the official example unchanged.)


During training the model uses Dropout; at test time it doesn't. Dropout randomly zeroes activations during training, which makes the training loss worse.

If you remove the dropout layers (or evaluate the training loss with the model in eval mode), the two losses should be much more similar.
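A minimal sketch of the difference (using PyTorch's `nn.Dropout` directly, outside the MNIST model):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

# Training mode: roughly half the inputs are zeroed at random,
# and the survivors are scaled by 1/(1-p) to keep the expectation unchanged.
drop.train()
train_out = drop(x)

# Eval mode: dropout is a no-op, so the output equals the input exactly.
drop.eval()
eval_out = drop(x)

print(eval_out.equal(x))  # True
```

This is why the test loss (computed under `model.eval()`) can be lower than the per-batch training losses, which are computed with dropout active.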


Ah, I didn't notice the dropout layer. Thanks!