Consider a model like the one in https://github.com/pytorch/examples/blob/master/mnist/main.py (I use this example just to help formulate the problem).
The model has dropout, so if I compute the loss during training it will not reflect the loss of the prediction model, since at prediction time dropout is off.
On the other hand, computing a more faithful training loss (a separate pass with dropout off) would be computationally very expensive.
I wonder how people deal with this.
Dropout has a training mode and a test mode. The loss of a network with dropout in test mode should be about the same, on average, as dropout in training mode. I’m not saying that training and test losses should be the same; I’m saying that, using a network trained with dropout, taking the mean of the loss should yield about the same value whether you turn on dropout or you turn it off (and by turning it off I mean, putting it in test mode).
I hope that makes sense.
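That intuition can be checked numerically. Below is a minimal, self-contained sketch (plain Python, not the actual MNIST model) of inverted dropout, the scheme `nn.Dropout` uses: during training, surviving units are scaled by `1/(1-p)`, so the expected activation equals the test-mode (dropout-off) activation, which is why averaged losses end up close.

```python
import random

random.seed(0)

def inverted_dropout(x, p):
    # Training mode: zero each unit with probability p and scale the
    # survivors by 1/(1-p), so the *expected* activation equals the
    # test-mode activation (dropout off).
    return [xi * (0.0 if random.random() < p else 1.0 / (1.0 - p)) for xi in x]

x = [1.0, 2.0, 3.0, 4.0]   # stand-in activations
p = 0.5
n = 20000

# Average the dropped-out activations over many stochastic passes.
sums = [0.0] * len(x)
for _ in range(n):
    for i, v in enumerate(inverted_dropout(x, p)):
        sums[i] += v
means = [s / n for s in sums]
# means is close to x, i.e. to the test-mode (dropout-off) activations.
```

Any single pass is noisy, but the average over many passes converges to the dropout-off values, which is the "on average, about the same" point above.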
Thanks Carl! When you talk about the average, do you mean computing the loss over all the batches and then dividing by the number of batches? So you are suggesting that calculating it with dropout off or on does not make a significant difference?
That is what he is saying. Dropout should not change the loss much when averaged over entire batches.
Besides, if your testing loss keeps going down, you must be doing something right.
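For completeness, here is a minimal sketch of how one could compare the two losses in PyTorch. The tiny model below is hypothetical (it stands in for the MNIST example); the mechanism is just switching between `model.train()` and `model.eval()`, and averaging the dropout-on loss over several stochastic forward passes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model with dropout, standing in for the MNIST example.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5), nn.Linear(10, 2))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 10)
y = torch.randint(0, 2, (256,))

model.eval()  # dropout off: the deterministic "test mode" loss
with torch.no_grad():
    eval_loss = loss_fn(model(x), y).item()

model.train()  # dropout on: average over several stochastic passes
with torch.no_grad():
    train_losses = [loss_fn(model(x), y).item() for _ in range(100)]
avg_train_loss = sum(train_losses) / len(train_losses)
# avg_train_loss should be close to eval_loss, per the discussion above.
```

Note that `model.eval()` also affects other layers such as batch norm, so it is the right switch for any evaluation pass, not just for dropout.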
Thanks, I see your point, but I still have the feeling that leaving dropout on during training could lead to underestimating an overfitting problem. My impression is that zeroing out part of the weights can affect the loss in a significant way. If that is not the case, I would be curious to know the reasoning behind this heuristic.