Hi!
So to my understanding, if I want to change the mode of operation of the dropout units, all I need to do is call net.eval() when testing/validating and net.train(True) when training. Is that true?
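In case it helps, this is the behaviour I expect from the mode switch (a minimal sketch with a bare nn.Dropout, not my actual net):

```python
import torch
import torch.nn as nn

# A dropout layer behaves differently depending on the module's mode flag.
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.eval()               # same effect as net.eval() on the whole model
out_eval = drop(x)        # identity: nothing is dropped at test time

drop.train()              # same effect as net.train(True)
out_train = drop(x)       # ~half the elements zeroed, survivors scaled by 1/(1-p) = 2
```

In eval mode the output is identical to the input; in train mode each element is either 0 or 2, since PyTorch uses inverted dropout (scaling at train time rather than at test time).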
If so, I am super confused: after adding dropout layers, the per-example loss on the training set is consistently HIGHER than on the validation set, seemingly by a factor of about 2 (I used p=0.5 in the dropout layers).
Before adding the dropout layers my net overfitted the data, but in the first epochs the training and validation losses were more or less the same.
(If it makes any difference, the criterion is cross-entropy, and I didn't add a softmax layer, since to my understanding it isn't needed: cross-entropy's inputs should be the raw scores (logits), not probabilities.)
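For context, my understanding is that nn.CrossEntropyLoss already applies log_softmax internally, so the model should output raw logits. A quick sketch of the equivalence (dummy logits and targets, not my actual data):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)          # raw, unnormalized scores from the net
targets = torch.tensor([0, 2, 1, 1])

ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent decomposition: log_softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
```

So applying an extra softmax before this criterion would actually be a bug (it would squash the logits and flatten the loss), which is why I left it out.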
an example output:
[14, 6000] loss: 0.084
valid loss is 0.05202618276541386 and percents 0.9190970103721782
The first line is the training loss (per sample) and the second line is the validation loss and accuracy…
I’m super confused, help will be appreciated!