Should one set model.eval() when getting outputs in a training epoch

When using randomized/batch-dependent layers like dropout or batch norm, is it necessary to set model.eval() before getting the outputs for a batch within a training epoch?

The question also appears on stackoverflow: https://stackoverflow.com/questions/63167099/pytorch-training-with-dropout-and-or-batch-normalization

I’d say it is a clear: No. You would keep the model in training mode (model.train()) while training and only set it to model.eval() during evaluation (e.g. validation interleaved with training epochs, or inference proper). Don’t forget to set it back to training mode for the next epoch if you do.
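A minimal sketch of what I mean (toy model and random data just for illustration, any model with dropout/BN works the same way):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Dropout(0.5), nn.Linear(50, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

train_loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8)
val_loader = DataLoader(TensorDataset(torch.randn(16, 10), torch.randint(0, 2, (16,))), batch_size=8)

for epoch in range(3):
    model.train()                      # dropout (and BN) behave as intended for training
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)        # no model.eval() around this call
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

    model.eval()                       # switch only for the validation pass
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    # model.train() at the top of the loop puts us back into training mode for the next epoch
```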

I must admit I have trouble seeing why the person helping you on Stack Overflow would come to a different conclusion.

The other way round would make more sense, by the way: for things like test-time dropout, you would actually avoid setting the dropout layers to evaluation mode during inference.
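For example, one common way to do test-time (Monte Carlo) dropout is to put the model into eval mode as usual and flip only the dropout modules back to training mode; this is just a sketch to illustrate the idea, not something from the original question:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Dropout(0.5), nn.Linear(50, 2))

model.eval()                           # everything else (e.g. BN) in evaluation mode
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.train()                      # keep dropout sampling active at test time

x = torch.randn(4, 10)
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(20)])  # 20 stochastic forward passes
mean, std = samples.mean(dim=0), samples.std(dim=0)       # predictive mean and spread
```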

Best regards

Thomas

P.S.: I would advise against crossposting. If you get the same answer, it’ll be wasted time for the people answering, with no benefit to you. If you get different answers, you need to decide which would have been the better forum to ask in anyway, and again the time of those answering elsewhere is wasted.


Thanks for the quick reply.
I do not see why a dropout layer should not be set to eval mode. If it is not set to eval mode, then the output for the same example can be different across runs.
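For example (a small sketch just to illustrate the behavior I mean):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(5)

drop.train()
print(drop(x))   # random mask: surviving elements scaled by 1/(1-p)=2, the rest zeroed
print(drop(x))   # a different mask, hence a different output for the same input

drop.eval()
print(drop(x))   # identity in eval mode: tensor([1., 1., 1., 1., 1.]), deterministic
```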

PS: as for cross-posting, I think it’s not necessarily bad, as the user base of different forums can be quite different. And it’s not necessarily my question on other forums.

I can’t help but feel it wastes my time. But of course, I can just not answer questions, too.


I think you need to revisit the reason for introducing dropout/BN in NNs. Their job is to modify the outputs in each training iteration. If we disabled dropout before the forward pass during training, why wouldn’t we just remove it altogether?