Training a model in eval() mode gives better results?!

Hi everyone, I recently ran into a strange issue and I don’t really understand it.
I have a CNN model with Dropout layers (p=0.5). I intended to train and evaluate the model at every epoch.
Case 1: I didn’t set model.train() for the train phase, but did set model.eval() for the evaluation phase -> the model is in train mode only for the first epoch and in eval mode for all the rest (it was just a mistake)
Case 2: set model.train() for the train phase and model.eval() for the test phase (this is what we normally do)
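
Roughly, the loop looks like this (a minimal sketch; train_one_epoch and evaluate are placeholders for my actual code):

```python
for epoch in range(num_epochs):
    model.train()   # case 2; in case 1 this line was missing, so after the
                    # first epoch's model.eval() the model stayed in eval
                    # mode for every remaining training epoch
    train_one_epoch(model, train_loader)

    model.eval()
    evaluate(model, val_loader)
```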

But when I checked, in case 1 the loss on the training set decreased more quickly and ended up far smaller than in case 2 (0.006 vs 0.028). Case 1 also gives a better macro-F1 score (about 17%). So can I say “training a model in eval mode gives better results”?!
Can someone explain this?

Well, that just means that your model trains better without dropout. What you can try is increasing the layer size or the number of layers, or decreasing the dropout probability, to see if that helps.

Looking at the training set loss doesn’t really mean anything on its own, and this may even be expected, since dropout is often considered a way to prevent overfitting…

Thank you for your reply, I didn’t think carefully when I asked this. But when you say “try to increase the layer size or number of layers, or decrease the dropout probability”, do you mean that I should keep dropout in my model for a better result, instead of removing it (which, as in case 1, gives better performance)?

model.eval() sets the Dropout (and normalization layers, if any) in evaluation mode. The default mode is for training. In training mode, the elements of the input tensor are zeroed with probability p (and the surviving elements are scaled by 1/(1-p)), but in evaluation mode all elements are used as-is. So, when you put the model in eval() mode, the Dropout layer becomes ineffective during training; it is just like removing it entirely or setting p=0.
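
You can see this directly with a tiny experiment (a minimal sketch):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()    # training mode (the default)
print(drop(x))  # roughly half the elements zeroed, survivors scaled by 1/(1-p) = 2

drop.eval()     # evaluation mode
print(drop(x))  # identity: all ones, nothing is zeroed
```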

Removing the Dropout layer is expected to lower the training loss, since the elements of the input tensors are no longer randomly zeroed. However, removing Dropout may result in poor performance on an unseen test set, since the model is now more prone to overfitting.

Therefore, you should consider treating the dropout probability as a hyperparameter, and you can tune it by studying the performance on a validation set.
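
For example, something along these lines (just a sketch; build_model, train, and evaluate_f1 are placeholders for your own code):

```python
best_p, best_f1 = None, -1.0
for p in [0.0, 0.1, 0.25, 0.5]:
    model = build_model(dropout_p=p)     # placeholder model factory
    train(model, train_loader)           # placeholder training routine
    f1 = evaluate_f1(model, val_loader)  # macro-F1 on the validation set
    if f1 > best_f1:
        best_p, best_f1 = p, f1
print(f"best dropout p = {best_p} (val macro-F1 = {best_f1:.3f})")
```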

Kind of both :). What I mean is that in your particular architecture, Dropout may not work well, which is why you get better test performance if you train the model without dropout (i.e., model.eval() disables dropout during training).

Usually (as per Hinton; dropout came from his group), for dropout you want to make your neural network slightly over-complex (its capacity is larger than needed for the given problem). So, I was suggesting adding parameters to your network and then checking whether that, together with dropout, gives you even better results than training the network you have without dropout. Also, maybe your dropout is too strong for the given architecture, so, as @vmirly1 suggested, I would also try smaller dropout probabilities.

It’s quite clear, thanks again :))

By the way, I also use BatchNorm in my model; does eval mode disable it too? As far as I know, it just disables tracking the running mean and variance, and BN still adjusts inputs with the current minibatch’s mean and variance.

For BatchNorm, there are running statistics (mean and variance) that are accumulated during training (the learnable parameters are the affine scale and shift). These statistics are used to scale/shift the hidden activations toward zero mean and unit variance, to reduce the internal covariate shift.

It’s actually the other way around from what you described: in train mode, BN normalizes with the current minibatch’s statistics and updates the running estimates; in eval mode, it normalizes with the stored running estimates and stops updating them. So, by switching the mode to .eval() during training, the running estimates never move past their initialization. As a result, at the testing phase your network has poor estimates of these means and variances and will therefore scale/shift poorly.
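
A small demo of the difference (a sketch):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
x = torch.randn(32, 4) * 5 + 3  # a batch with non-trivial mean/variance

bn.train()
_ = bn(x)               # normalizes with the batch stats, updates running stats
print(bn.running_mean)  # has moved toward the batch mean (~3)

bn.eval()
_ = bn(x)               # normalizes with the stored running stats
print(bn.running_mean)  # unchanged: no updates happen in eval mode
```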

So, in summary: if you don’t need Dropout and BatchNorm, just remove them from your model. Training with the mode set to .eval() may result in unexpected behavior.
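
If you only want to switch off Dropout during training while letting BatchNorm behave normally, one option is a small helper like this (a sketch; disable_dropout is not a built-in API):

```python
import torch.nn as nn

def disable_dropout(model: nn.Module) -> None:
    """Put only the Dropout layers in eval mode; BatchNorm keeps training."""
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.eval()

model.train()            # first put everything in training mode
disable_dropout(model)   # then switch off just the Dropout layers
# note: model.train() re-enables dropout, so call this helper again after it
```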
