Performance highly degraded when eval() is activated in the test phase

I am doing some experiments on a regression problem using PyTorch (e.g., input a noisy image and output a denoised image).

  1. In each training epoch, I first call model.train().
  2. After each epoch, I run validation and call model.eval().
    I found that the validation loss is normal (consistent with the training loss) without calling model.eval(). However, the validation loss becomes much higher when model.eval() is called. So I guessed that this phenomenon might disappear in the test phase. Unfortunately, in the test phase the performance is still bad when calling model.eval(), while it is normal without it.

I also tried some other CNN tools, such as MatConvNet and TensorFlow; they both work fine, and the performance is better when calling `model.eval()` in the test phase. (I think it is related to the batchnorm modules in my network; I expected higher performance when using the running mean and variance, but the results are the opposite.)

So can anyone help me with this? Thanks in advance.

Here is my network.

class Net(nn.Module):
    def __init__(self):
        self.layers = 17
        super(Net, self).__init__()
        self.layer_m = self.make_h_layers()
        self.layer_f = nn.Conv2d(64, 1, 3, padding=1)

    def make_h_layers(self):
        layer_1 = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True))
        layer_m = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        layers = [layer_1]
        for i in range(self.layers - 2):
            layers.append(layer_m)
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.layer_m(x)
        x = self.layer_f(x)
        return x


This is very likely to be caused by the BatchNorm layer. BatchNorm computes a running mean and variance that is used during prediction.

For some reason, the running mean/variance is not a good approximation of the true distribution of the activations at prediction time. Maybe the batch size you used was too small, or maybe the default momentum for the running mean/variance (0.1, IIRC) in BatchNorm is too large in your case.
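As a sketch of the second suggestion (the momentum value and the helper name `set_bn_momentum` are illustrative, not from this thread), the running-stat momentum can be lowered either at construction time or on an existing model:

```python
import torch.nn as nn

# BatchNorm2d with a smaller running-stat momentum (PyTorch's default is 0.1).
bn = nn.BatchNorm2d(64, momentum=0.01)

# Hypothetical helper: lower the momentum of every BatchNorm2d layer
# already inside a model, in place. model.modules() walks all submodules.
def set_bn_momentum(model, momentum=0.01):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.momentum = momentum
```

A smaller momentum makes the running estimates move more slowly, which smooths out noise from individual batches at the cost of adapting more slowly.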

@smb thanks. But it seems the problem is not caused by the issues you mentioned.

  1. Currently the batch size is 128; I have also tried 256, and the result is similar.
  2. I set the momentum to 0.01, but it does not help.

I also believe this is caused by the BatchNorm layer. When I drop model.eval() in the validation phase, the validation loss is very similar to the training loss (I used the MSE loss). However, when model.eval() is activated, the performance becomes very bad; I found that sometimes the performance even gets worse and worse as the epochs increase (quite interesting).

So could you list some more potential reasons?


Did you solve this problem? I have run into the same problem these days… Using model.train() gives very high performance even in the first iteration at test time, but there is a significant drop when using model.eval()…

If you are performing a train step followed by an ‘if step % ___ == 0’ condition to evaluate a test batch in that same step, YOU MUST CALL model.train() in the next step BEFORE any model parameter/gradient is updated, and this includes optimizer.zero_grad().
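The ordering described above can be sketched like this (the tiny model, the optimizer settings, and the random data are placeholders of my own, not the original poster's setup):

```python
import torch
import torch.nn as nn

# Minimal denoising-style setup with a BatchNorm layer.
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.BatchNorm2d(8),
                      nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
val_noisy = torch.randn(4, 1, 16, 16)
val_clean = torch.randn(4, 1, 16, 16)

for step in range(4):
    model.train()                      # back to train mode BEFORE zero_grad
    optimizer.zero_grad()
    noisy, clean = torch.randn(4, 1, 16, 16), torch.randn(4, 1, 16, 16)
    loss = criterion(model(noisy), clean)
    loss.backward()
    optimizer.step()

    if step % 2 == 0:                  # periodic evaluation in the same loop
        model.eval()                   # eval mode only inside this block
        with torch.no_grad():
            val_loss = criterion(model(val_noisy), val_clean)
```

The point is that the model never performs a parameter update, or even `zero_grad()`, while still in eval mode from the previous step's evaluation block.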


This answer helped me a lot! I was calling optimizer.zero_grad() before model.train() by mistake. Now the problem is solved. Thanks!

@xiaoxiaolishan Hi, I encountered the same problem. Have you found the solution to this problem? Thanks!

Have you solved the problem? I have run into the same problem here.

I encountered the same problem here; test performance even gets worse and worse as the epochs increase. I’ve set model.train() before optimizer.zero_grad(), with the same results.

Hi, have you solved the problem? I am encountering this problem …

I have encountered the same problem and cannot understand why this happens. If @Soumith_Chintala or somebody from the community could help, that would be great!

I have also tried weight decay & dropout, considering the problem as a case of over-fitting, yet the problem persists.

Hi, have you solved this problem? my model overfits and using model.eval() with dropout is not helping

I’m getting this with smaller minibatches, especially when training on several GPUs.

same problem here. hope someone could answer

I also ran into the same problem; in my case the model uses batch norm heavily in different layers. I poked around the code, and I think the problem might come from track_running_stats (line 64 in this file: ).

I solved the problem by setting track_running_stats=False for all batch norm layers in the model. I think this is due to a bug in line 64 of the above-mentioned file.

If you want the model to work in eval mode, simply set track_running_stats to False for all batch norm layers:

for child in model.children():
    for ii in range(len(child)):
        if isinstance(child[ii], nn.BatchNorm2d):
            child[ii].track_running_stats = False
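Note that `model.children()` only visits the top level of the model. A recursive variant (my own generalization of the snippet above, using `model.modules()`) also reaches BatchNorm layers nested deeper inside containers:

```python
import torch.nn as nn

# model.modules() recurses into nested containers, so BatchNorm2d layers
# at any depth are covered, not just direct children.
def disable_running_stats(model):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.track_running_stats = False
```

With track_running_stats disabled, the layer normalizes with the current batch's statistics even in eval mode, which mimics leaving model.train() on for BatchNorm only.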

Guys, this is not a bug. You are getting unstable estimates with a small batch size. This is natural.


How do you explain it not working when eval() is set then ?

Exactly what I said above: with a small batch size, the running average estimate is unstable, so you get worse results using it.
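A toy illustration of that instability (random data of my own choosing, not from the thread): the average deviation of a small batch's mean from the true mean is far larger than a large batch's, and these per-batch statistics are exactly what feeds BatchNorm's running estimates.

```python
import torch

torch.manual_seed(0)
data = torch.randn(8192)  # true mean is 0

# Mean absolute error of per-batch means, for small vs. large batches.
small_err = data.view(-1, 4).mean(dim=1).abs().mean()     # batches of 4
large_err = data.view(-1, 1024).mean(dim=1).abs().mean()  # batches of 1024
```

The standard error of a batch mean shrinks like 1/sqrt(batch size), so tiny batches feed very noisy statistics into the exponential running average.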

I have the exact same problem in a 1D convolutional network on PyTorch 0.4.1. The batch size is 128, so it shouldn’t be connected to a small batch size. Disabling eval() fixes the problem.

Removing batchnorm fixes the problem, so batchnorm must be the problem. But wouldn’t this be an issue once you save the model, load it, and use it in production, where the batch size can be 1?


Another common mistake that I’ve seen is re-using the same BN layer in different places of the network. Is it possible that this is the reason? If not, could you share the model definition with us?
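For reference, a layer builder that avoids that reuse pitfall (my own sketch, mirroring the sizes from the network posted earlier) constructs a fresh conv/BN/ReLU block on every loop iteration instead of appending the same module object repeatedly:

```python
import torch.nn as nn

# Build the hidden layers with a NEW BatchNorm2d per block, so no two
# positions in the network share the same normalization statistics.
def make_h_layers(depth=17):
    layers = [nn.Sequential(nn.Conv2d(1, 64, 3, padding=1),
                            nn.ReLU(inplace=True))]
    for _ in range(depth - 2):
        layers.append(nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                                    nn.BatchNorm2d(64),
                                    nn.ReLU(inplace=True)))
    return nn.Sequential(*layers)
```

When one BN module object is appended several times, every position shares a single set of running statistics (and weights), so the running mean/variance end up fitting none of the positions well in eval mode.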