Worse performance when executing model.eval() than model.train()

My network's performance on the test set gets much worse after some iterations when I call model.eval() before evaluating. If I run the same evaluation without calling model.eval(), the performance is much better. Does anyone know how I can solve this problem?

Could you share a code snippet?

Thanks for your reply, @1chimaruGin.
My network is similar to a U-Net.
Let's say I train and evaluate my model in the following two ways:

Case 1:
model.eval()
evaluate the model on the test set
model.train()
train the model on the train set

Case 2:
model.train()
evaluate the model on the test set
model.train()
train the model on the train set

My batch size is 5.
The only difference is that in the second case I don't call model.eval() before evaluating the model on the test set.

In the second case I get better performance. I have both dropout and batch norm in my network, and I know that they behave differently once model.eval() has been called. I think the problem comes from batch norm, and maybe from its running statistics, but I don't know how to fix it.
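To make the suspicion about running statistics concrete, here is a minimal sketch, not the poster's network. It assumes a single nn.BatchNorm2d layer with default settings and a made-up batch of 5 samples whose statistics are far from the layer's initial running_mean of 0 and running_var of 1. In train mode the layer normalizes with the statistics of the current batch; in eval mode it uses the running estimates, which may not have caught up yet.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Default settings: momentum=0.1, running_mean starts at 0, running_var at 1.
bn = nn.BatchNorm2d(3)

# Hypothetical data: a batch of 5 whose mean and variance are far from (0, 1).
x = torch.randn(5, 3, 8, 8) * 4.0 + 10.0

bn.train()
with torch.no_grad():
    # Train mode: normalize with the statistics of this batch.
    # This call also moves the running statistics one small step.
    y_train = bn(x)

bn.eval()
with torch.no_grad():
    # Eval mode: normalize with running_mean / running_var instead.
    y_eval = bn(x)

print("train-mode output mean:", y_train.mean().item())
print("eval-mode output mean: ", y_eval.mean().item())
print("running_mean after one step:", bn.running_mean)
```

The train-mode output is centered near zero, while the eval-mode output is not, because one update with momentum 0.1 moves the running mean only a tenth of the way toward the batch mean. Whether this is what happens in the network described above is an assumption; comparing each BatchNorm layer's running_mean and running_var with the actual batch statistics would confirm or rule it out.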

Did you mean evaluation accuracy and loss?
I think that if your model doesn't include Dropout or BatchNorm, model.train() and model.eval() won't behave differently. If it does include them, each of them can affect the result. To me, evaluating with model.eval() seems like the correct approach. Correct me if I'm wrong.

PyTorch source code:

    def train(self: T, mode: bool = True) -> T:
        r"""Sets the module in training mode.

        This has any effect only on certain modules. See documentations of
        particular modules for details of their behaviors in training/evaluation
        mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
        etc.

        Args:
            mode (bool): whether to set training mode (``True``) or evaluation
                         mode (``False``). Default: ``True``.

        Returns:
            Module: self
        """
        self.training = mode
        for module in self.children():
            module.train(mode)
        return self

    def eval(self: T) -> T:
        r"""Sets the module in evaluation mode.

        This has any effect only on certain modules. See documentations of
        particular modules for details of their behaviors in training/evaluation
        mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
        etc.

        This is equivalent with :meth:`self.train(False) <torch.nn.Module.train>`.

        Returns:
            Module: self
        """
        return self.train(False)
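As a quick check of what the source above does, the following sketch uses a hypothetical toy nn.Sequential (not the network from the question) to show that eval() and train() only flip the training flag on the module and all of its submodules, and that eval() returns the module itself.

```python
import torch.nn as nn

# Hypothetical toy model, used only to inspect the flags.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),
    nn.BatchNorm2d(4),
    nn.Dropout(p=0.5),
)

returned = model.eval()
# modules() yields the container itself plus its three children.
eval_flags = [m.training for m in model.modules()]

model.train()
train_flags = [m.training for m in model.modules()]

print(returned is model)  # eval() returns self
print(eval_flags)
print(train_flags)
```

Only layers that read this flag, such as Dropout and BatchNorm, change their behavior; the convolution computes the same thing in both modes.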

Yes. I mean accuracy.

And yes, you are right. I have batch norm and dropout.

However, the problem is that when I leave out model.eval() (i.e., perform the evaluation without calling model.eval()), my performance is far better than when I evaluate after calling model.eval().

The accuracy when having model.eval() doesn’t make any sense. It’s like the network is not learning at all. However, when I remove model.eval() the accuracy on the test set actually makes sense.

Did you call the same layer from your __init__ method more than once in your forward method?
For example, in __init__:
self.conv1 = nn.Conv2d(…)
and in forward:

x = self.conv1(x)
x = self.conv1(x)

Instead, define a separate layer for each use. In __init__:
self.conv1 = nn.Conv2d(…)
self.conv2 = nn.Conv2d(…)
and in forward:

x = self.conv1(x)
x = self.conv2(x)

I encountered a similar issue with eval() giving different results, so it is worth a try.

I recall a forum post that solved this issue; I believe it could be related to some PyTorch internals.
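One way layer reuse can produce this symptom, assuming the reused module is a BatchNorm layer (a plain nn.Conv2d computes the same thing in train and eval mode): a BatchNorm layer keeps a single set of running statistics, so calling it at two places in forward blends the statistics of both inputs. The sketch below uses made-up activations and momentum=None, which makes the layer keep a simple cumulative average so the blending is easy to see.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# momentum=None: running stats are a cumulative average over all calls.
shared_bn = nn.BatchNorm1d(1, momentum=None)
shared_bn.train()

# Hypothetical activations at two different points of a network.
a = torch.randn(64, 1)          # centered near 0
b = torch.randn(64, 1) + 10.0   # centered near 10

with torch.no_grad():
    shared_bn(a)   # first call site
    shared_bn(b)   # second call site, same module

# One running mean now has to serve both call sites.
print(shared_bn.running_mean)
```

The running mean ends up roughly halfway between the two inputs, so in eval mode neither call site is normalized with statistics that match its own input. In train mode the problem stays hidden because each call uses its own batch statistics.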

Quick question, just to make sure: are you using torch.no_grad() or optimizer.zero_grad() at validation/test time when you remove model.eval()?
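The reason this question matters can be sketched as follows, assuming a single default nn.BatchNorm1d layer and a made-up test batch of size 5. torch.no_grad() only disables gradient tracking; it does not switch the module to eval mode, so a BatchNorm layer left in train mode still normalizes with the test batch's own statistics and still updates its running statistics from the test data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(2)
bn.train()  # eval() is deliberately NOT called
before = bn.running_mean.clone()

# Hypothetical "test" batch of size 5.
test_batch = torch.randn(5, 2) + 3.0

with torch.no_grad():
    out = bn(test_batch)

print("output tracks gradients:", out.requires_grad)
print("module still in train mode:", bn.training)
print("running_mean unchanged:", torch.equal(before, bn.running_mean))
```

So evaluating in train mode under torch.no_grad() gives results that depend on how the test set is batched, and it lets test data leak into the running statistics used later in eval mode.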

Also, you may want to take a look at these two discussions on this topic (1 and 2) in case you haven’t seen them.