The network performs worse in train() than in eval() mode

Could someone explain why the network behaves this way?

Here is a toy example:

import torch
import torch.nn as nn

class ConvBN(nn.Module):
    """A single Conv2d -> BatchNorm2d block."""
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return x

torch.manual_seed(0)

model = ConvBN(3, 64, 3, 1, 1)
model.cuda()
input_tensor = torch.randn(1, 3, 224, 224).cuda()

# Alternate between eval() and train() and compare the outputs for the same input.
for n in range(8):
    if n % 2 == 0:
        model.eval()    # BatchNorm uses its running statistics
    else:
        model.train()   # BatchNorm uses batch statistics and updates the running ones

    for i in range(10):
        output_tensor = model(input_tensor)

        if n % 2 == 0:
            print(f"Eval: {output_tensor.mean() * 100:.6f}")
        else:
            print(f"Train: {output_tensor.mean() * 100:.6f}")

I don’t see any evaluation metrics in this toy example; what does “worse” mean here?

.eval() switches batch norm layers to use their running statistics instead of per-batch statistics, and disables dropout. These training-time behaviors are kind of like wearing 10 kg weights while exercising.

But when it’s time to perform, you don’t want these training aids still holding back the model. So it’s expected that .train() mode will look worse than .eval() mode.
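You can see the mechanism directly with a bare BatchNorm2d layer. This is a minimal sketch (assuming the default momentum and the default initial running stats of mean 0 / variance 1): in train() mode the layer normalizes with the statistics of the current batch, in eval() mode with its running estimates, so the two outputs differ until the running estimates converge.

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)

# Data whose statistics are far from the default running estimates (mean 0, var 1).
x = torch.randn(2, 3, 4, 4) * 5 + 10

bn.train()
out_train = bn(x)   # normalized with the batch mean/var of x (also updates the running stats)

bn.eval()
out_eval = bn(x)    # normalized with running_mean / running_var

print(out_train.mean().item())  # ~0 by construction of batch normalization
print(out_eval.mean().item())   # far from 0 until the running stats catch up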

OK, I got it. Let me give a bit of context on why I’m asking this.

I’m solving a 3D robot pose estimation task in a Sim2Real setting.

I have a network (a modified HRNet-32) pretrained on a synthetic dataset that predicts the 3D pose of a robot. When I test the network on the real domain in eval() mode, everything is fine and I get good results. However, I’m trying to fine-tune the network using an adversarial domain adaptation technique that exploits the predictions of the pretrained network as pseudo labels for the real domain.

Here is the problem:
during the fine-tuning step, the network is in train() mode, but the predicted pseudo labels are nowhere near the predictions from eval() mode. This basically breaks the domain adaptation training. The network contains just Conv2d, BatchNorm2d and ReLU activations. I know that batch norm acts differently between train() and eval() mode, but do you have any idea how to deal with this type of problem while fine-tuning a network?

Sounds like it’s not a problem with the model but with your training method.
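If the gap really comes from BatchNorm switching to batch statistics, one workaround you could try is to keep the BatchNorm layers in eval() mode while the rest of the network trains, so the normalization used during fine-tuning matches the one used to generate the pseudo labels. This is just a sketch; freeze_batchnorm is a hypothetical helper, not part of your code.

import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    # Keep every BatchNorm2d layer using its running statistics
    # (and stop updating them) even while the rest of the model trains.
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()
            # Optionally also stop gradients to the affine parameters:
            # for p in module.parameters():
            #     p.requires_grad = False

# Call it after every model.train(), because train() flips the
# BatchNorm layers back into training mode:
# model.train()
# freeze_batchnorm(model)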

Hi @alexj94,

Your toy example doesn’t contain an nn.ReLU. Could this be related to why it doesn’t perform well in .train() mode?