.eval() disables dropout and makes batchnorm layers use their running statistics instead of per-batch statistics. These are kind of like wearing 10kg weights while exercising.
But when performing, you don't want those training tools still holding back the model's performance. So it's expected that .train() will perform worse than .eval().
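Here is a minimal sketch (toy model, illustrative names) of what that difference looks like in practice: in train() mode BatchNorm normalizes with the current batch's statistics and Dropout is active, while in eval() mode BatchNorm falls back to its running statistics and Dropout is a no-op.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy model, just to show the mode-dependent layers.
model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Dropout(0.5))
x = torch.randn(4, 8)

model.train()
out_train = model(x)   # batch statistics + random dropout mask

model.eval()
out_eval = model(x)    # running statistics, dropout disabled

print((out_train - out_eval).abs().max())  # noticeably non-zero
```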
Ok, I got it. Let me give a little bit of context on why I'm asking this.
I'm solving a 3D robot pose estimation task in a Sim2Real setting.
I have a network (a modified HRNet-32) pretrained on a synthetic dataset that predicts the 3D pose of a robot. When I test the network on the real domain in eval() mode, everything is fine and I get good results. However, I'm trying to finetune the network using an adversarial domain adaptation technique that exploits the predictions of the pretrained network as pseudo labels for the real domain.
Here is the problem:
during the finetuning step, the network is in train() mode, but the predicted pseudo labels are nowhere near the predictions obtained in eval() mode. This basically breaks the domain adaptation training. The network contains only Conv2D, BatchNorm2D and ReLU layers. I know batch norm acts differently between train() and eval() mode, but do you have any idea how to deal with this type of problem while finetuning a network?
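One common workaround for exactly this mismatch (a sketch, not something from the original post) is to keep the network in train() mode for finetuning but switch the BatchNorm layers back to eval(), so they keep using the running statistics accumulated on the synthetic data rather than per-batch statistics from the real domain. The function and model names below are illustrative.

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """Put every BatchNorm2d layer into eval mode so it uses running stats."""
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()
            # Optionally also freeze the affine parameters:
            # for p in module.parameters():
            #     p.requires_grad_(False)

# Usage at the start of each finetuning epoch (after model.train(),
# which would otherwise flip the BatchNorm layers back to train mode):
# model.train()
# freeze_batchnorm(model)
```

Whether freezing only the statistics or also the affine weights works better likely depends on how far the real domain is from the synthetic one, so it may be worth trying both.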