Training a model under eval mode

I am well aware that a model can be set to train or eval mode and that layers like dropout and batchnorm behave differently depending on this switch.

Recently, however, I have seen repos where they train a model (resnet152, to be specific) in eval mode, i.e. the model is set to eval() and then trained. Is it legitimate to do this (they also get SOTA results for their domain of interest)? If yes, in what situations would one take this approach?

Do you know if the model is set to eval() from the beginning of the training?
If that’s the case, and if only dropout and batchnorm layers are used, I believe you could also remove these layers, since they wouldn’t change anything.
I.e. dropout is disabled, so it basically acts as an nn.Identity layer, while batchnorm layers are initialized such that they don’t change the incoming activation before any training (up to floating point precision):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
bn.weight
> Parameter containing:
tensor([1., 1., 1.], requires_grad=True)
bn.bias
> Parameter containing:
tensor([0., 0., 0.], requires_grad=True)
bn.running_mean
> tensor([0., 0., 0.])
bn.running_var
> tensor([1., 1., 1.])

x = torch.randn(2, 3, 24, 24)
bn.eval()
out = bn(x)
print((out - x).abs().max())
> tensor(1.6689e-05, grad_fn=<MaxBackward1>)
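Similarly, a quick sketch (assuming PyTorch's default dropout behavior) showing that dropout in eval() acts as an identity:

```python
import torch
import torch.nn as nn

# In eval mode dropout is a no-op, so its output matches the input exactly.
drop = nn.Dropout(p=0.5)
drop.eval()

x = torch.randn(2, 3, 24, 24)
out = drop(x)
print(torch.equal(out, x))  # True: dropout in eval() passes inputs through unchanged
```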

Hello @ptrblck, thank you for your response! To answer your question: yes, the model is set to eval() at the beginning of training. See here for example: Mean-Shifted-Anomaly-Detection/main.py at a02ac3001f96fcfa2f9b6a5fcb58cb3e9ba7f5b0 · talreiss/Mean-Shifted-Anomaly-Detection · GitHub

What is perplexing is that when I switch eval() to train() just for the training part (Mean-Shifted-Anomaly-Detection/main.py at a02ac3001f96fcfa2f9b6a5fcb58cb3e9ba7f5b0 · talreiss/Mean-Shifted-Anomaly-Detection · GitHub), I get worse results. They show that their approach achieves SOTA results on CIFAR-10, so I am trying to understand how this eval() technique can be applied to other architectures.

I do not think I understand your comment “If that’s the case and if only dropout and batchnorm layers are used”. Did you mean “…NOT used”? My understanding was that in eval() mode batchnorm uses its stored running stats (mean/var) and dropout is disabled, meaning they are NOT used. Is my understanding correct?

Thank you again.

I meant: if the only layers in the model, which change their behavior during training/evaluation, are dropout and batchnorm layers, setting model.eval() at the beginning of the script would disable them (as you’ve also said) and you could thus just remove the layers.
However, custom layers can also use the self.training flag internally and thus switch their behavior, but I don’t know if that’s the case in this repository.
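As an illustration, a hypothetical custom layer that branches on self.training could look like this (this is a made-up example, not a layer from the repository):

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Toy layer that injects noise only while training, via self.training."""
    def __init__(self, in_features, out_features, noise_std=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        out = self.linear(x)
        if self.training:  # flag toggled by model.train() / model.eval()
            out = out + torch.randn_like(out) * self.noise_std
        return out

layer = NoisyLinear(4, 2)
layer.eval()
x = torch.randn(8, 4)
# In eval mode the noise branch is skipped, so two passes are identical.
print(torch.equal(layer(x), layer(x)))  # True
```

Such a layer changes its behavior under model.eval() even though it is neither dropout nor batchnorm, so simply removing those two layer types would not be equivalent in that case.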

I don’t understand this, as I thought model.eval() is already used during training and evaluation.

> I meant: if the only layers in the model, which change their behavior during training/evaluation, are dropout and batchnorm layers, setting model.eval() at the beginning of the script would disable them (as you’ve also said) and you could thus just remove the layers.
> However, custom layers can also use the self.training flag internally and thus switch their behavior, but I don’t know if that’s the case in this repository.

Ah ok I understand now! Thank you.

> I don’t understand this, as I thought model.eval() is already used during training and evaluation.

Sorry, I was not clear. What I meant was: I switched the eval() call to train() for the training part of the code, i.e. here: Mean-Shifted-Anomaly-Detection/main.py at a02ac3001f96fcfa2f9b6a5fcb58cb3e9ba7f5b0 · talreiss/Mean-Shifted-Anomaly-Detection · GitHub, and then I get worse results. Hope I am making myself clear. So basically I do something like this:

def run_epoch(model, train_loader, optimizer, center, device):
    model.train()  # the repo uses model.eval() here by default
    total_loss, total_num = 0.0, 0

Ah OK, thanks for the explanation.
In that case, could you run an additional experiment by removing all dropout and batchnorm layers from the model (replace them with nn.Identity), call model.train(), and retrain the model?
This should be the same run as using model.eval() during training and I would thus expect to see approx. the same training behavior (assuming it’s stable).
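A minimal sketch of such a replacement (using a hypothetical toy model rather than the repo's resnet152, just to keep it light):

```python
import torch
import torch.nn as nn

def strip_norm_and_dropout(module):
    """Recursively replace dropout and batchnorm children with nn.Identity."""
    for name, child in module.named_children():
        if isinstance(child, (nn.Dropout, nn.BatchNorm1d, nn.BatchNorm2d)):
            setattr(module, name, nn.Identity())
        else:
            strip_norm_and_dropout(child)

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Dropout(0.5),
    nn.Linear(8 * 24 * 24, 10),
)
strip_norm_and_dropout(model)
model.train()  # train normally; no dropout/batchnorm layers remain

print(any(isinstance(m, (nn.Dropout, nn.BatchNorm2d)) for m in model.modules()))  # False
out = model(torch.randn(2, 3, 24, 24))
print(out.shape)  # torch.Size([2, 10])
```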


Thanks @ptrblck, I will try your suggestion and report back here.

@ptrblck I have a follow-up question: assuming there are no custom layers in this repo (and I can confirm there aren’t; it is a vanilla pretrained model), is it legitimate to train under eval mode?

The model is just this:

import torch
import torch.nn.functional as F
from torchvision import models

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet152(pretrained=True)
        self.backbone.fc = torch.nn.Identity()
        freeze_parameters(self.backbone, train_fc=False)

    def forward(self, x):
        z1 = self.backbone(x)
        z_n = F.normalize(z1, dim=-1)
        return z_n

def freeze_parameters(model, train_fc=False):
    for p in model.conv1.parameters():
        p.requires_grad = False
    for p in model.bn1.parameters():
        p.requires_grad = False
    for p in model.layer1.parameters():
        p.requires_grad = False
    for p in model.layer2.parameters():
        p.requires_grad = False
    if not train_fc:
        for p in model.fc.parameters():
            p.requires_grad = False

Sure, it could be valid to train in eval() mode and thus disable the aforementioned layers. It would still be interesting to see if removing the dropout and batchnorm layers yields the same behavior, or if some training/eval logic is still hidden somewhere else in the script.
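One subtlety worth noting here (a sketch, assuming default batchnorm settings): setting requires_grad=False, as freeze_parameters does, freezes the batchnorm affine weight/bias, but the running stats are buffers, not parameters, and still update on every forward pass in train() mode. Only eval() freezes them:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
for p in bn.parameters():
    p.requires_grad = False  # freezes weight/bias only, not the buffers

bn.train()
before = bn.running_mean.clone()
bn(torch.randn(2, 3, 24, 24) + 5.0)  # shifted input moves the running mean
print(torch.equal(bn.running_mean, before))  # False: stats updated despite the freeze
```

This may be part of why the repo calls eval() during training: it keeps the pretrained running stats intact for the frozen backbone.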
