I am well aware that a model can be set to train or eval mode and that layers like dropout and batchnorm behave differently depending on this switch.
Recently, however, I have seen repos where a model (resnet152, to be specific) is trained in eval mode, i.e. the model is set to eval() and then trained. Is it legit to do this? (They also get SOTA results for their domain of interest.) If yes, in what situations would one take this approach?
Do you know if the model is set to eval() from the beginning of the training?
If that’s the case and if only dropout and batchnorm layers are used, I believe you could also remove these layers, since they wouldn’t change anything.
I.e. dropout is disabled and thus basically acts as an nn.Identity layer, while batchnorm layers are initialized in a way that does not change the incoming activation before any training (up to floating point precision):
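This can be checked directly. A minimal sketch (the layer size and input shape are just for illustration): a freshly initialized BatchNorm2d has weight=1, bias=0, running_mean=0, and running_var=1, so in eval() mode it computes (x - 0) / sqrt(1 + eps) * 1 + 0, which is the input up to the tiny eps term.

```python
import torch
import torch.nn as nn

# Freshly initialized batchnorm: weight=1, bias=0, running_mean=0, running_var=1.
bn = nn.BatchNorm2d(3).eval()

x = torch.randn(2, 3, 4, 4)
out = bn(x)

# In eval() mode the layer normalizes with the (identity) running stats,
# so the output matches the input up to floating point precision / eps.
print(torch.allclose(out, x, atol=1e-4))  # True
```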
I do not think I understand your comment "if that's the case and if only dropout and batchnorm layers are used". Did you mean "...NOT used"? In eval() mode my understanding was that batchnorm uses its stored stats (mean/var) and dropout is disabled, meaning they are NOT used. Is my understanding correct?
I meant: if the only layers in the model, which change their behavior during training/evaluation, are dropout and batchnorm layers, setting model.eval() at the beginning of the script would disable them (as you’ve also said) and you could thus just remove the layers.
However, custom layers can also use the self.training flag internally and thus switch their behavior, but I don’t know if that’s the case in this repository.
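As a hypothetical illustration of that point (NoisyLinear is made up for this sketch, not a layer from the repo), a custom module can branch on self.training and thus change behavior under model.train()/model.eval() even though it is neither dropout nor batchnorm:

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Hypothetical layer that adds input noise only in training mode."""
    def __init__(self, in_features, out_features, noise_std=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        # self.training is toggled by model.train() / model.eval()
        if self.training:
            x = x + torch.randn_like(x) * self.noise_std
        return self.linear(x)

m = NoisyLinear(4, 2).eval()
x = torch.randn(1, 4)
# In eval() mode the layer is deterministic: two calls give identical outputs.
print(torch.equal(m(x), m(x)))  # True
```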
Ah ok I understand now! Thank you.
I don’t understand this, as I thought model.eval() is already used during training and evaluation.
def run_epoch(model, train_loader, optimizer, center, device):
    model.train()  # this was model.eval() by default, according to the repo
    total_loss, total_num = 0.0, 0
Ah OK, thanks for the explanation.
In that case, could you run an additional experiment: remove all dropout and batchnorm layers from the model (replace them with nn.Identity), call model.train(), and retrain the model?
This should be equivalent to using model.eval() during training, and I would thus expect approx. the same training behavior (assuming it's stable).
@ptrblck I have a follow-up question: assuming there are no custom layers in this repo (and I can confirm there aren't; it is a vanilla pretrained model), is it legit to train under eval mode?
The model is just this:
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet152(pretrained=True)
        self.backbone.fc = torch.nn.Identity()
        freeze_parameters(self.backbone, train_fc=False)

    def forward(self, x):
        z1 = self.backbone(x)
        z_n = F.normalize(z1, dim=-1)
        return z_n


def freeze_parameters(model, train_fc=False):
    for p in model.conv1.parameters():
        p.requires_grad = False
    for p in model.bn1.parameters():
        p.requires_grad = False
    for p in model.layer1.parameters():
        p.requires_grad = False
    for p in model.layer2.parameters():
        p.requires_grad = False
    if not train_fc:
        for p in model.fc.parameters():
            p.requires_grad = False
Sure, it could be valid to train in eval() mode and thus disable the aforementioned layers' training-mode behavior. It would still be interesting to see if removing the dropout and batchnorm layers yields the same behavior or if some training/eval logic is still hidden somewhere else in the script.
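To make that concrete: eval() does not freeze anything. It only switches layer behavior (batchnorm uses its running stats, dropout is off); autograd still flows and the optimizer still updates parameters. A minimal sketch (not from the repo):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
model.eval()  # batchnorm now normalizes with running stats and stops updating them
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 3, 8, 8)
mean_before = model[1].running_mean.clone()
weight_before = model[0].weight.clone()

model(x).mean().backward()
opt.step()

print(torch.equal(model[1].running_mean, mean_before))  # True: stats untouched in eval()
print(torch.equal(model[0].weight, weight_before))      # False: weights were still trained
```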