Dear Community,
I encountered some non-intuitive behaviour. I load a model, set it to evaluation mode, and predict a single image using model(input). Then I set the model to training mode and predict a single image again. (Note that no .backward() is performed; I even disabled requires_grad for all parameters.) When I set the model to evaluation mode again and predict the same image as before, I get a different result.
TL;DR: if you evaluate an input before and after a training pass in which no weight updates were performed, the prediction in the evaluation phase still changes.
Is this a bug or am I missing something here? You can run this code to check it yourself:
from torchvision import models
import torch

def set_parameter_requires_grad(model):
    # Freeze all parameters so no gradients are computed for them
    for param in model.parameters():
        param.requires_grad = False

if __name__ == "__main__":
    model = models.densenet121(pretrained=True)
    set_parameter_requires_grad(model)

    model.train()
    input_ = torch.zeros((1, 3, 224, 224))
    some_var = model(input_)          # forward pass in training mode

    model.eval()
    eval_value = model(input_)        # first evaluation

    model.train()
    another_variable = model(input_)  # another training-mode forward pass

    model.eval()
    eval_value_2 = model(input_)      # second evaluation

    print(eval_value[0, 0:3])
    print(eval_value_2[0, 0:3])
output:
tensor([-0.3295, 0.2167, -0.6806])
tensor([-0.5839, 0.4981, -0.4104])
Edit: It's not a dropout issue, the dropout rate was 0 all along.
densenet121 uses batchnorm layers, which will update their running estimates during training in each forward pass. During evaluation these running estimates will then be applied instead of the batch statistics, which explains the difference in your outputs.
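To see this in isolation, here is a minimal sketch (my own addition, not from the original answer) that watches the running statistics of a single batchnorm layer change during a train-mode forward pass, even though no backward call is performed:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(2, 3, 4, 4)

bn.train()
print(bn.running_mean)  # initialized to zeros
bn(x)                   # forward pass only, no backward
print(bn.running_mean)  # moved towards the batch mean (momentum update)

bn.eval()
bn(x)                   # eval-mode forward pass
print(bn.running_mean)  # unchanged: no update in eval mode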
Thanks ptrblck for the explanation. I thought the batchnorm layers were disabled completely during evaluation.
Cheers
Hi! What exactly do model.train() and model.eval() do to the model? And how do they interact with enabling/disabling gradient computation?
model.train() and model.eval() will switch the internal self.training flag, which then changes the behavior of some layers. E.g. dropout layers will be disabled during evaluation, and batchnorm layers will use the running stats instead of the batch statistics to normalize the activations. The gradient computation will not be changed or disabled.
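A minimal sketch (my own addition, not from the original answer) to verify the flag flip and its independence from gradient bookkeeping:

import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

model.train()
print(model.training)     # True
model.eval()
print(model.training)     # False; the flag is set recursively
print(model[1].training)  # False: dropout is now disabled

# requires_grad is untouched by train()/eval():
print(all(p.requires_grad for p in model.parameters()))  # True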
Thanks for your answer! What about the combination of the two commands (both self.training options, with and without gradient computation)?
I am a little bit confused by the possibility of using back-propagation in evaluation mode.
Backpropagation and thus the gradient calculation will also work after calling model.eval(), but as previously described the forward pass will be different. E.g. while the batchnorm layers will use the running_mean and running_var to normalize the data, the affine parameters (weight and bias) will still be trained and will get gradients.
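Here is a minimal sketch (again my own, not from the original answer) showing both effects on a single batchnorm layer:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(2)
bn.eval()              # running_mean/running_var are used in the forward pass

x = torch.randn(8, 2)
out = bn(x).sum()
out.backward()         # backpropagation works in eval mode

print(bn.weight.grad)  # affine parameters get gradients
print(bn.bias.grad)
print(bn.running_mean) # still the initial zeros: no update in eval mode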
I am using AlexNet for TL, extracting the features in different layers, E.g. Fc7:
import torch.nn as nn
from torchvision import models

layer = 'Fc7'
alexNet = models.alexnet(pretrained=True)
# Drop the final classification layer to expose the fc7 features
new_classifier = nn.Sequential(*list(alexNet.classifier.children())[:-1])
alexNet.classifier = new_classifier
#alexNet.eval()
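(As a quick sanity check, a snippet of my own, assuming the standard torchvision AlexNet layout: a forward pass now returns the 4096-dimensional fc7 activations.)

import torch

features = alexNet(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 4096])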
I noticed the following:
- I only see the effect of activating/deactivating the gradient calculation at the output of this last layer (in model.train()). Is that because the forward pass has to reach the end of the network to perform the loss calculation and then do the backpropagation?
- Using model.eval(), the results are the same with and without gradient activation. Why?
Could you explain to me why the gradient calculation does not influence the output tensors?
There wouldn't be a reason why the gradient computation should influence the model without any updates, as seen here:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
x = torch.randn(1, 1)

for _ in range(10):
    out = model(x)
    print(out)        # identical output in every iteration
    out.backward()    # gradients are computed and accumulate in .grad,
                      # but the parameters only change after an optimizer step
Thank you so much! Your answers have been very helpful.