For the sake of the example, let’s say I don’t use Dropout, BatchNorm, etc., just a plain CNN.
According to the docs (in PyTorch 0.4),
with torch.set_grad_enabled(is_train)
prevents tracking via autograd, which would make inference more efficient (I assume). Now, if I used model.eval(), would this have the same effect? E.g., does the following still track gradients after model.eval()?
model = CNN()
for e in range(num_epochs):
    # do training
    # evaluate model:
    model = model.eval()
    logits, probas = model(testset_features)
or is it recommended, in addition, to do the following:
model = CNN()
for e in range(num_epochs):
    # do training
    # evaluate model:
    model = model.eval()
    with torch.set_grad_enabled(False):
        logits, probas = model(testset_features)
I think it is the latter. model.eval() only affects layers such as dropout and batchnorm; it does not disable gradient tracking. You can use model.eval() in combination with a torch.no_grad() block during the inference phase.
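A minimal sketch of that combination (reusing the hypothetical CNN class and testset_features tensor from the question):

model = CNN()
# ... training ...

model.eval()             # switches layers like dropout/batchnorm to eval behavior
with torch.no_grad():    # stops autograd from recording the forward pass
    logits, probas = model(testset_features)

Here torch.no_grad() plays the same role as torch.set_grad_enabled(False) in the second snippet above.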
from the docs:
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
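A quick way to see this behavior (a toy example with placeholder tensors, not from the thread):

import torch

x = torch.randn(1, 3, requires_grad=True)
w = torch.randn(3, 2, requires_grad=True)

y = x @ w
print(y.requires_grad)   # True: autograd tracks the computation

with torch.no_grad():
    y = x @ w
print(y.requires_grad)   # False: no graph is built, which saves memory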
I also assumed that eval() mode automatically turns off gradient computation. Hopefully you can see why this might be confusing for us newcomers. I would request that this point be emphasized in the docs for nn.Module’s eval() function. Actually, apart from the FAQ, an article pointing out common mistakes and confusions would be great. Thanks.