Does model.eval() & with torch.set_grad_enabled(is_train) have the same effect for grad history?


(Sebastian Raschka) #1

For the sake of the example, let’s say I don’t use Dropout, BatchNorm etc, just a plain CNN.

According to the docs (in PyTorch 0.4),

with torch.set_grad_enabled(is_train)

prevents autograd from tracking operations, which I assume makes inference more efficient. Now, if I use model.eval(), does that have the same effect? E.g., does the following still track gradients after model.eval()?

model = CNN()
for e in range(num_epochs):
    # do training

# evaluate model:
model = model.eval()
logits, probas = model(testset_features)

or is it recommended, in addition, to do the following:

model = CNN()
for e in range(num_epochs):
    # do training

# evaluate model:
model = model.eval()
with torch.set_grad_enabled(False):
    logits, probas = model(testset_features)

(Irfan Bulu) #2

I think it is the latter. model.eval() has an effect on layers like dropout and batchnorm, but not on gradient tracking. You can use model.eval() in combination with torch.no_grad() during the inference phase.

from the docs:
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
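A minimal sketch of what the quoted docs describe: inside a no_grad() block, results come out with requires_grad=False even when the inputs require gradients.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Inside no_grad(), autograd records no history for this computation.
with torch.no_grad():
    y = x * 2

print(x.requires_grad)  # True
print(y.requires_grad)  # False
```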


(Simon Wang) #3

eval doesn’t turn off history tracking.
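You can verify this with a quick check (using a stand-in nn.Linear instead of the CNN from the question): after eval(), the forward pass still builds a graph, whereas no_grad() actually disables tracking.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the CNN in the question
x = torch.randn(1, 4)

# eval() alone: autograd still tracks the forward pass.
model.eval()
out = model(x)
print(out.requires_grad)  # True

# no_grad() (or set_grad_enabled(False)) turns tracking off.
with torch.no_grad():
    out = model(x)
print(out.requires_grad)  # False
```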


(Sebastian Raschka) #4

thanks @Irfan_Bulu and @SimonW


(Shihab Shahriar) #5

I also assumed that eval() mode automatically turns off gradient computation. Hopefully you can see why this might be confusing for us newcomers. I would request that this point be emphasized in the docs for nn.Module’s eval() function. Actually, apart from an FAQ, an article pointing out common mistakes and confusions would be great. Thanks.