'model.eval()' vs 'with torch.no_grad()'

(Dong Wook Kim) #1

When I test my model, do I have to use model.eval() even though I am using 'with torch.no_grad()'?

(Alban D) #2


These two have different goals:

  • model.eval() notifies all your layers that you are in eval mode, so that batchnorm or dropout layers work in eval mode instead of training mode.
  • torch.no_grad() affects the autograd engine and deactivates it. It reduces memory usage and speeds up computations, but you won’t be able to backprop (which you don’t want in an eval script anyway).
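
A minimal sketch of the two points above, assuming a small throwaway model (the layer sizes and variable names are made up for illustration). It only checks which switch affects which piece of state:

```python
import torch
import torch.nn as nn

# Hypothetical toy model, only for illustration.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.Linear(4, 2))
x = torch.randn(3, 4)

model.eval()              # changes layer behavior: model.training becomes False
out_eval = model(x)       # autograd is still recording this forward pass
print(model.training, out_eval.requires_grad)

with torch.no_grad():     # turns off autograd recording inside the block
    out_no_grad = model(x)
print(out_no_grad.requires_grad)
```

The first forward pass still builds a graph even though the model is in eval mode; only the pass inside the no_grad() block skips it.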

(Dong Wook Kim) #3

Thank you very much for your quick and clear explanation.

(Jayakrishna Rudra) #4

Hey, this implies I should definitely call model.eval() while validating.

And if memory and speed are not constraints, torch.no_grad() can be skipped, right?

(Naman Jain) #5

Ahh, with torch.no_grad() you’ll get much higher speed and can use larger validation batch sizes, so it’s useful if not recommended.

(Alban D) #6

Why do something that takes 5x more memory (the 5 here is just for the example, not an actual number in practice) and is slower, if you can just add one extra line to avoid it?
@Naman-ntc using torch.no_grad() is actually the recommended way to perform validation!

(Naman Jain) #7

yeah alright I meant to write “if not compulsory” :sweat_smile:

(Jayakrishna Rudra) #8

Thank you for the explanation!

(Kasper Fredenslund) #9

Does torch.no_grad() also disable dropout layers?

(Michael Klachko) #10

So why is torch.no_grad() not enabled by default inside the model.eval() function? Is there a situation where we want to compute some gradients in evaluation mode? Even if that’s the case, it seems like no_grad() should be made an optional argument to eval(), set to True by default.

(Alban D) #11

@kasperfred No, it does not.

@michaelklachko Some users may have a use case for this. The problem with doing this, I guess, is that no_grad is a context manager that works with the autograd engine, while eval() changes the state of an nn.Module.
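
To illustrate both answers, here is a small sketch under assumed names (the toy model and tensor shapes are hypothetical). It shows that no_grad() leaves the module in training mode, and that gradients can still be computed in eval mode, for example gradients with respect to the input as used for saliency maps or adversarial examples:

```python
import torch
import torch.nn as nn

# Hypothetical toy classifier, only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))

# 1) no_grad() does not touch the module state, so dropout stays active.
model.train()
with torch.no_grad():
    still_training = model.training

# 2) eval() does not touch autograd, so backward() still works.
model.eval()
x = torch.randn(1, 4, requires_grad=True)
loss = model(x).sum()
loss.backward()
input_grad = x.grad  # gradient of the loss with respect to the input

print(still_training, input_grad.shape)
```

Since one call changes module state and the other is a scoped autograd setting, keeping them separate lets you combine them as needed.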

(Victor Tan) #12

Hello, do you know how exactly eval mode affects the dropout layer at test time? What are the differences in dropout behavior between eval and training mode?


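During eval, Dropout is deactivated and just passes its input through.
During training, activations are dropped at random. The remaining activations are scaled by 1./p, where p here is the keep probability (in terms of nn.Dropout's p argument, which is the drop probability, the factor is 1/(1-p)), as otherwise the expected values would differ between training and eval.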

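import torch
import torch.nn as nn

drop = nn.Dropout()  # default drop probability p=0.5
x = torch.ones(1, 10)

# Train mode (default after construction): random zeros, the rest scaled
drop.train()
print(drop(x))

# Eval mode: the input is passed through unchanged
drop.eval()
print(drop(x))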
(Victor Tan) #14

Thanks a lot. Your answer is the same as what I thought.

(Bram Vanroy) #15

Could someone please confirm whether this means that you handle evaluating and testing similarly? In both cases you set the model to .eval() and use with torch.no_grad()? (A bit more explanation as to why we treat them similarly is also welcome; I am a beginner.)
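
Validation and testing are both inference-only passes, so one way to handle them is a single helper used for both loaders. A rough sketch, assuming a hypothetical evaluate() helper, random placeholder data, and an MSE loss chosen only for the example:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, loss_fn):
    """Return the average loss over a loader (validation or test alike)."""
    was_training = model.training
    model.eval()                      # dropout/batchnorm switch to eval behavior
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():             # no graph is built, which saves memory
        for inputs, targets in loader:
            outputs = model(inputs)
            batch = inputs.size(0)
            total_loss += loss_fn(outputs, targets).item() * batch
            n_samples += batch
    model.train(was_training)         # restore whatever mode was set before
    return total_loss / n_samples


# Placeholder model and data, only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 1))
data = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
val_loader = DataLoader(data, batch_size=8)

val_loss = evaluate(model, val_loader, nn.MSELoss())
print(val_loss)
```

The same call with a test loader would give the test loss; the only difference between the two is which data you pass in and when you look at the number.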

(Two Four) #16

The Dropout documentation says that the probability p is used to drop activations, and that the activations which are not dropped are scaled by 1/(1-p). I am not sure why 1/(1-p) is used as the scaling factor. Could you give some explanation?


Have a look at this post for an example of why we are scaling the activations.
Note that the p in my explanation refers to the keep probability, not the drop probability.
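
A quick numerical check of the scaling, as a sketch with an arbitrarily chosen drop probability of 0.25 (the variable names are made up for this example). Each activation survives with probability 1-p and is multiplied by 1/(1-p), so its expected value is (1-p) * 1/(1-p) = 1 times the input, matching eval mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p_drop = 0.25                 # drop probability, as in nn.Dropout's p argument
drop = nn.Dropout(p=p_drop)   # modules start in train mode
x = torch.ones(100_000)

y = drop(x)
scale = 1.0 / (1.0 - p_drop)  # factor applied to the surviving activations
kept = y[y != 0]
print(kept.unique(), y.mean())   # survivors all equal scale; mean is close to 1

drop.eval()
y_eval = drop(x)              # identity in eval mode
```

Without the scaling, the mean in train mode would be about 1-p of the eval-mode mean, and the following layers would see differently scaled inputs at test time.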

(Two Four) #19

Thanks for your explanation, now I am clear about that