'model.eval()' vs 'with torch.no_grad()'

Hi,

These two have different goals:

  • model.eval() will notify all your layers that you are in eval mode; that way, batchnorm or dropout layers will work in eval mode instead of training mode.
  • torch.no_grad() impacts the autograd engine and deactivates it. It will reduce memory usage and speed up computations, but you won’t be able to backprop (which you don’t want in an eval script).
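In an evaluation loop you usually combine the two. A minimal sketch, assuming a placeholder model, val_loader and criterion (made up for the example, not from this thread):

import torch
import torch.nn as nn

# Placeholder model, data and loss -- substitute your own.
model = nn.Linear(10, 2)
val_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(3)]
criterion = nn.CrossEntropyLoss()

model.eval()                   # dropout/batchnorm switch to eval behaviour
with torch.no_grad():          # autograd off: no graph is built, less memory
    total_loss = 0.0
    for data, target in val_loader:
        output = model(data)
        total_loss += criterion(output, target).item()
print(total_loss / len(val_loader))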
290 Likes

Thank you very much for your quick and clear explanation.

Hey, this implies I should definitely call model.eval() while validating.

And, if memory and speed are not constraints, torch.no_grad() can be ignored. Right?

2 Likes

Ahh, with torch.no_grad() you’ll get much higher speeds and can use larger validation batch sizes, so it’s useful if not recommended

2 Likes

Why do something that takes 5x more memory (the 5 here is just for the example, not an actual number from practice) and is slow, if you can just add one extra line to avoid it?
@Naman-ntc using torch.no_grad() is actually the recommended way to perform validation!

29 Likes

yeah alright I meant to write “if not compulsory” :sweat_smile:

Thank you for the explanation!

Does torch.no_grad() also disable dropout layers?

4 Likes

So why is torch.no_grad() not enabled by default inside the model.eval() function? Is there a situation where we want to compute some gradients when in evaluation mode? Even if that’s the case, it seems like no_grad() should be made an optional argument to eval(), and set to True by default.

32 Likes

@kasperfred No it does not.

@michaelklachko Some users might have a use case for this. The problem with doing this, I guess, is that no_grad() is a context manager that works with the autograd engine, while eval() changes the state of an nn.Module.
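To make that concrete, here is a sketch of one such use case: computing gradients with respect to the input (e.g. for saliency maps or adversarial examples) while the model is in eval mode. The model here is just a made-up example:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(), nn.Linear(10, 2))
model.eval()                      # deterministic forward: dropout is a no-op

x = torch.randn(1, 10, requires_grad=True)
out = model(x)
out.sum().backward()              # autograd is still active, so this works
print(x.grad.shape)               # gradients w.r.t. the input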

10 Likes

Hello, do you know how exactly eval mode affects the dropout layer at test time? What are the differences in dropout behaviour between eval and training mode?

During eval, Dropout is deactivated and just passes its input through.
During training, the probability p is used to drop activations. Also, the kept activations are scaled with 1./p, as otherwise the expected values would differ between training and eval.

import torch
import torch.nn as nn

drop = nn.Dropout()
x = torch.ones(1, 10)

# Train mode (default after construction): some values are zeroed,
# the rest are scaled up to keep the expected value unchanged
drop.train()
print(drop(x))

# Eval mode: dropout is a no-op and the input passes through unchanged
drop.eval()
print(drop(x))
31 Likes

Thanks a lot. Your answer is the same as what I thought.

Could someone please confirm whether this means that you handle evaluating and testing similarly? In both cases you set the model to .eval() and use with torch.no_grad()? (A bit more explanation as to why we treat them similarly is also welcome; I am a beginner.)

2 Likes

In the Dropout documentation, it says the probability p is used to drop activations. At the same time, the activations that are not dropped are scaled with 1/(1-p). I am not sure why it uses 1/(1-p) as the scaling factor; could you give some explanation?

Have a look at this post for an example of why we are scaling the activations.
Note that the p in my explanation refers to the keep probability, not the drop probability.
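As a quick sanity check, here is a sketch using the documentation’s convention (p is the drop probability): scaling the kept activations by 1/(1-p) keeps the expected value unchanged.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.4)          # drop probability 0.4, keep probability 0.6
drop.train()

x = torch.ones(10000)
out = drop(x)
# Kept activations are scaled by 1 / (1 - 0.4) = 1.667, so the mean stays ~1.0
print(out[out > 0][0].item())     # ~1.667
print(out.mean().item())          # ~1.0 in expectation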

4 Likes

Thanks for your explanation, that’s clear to me now.

Hi, are you sure bn and dropout work in eval mode? I think bn and dropout work in training mode, not in validation and test mode.

Hi,

There is no such thing as “test mode”; only train() and eval().
Both bn and dropout work in both cases, but with different behaviour, since you expect them to behave differently during training and evaluation. For example, during evaluation dropout should be disabled, so it is replaced with a no-op. Similarly, bn should use saved statistics instead of batch data, and that’s what it does in eval mode.
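A small sketch of the batchnorm part, with made-up numbers, to show the difference between the two modes:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3) * 5 + 2      # batch with non-zero mean and non-unit std

bn.train()
print(bn(x).mean().item())         # ~0: normalized with the current batch statistics

bn.eval()
print(bn(x).mean().item())         # not ~0: normalized with the saved running statistics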

7 Likes

You might want to modify your response as it can easily confuse readers. Your comment says “batchnorm or dropout layers will work in eval model instead of training mode.” I think you wanted to write eval mode, not eval model.