'model.eval()' vs 'with torch.no_grad()'

spnova12 · June 13, 2018, 8:52am

When I test my model, do I have to use model.eval() even though I am using 'with torch.no_grad()'?

albanD · June 13, 2018, 9:14am

Hi,

These two have different goals:

model.eval() notifies all your layers that you are in eval mode, so that batchnorm or dropout layers work in eval mode instead of training mode.
torch.no_grad() affects the autograd engine and deactivates it. It reduces memory usage and speeds up computations, but you won’t be able to backprop (which you don’t want in an eval script anyway).
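
To make the distinction concrete, here is a minimal sketch of an evaluation step that uses both. The toy model and random inputs are made up for illustration and are not from this thread; only the model.eval() / torch.no_grad() pattern is the point.

```python
import torch
import torch.nn as nn

# Hypothetical toy model and data, for illustration only.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.Dropout(0.5),
    nn.Linear(8, 2),
)
inputs = torch.randn(16, 4)

model.eval()            # module state: dropout off, batchnorm uses running stats
with torch.no_grad():   # autograd state: no graph is recorded inside this block
    outputs = model(inputs)

print(outputs.requires_grad)  # False, so backward() is not possible on this output

model.train()           # switch the modules back before resuming training
```

Note that leaving the with block restores gradient tracking automatically, while eval mode stays set until model.train() is called.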

spnova12 · June 13, 2018, 9:23am

Thank you very much for your quick and clear explanation.

Jk749 · June 13, 2018, 9:49am

Hey, this implies I should definitely call model.eval() while validating.

And if memory and speed are not constraints, torch.no_grad() can be ignored. Right?

Naman-ntc · June 13, 2018, 9:50am

Ahh, with torch.no_grad() you’ll have much higher speeds and can use larger validation batch sizes, so it’s useful if not recommended.

albanD · June 13, 2018, 9:52am

Why do something that takes 5x more memory (the 5 is just for illustration, not an actual number in practice) and is slower, if you can add one extra line to avoid it?
@Naman-ntc using torch.no_grad() is actually the recommended way to perform validation!
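
A rough way to see where the savings come from: outside no_grad, the output carries a grad_fn, meaning autograd has recorded the operation and is keeping whatever it needs for a backward pass. Inside no_grad, nothing is recorded. The layer and input below are hypothetical and only serve as a sketch.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)   # hypothetical toy layer
x = torch.randn(3, 4)

y_tracked = layer(x)          # graph is recorded for a later backward pass
with torch.no_grad():
    y_untracked = layer(x)    # same forward computation, nothing recorded

print(y_tracked.grad_fn is not None)   # True
print(y_untracked.grad_fn is None)     # True
print(torch.allclose(y_tracked, y_untracked))  # the values themselves agree
```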

Naman-ntc · June 13, 2018, 9:59am

yeah alright I meant to write “if not compulsory”

Jk749 · June 13, 2018, 10:03am

Thank you for the explanation!

kasperfred · August 13, 2018, 11:01pm

Does torch.no_grad() also disable dropout layers?

michaelklachko · August 16, 2018, 4:07pm

So why is torch.no_grad() not enabled by default inside the model.eval() function? Is there a situation where we want to compute some gradients in evaluation mode? Even if that’s the case, it seems like no_grad should be an optional argument to eval(), set to True by default.

albanD · August 20, 2018, 10:18pm

@kasperfred No it does not.

@michaelklachko Some users can have a use case for this. The problem with doing it, I guess, is that no_grad is a context manager that works with the autograd engine, while eval() changes the state of an nn.Module.
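
Both points can be checked with a small sketch. The first half shows that dropout still drops values under no_grad when the module is in train mode. The second half shows gradients being computed in eval mode; taking the gradient with respect to the input is just one assumed example of such a use case, and the toy model is hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1) no_grad does not touch module state: dropout is still active in train mode.
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)
drop.train()
with torch.no_grad():
    y = drop(x)
num_zeros = int((y == 0).sum())
print(num_zeros > 0)   # some activations were dropped despite no_grad

# 2) eval() does not touch autograd: gradients still flow in eval mode.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5), nn.Linear(4, 1))
model.eval()
inp = torch.randn(2, 4, requires_grad=True)
model(inp).sum().backward()
print(inp.grad.shape)  # gradient with respect to the input is available
```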

Victor_Tan · September 27, 2018, 6:30pm

Hello, do you know exactly how eval mode affects the dropout layer at test time? What are the differences in dropout behavior between eval and training mode?

ptrblck · September 27, 2018, 10:01pm

During eval, Dropout is deactivated and just passes its input through.
During training, the probability p is used to drop activations. The remaining activations are also scaled with 1./p, as otherwise the expected values would differ between training and eval.

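import torch
import torch.nn as nn

drop = nn.Dropout()  # default drop probability is 0.5
x = torch.ones(1, 10)

# Train mode (default after construction)
drop.train()
print(drop(x))  # each entry is either 0. (dropped) or 2. (kept and rescaled)

# Eval mode
drop.eval()
print(drop(x))  # input is passed through unchanged (all ones)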

Victor_Tan · September 27, 2018, 11:00pm

Thanks a lot. Your answer is the same as what I thought.

BramVanroy · December 30, 2018, 11:56pm

Could someone please confirm whether this means that you handle evaluating and testing similarly? In both cases you set the model to .eval() and use with torch.no_grad()? (A bit more explanation as to why we treat them similarly is also welcome; I am a beginner.)

two_four · January 17, 2019, 7:22am

The Dropout documentation says the probability p is used to drop activations. At the same time, the activations that are not dropped are scaled by 1/(1-p). I am not sure why it uses 1/(1-p) as the scaling factor. Could you give some explanation?

ptrblck · January 17, 2019, 9:14pm

Have a look at this post for an example why we are scaling the activations.
Note that the p in my explanation refers to the keep probability, not the drop probability.
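
As a quick numerical check of the scaling (a sketch, not the linked post's example): in nn.Dropout, p is the drop probability, so each activation survives with probability 1-p and is multiplied by 1/(1-p). The expected output is then (1-p) * x / (1-p) = x, which matches what eval mode returns. The variable names below are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

p_drop = 0.2                      # drop probability, as in nn.Dropout(p=...)
drop = nn.Dropout(p=p_drop)
x = torch.ones(100_000)

drop.train()
y = drop(x)
kept = y[y != 0]                  # activations that survived
print(kept.unique())              # every survivor was scaled by 1 / (1 - p_drop)
print(y.mean())                   # close to 1.0, the mean of the input

drop.eval()
y_eval = drop(x)
print(torch.equal(y_eval, x))     # True: eval mode is the identity
```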

two_four · January 18, 2019, 2:36am

Thanks for your explanation, now I am clear about that.

yilong · March 2, 2019, 3:54am

Hi, are you sure bn and dropout work in eval mode? I think bn and dropout work in training mode, but not in validation and test mode.

albanD · March 4, 2019, 10:11am

Hi,

There is no such thing as “test mode”.
Only train() and eval().
Both bn and dropout are present in both modes, but they behave differently, because you expect different behaviour from them during training and evaluation. For example, during evaluation dropout should be disabled, so it is replaced with a no-op. Similarly, bn should use its saved statistics instead of the batch data, and that is what it does in eval mode.
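
The batchnorm half of this can be sketched as follows, assuming a standalone nn.BatchNorm1d layer with default settings and made-up input data. In train mode the layer normalises with the current batch statistics and updates its running statistics; in eval mode it normalises with the stored running statistics and leaves them untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3) * 5 + 10    # hypothetical batch with mean near 10

bn.train()
out_train = bn(x)                 # uses batch mean/var, updates running stats
print(out_train.mean(dim=0))      # approximately zero per feature
print(bn.running_mean)            # has moved away from its initial zeros

bn.eval()
saved_mean = bn.running_mean.clone()
out_eval = bn(x)                  # uses the stored running mean/var instead
print(torch.equal(saved_mean, bn.running_mean))  # True: no update in eval mode
print(torch.allclose(out_train, out_eval))       # False: different statistics used
```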