'model.eval()' vs 'with torch.no_grad()'

Hello, do you know how exactly eval mode affects the dropout layer at test time? What are the differences in dropout behavior between eval and training mode?

During eval, Dropout is deactivated and just passes its input through.
During training, the probability p is used to drop activations. Also, the activations are scaled with 1./p, as otherwise the expected values would differ between training and evaluation.

import torch
import torch.nn as nn

drop = nn.Dropout()  # default drop probability p=0.5
x = torch.ones(1, 10)

# Train mode (default after construction)
print(drop(x))  # surviving activations are scaled to 2.0, dropped ones are 0.

# Eval mode
drop.eval()
print(drop(x))  # the input is passed through unchanged

Thanks a lot. Your answer is the same as what I thought.

Could someone please confirm whether this means that you handle evaluating and testing similarly? In both cases you set the model to .eval() and use with torch.no_grad()? (A bit more explanation as to why we treat them similarly is also welcome; I am a beginner.)


In the Dropout documentation, it says the probability p is used to drop activations. At the same time, the activations that are not dropped are scaled by 1/(1-p). I am not sure why it uses 1/(1-p) as the scaling factor; could you give some explanation?

Have a look at this post for an example of why we are scaling the activations.
Note that the p in my explanation refers to the keep probability, not the drop probability.
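To make the scaling concrete, here is a tiny numeric sketch (plain Python, my own illustration, not from the linked post) of why multiplying the surviving activations by 1/(1-p) keeps the expected value unchanged:

```python
# Sketch: why inverted dropout scales surviving activations by 1 / (1 - p).
# Each activation survives with probability (1 - p) and is then scaled by
# 1 / (1 - p), so its expected value matches the no-dropout case.
p = 0.5   # drop probability
x = 1.0   # an activation value

# E[out] = (1 - p) * x * 1/(1 - p)  +  p * 0  =  x
expected = (1 - p) * x * (1.0 / (1 - p)) + p * 0.0
print(expected)  # 1.0 -- same as the activation without dropout
```

Without the scaling, the expected output would shrink to (1 - p) * x during training, so later layers would see differently scaled inputs at eval time.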


Thanks for your explanation, now that is clear to me.

Hi, are you sure bn and dropout work in eval mode? I think bn and dropout work in training mode, not in validation and test mode.


There is no such thing as “test mode”.
Only train() and eval().
Both bn and dropout will work in both cases, but with different behaviour, since you expect them to behave differently during training and evaluation. For example, during evaluation dropout should be disabled, so it is replaced with a no-op. Similarly, bn should use its saved running statistics instead of batch statistics, and that's what it does in eval mode.
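As a quick sanity check (a minimal sketch, assuming a recent PyTorch install), you can observe both behaviours directly:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
drop = nn.Dropout(p=0.5)
x = torch.randn(8, 3)

# train() (the default): bn normalizes with batch statistics and updates
# its running estimates; dropout randomly zeroes activations.
bn.train(); drop.train()
_ = bn(x)
running_mean_after_train = bn.running_mean.clone()

# eval(): bn uses the saved running statistics and stops updating them;
# dropout becomes a no-op and passes its input through unchanged.
bn.eval(); drop.eval()
_ = bn(x)
print(torch.equal(bn.running_mean, running_mean_after_train))  # True
print(torch.equal(drop(x), x))                                 # True
```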


You might want to modify your response as it can easily confuse readers. Your comment says “batchnorm or dropout layers will work in eval model instead of training mode.” I think you wanted to write eval mode, not eval model.

Thanks I edited the answer above.

I understood that

  • eval() changes the bn and dropout layer’s behaviour

  • torch.no_grad() deals with the autograd engine and stops it from calculating the gradients, which is the recommended way of doing validation

BUT, I didn't understand the use of with torch.set_grad_enabled().

Can you please explain what its use is and where exactly it can be used?
Thanks! :slight_smile:

torch.set_grad_enabled lets you enable or disable the gradient calculations using a bool argument.
Have a look at the docs for example usage.
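For illustration (a small sketch of the documented context-manager usage), torch.set_grad_enabled is handy when the same code path should sometimes track gradients and sometimes not, e.g. switching on a training flag:

```python
import torch

x = torch.ones(2, requires_grad=True)

# The bool argument decides whether the enclosed ops track gradients.
is_training = False
with torch.set_grad_enabled(is_training):
    y = x * 2
print(y.requires_grad)  # False, because is_training is False

with torch.set_grad_enabled(True):
    z = x * 2
print(z.requires_grad)  # True
```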

But torch.no_grad() does the same thing. Is there any difference between these two?

torch.no_grad just disables the gradient calculation, while torch.set_grad_enabled sets gradient calculation to on or off based on the passed argument.


Are you saying that torch.no_grad and torch.set_grad_enabled(False) are the same?


Yes, if you are using it as a context manager. torch.set_grad_enabled can "globally" enable/disable the gradient computation if you call it as a function.
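A small sketch of both usages (my own example, not from the thread):

```python
import torch

x = torch.ones(2, requires_grad=True)

# As context managers, both disable gradient tracking for the enclosed ops.
with torch.no_grad():
    a = x * 2
with torch.set_grad_enabled(False):
    b = x * 2
print(a.requires_grad, b.requires_grad)  # False False

# Called as a plain function, set_grad_enabled flips the global grad mode
# until it is changed again -- something torch.no_grad cannot do.
torch.set_grad_enabled(False)
c = x * 2
torch.set_grad_enabled(True)   # restore the default
print(c.requires_grad)  # False
```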


The method is called "inverted dropout", whose purpose is to ensure that the expectation of the dropout layer's output remains unchanged.

Btw, if inverted dropout is not applied (meaning you don't scale by 1/(1-p)), the dropout layer's output keeps changing significantly (because it follows a Bernoulli distribution and you never know how many nodes are dropped this time). As a result, the output of the whole network CANNOT stay stable, which disturbs backpropagation.
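A from-scratch sketch of inverted dropout (a hypothetical helper for illustration, not PyTorch's internal implementation) makes the expectation argument visible:

```python
import torch

def inverted_dropout(x, p=0.5, training=True):
    # Keep each element with probability 1 - p and scale the survivors
    # by 1 / (1 - p), so E[out] == E[x]; at eval time, pass x through.
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)

torch.manual_seed(0)
x = torch.ones(1000)
out = inverted_dropout(x, p=0.5)
# The mean stays close to 1.0 thanks to the 1 / (1 - p) scaling,
# even though roughly half the elements were zeroed.
print(out.mean())
```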

Another perspective is What is inverted dropout?


Thanks for the awesome explanation, but I feel I'm missing one piece of the distinction. Why is it necessary to be able to backprop when doing model.eval()?



It's not "necessary" to be able to backprop when doing .eval(). It's just that .eval() has nothing to do with the autograd engine and the backprop capabilities.
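A short sketch demonstrating that independence (assuming a recent PyTorch): gradients still flow through a model in eval mode unless you explicitly disable autograd.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
model.eval()  # changes layer behaviour only; autograd is still active

x = torch.randn(3, 4)
loss = model(x).sum()
loss.backward()  # backprop works fine in eval mode
print(model.weight.grad is not None)  # True

# To actually skip gradient tracking, wrap the forward pass in no_grad:
with torch.no_grad():
    y = model(x)
print(y.requires_grad)  # False
```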