'model.eval()' vs 'with torch.no_grad()'

You might want to modify your response as it can easily confuse readers. Your comment says “batchnorm or dropout layers will work in eval model instead of training mode.” I think you wanted to write eval mode, not eval model.

Thanks, I edited the answer above.

I understand that:

  • eval() changes the behaviour of the batchnorm and dropout layers

  • torch.no_grad() deals with the autograd engine and stops it from calculating gradients, which is the recommended way of doing validation

But I didn't understand the use of with torch.set_grad_enabled().

Can you please explain what it is for and where exactly it can be used?
Thanks!

torch.set_grad_enabled lets you enable or disable the gradient calculations using a bool argument.
Have a look at the docs for example usage.

But torch.no_grad() does the same thing. Is there any difference between these two?

torch.no_grad just disables the gradient calculation, while torch.set_grad_enabled sets gradient calculation to on or off based on the passed argument.


Are you saying that torch.no_grad() and torch.set_grad_enabled(False) are the same?


Yes, if you are using it as a context manager. torch.set_grad_enabled can “globally” enable/disable the gradient computation, if you call it as a function.
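To make the distinction concrete, here is a minimal sketch of both usages. The tensor x and the flag is_train are hypothetical names chosen for illustration; the sketch assumes gradient mode is at its default (enabled) when it starts.

```python
import torch

x = torch.ones(3, requires_grad=True)

# Context-manager form with False: behaves like `with torch.no_grad():`.
with torch.set_grad_enabled(False):
    y = x * 2
print(y.requires_grad)  # False

# The bool argument lets one code path serve both training and validation.
is_train = True  # hypothetical flag, e.g. phase == "train"
with torch.set_grad_enabled(is_train):
    z = x * 2
print(z.requires_grad)  # True

# Function form: the setting persists until it is changed again.
torch.set_grad_enabled(False)
w = x * 2
torch.set_grad_enabled(True)  # restore the default explicitly
print(w.requires_grad)  # False
```

The context-manager form restores the previous mode on exit, so it is usually the safer choice; the function form is easy to forget to undo.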


The method is called “inverted dropout”. Its purpose is to ensure that the expected value of the dropout layer’s output remains unchanged.

By the way, if “inverted dropout” is not applied (which means you don’t scale by 1/(1-p)), the scale of the dropout layer’s output differs between training and evaluation: each unit is kept according to a Bernoulli distribution during training, so the expected output is only (1-p) times what the same layer produces at evaluation, when nothing is dropped. The output of the whole network then shifts between the two modes, which also disturbs backpropagation.

For another perspective, see “What is inverted dropout?”
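As a rough illustration of the scaling, the sketch below applies torch.nn.functional.dropout to a tensor of ones. The seed, the tensor size, and p = 0.5 are arbitrary choices for the demonstration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
p = 0.5
x = torch.ones(100_000)

# Training mode: each element is zeroed with probability p and the
# survivors are scaled by 1 / (1 - p), so the expected value is unchanged.
train_out = F.dropout(x, p=p, training=True)
kept = train_out[train_out != 0]

# Evaluation mode: dropout is the identity, no scaling needed.
eval_out = F.dropout(x, p=p, training=False)

print(kept.unique())             # the surviving values, all equal to 1 / (1 - p)
print(train_out.mean())          # close to 1.0, the mean of the input
print(torch.equal(eval_out, x))  # True
```

Because the rescaling happens during training, nothing has to be adjusted at evaluation time, which is why model.eval() can simply turn dropout off.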


Thanks for the awesome explanation, but I feel I’m missing one piece for the distinction. Why is it necessary to be able to backprop when doing model.eval()?



It’s not “necessary” to be able to backprop when calling .eval(). It’s just that .eval() has nothing to do with the autograd engine or the backprop capabilities.
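A small sketch can confirm that the two are independent: after model.eval(), autograd still tracks the forward pass and backward() still works. The toy model here is a hypothetical stand-in.

```python
import torch
import torch.nn as nn

# Hypothetical toy model containing a dropout layer.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
model.eval()  # dropout becomes the identity; autograd is untouched

x = torch.randn(2, 4)
out = model(x)
print(out.requires_grad)  # True: the graph is still being recorded

out.sum().backward()      # backprop works in eval mode
grad_is_set = model[0].weight.grad is not None
print(grad_is_set)        # True
```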


Why is the model forward pass slow while using torch.no_grad()?


I don’t see any mention of speed in this blog post.
Can you detail your question a bit more please?

Hi @ptrblck, after using torch.no_grad() and switching back from model.eval() to model.train(), is it required to re-enable gradients with torch.set_grad_enabled(True), or will they be enabled automatically by model.train()? I assume it happens automatically, but I just want to confirm.

model.train() and model.eval() do not change any behavior of the gradient calculations, but are used to set specific layers like dropout and batchnorm to training or evaluation mode (in evaluation mode, dropout won’t drop activations and batchnorm will use running estimates instead of batch statistics).

After the with torch.no_grad() block exits, your gradient behavior will be the same as it was before entering the block.
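This can be checked with torch.is_grad_enabled(). The sketch below records the gradient mode at each step; it assumes a fresh interpreter where gradient mode is at its default (enabled), and the nn.Linear model is only a placeholder.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder model
states = []

states.append(torch.is_grad_enabled())      # before anything: enabled
model.eval()
states.append(torch.is_grad_enabled())      # eval() does not touch grad mode
with torch.no_grad():
    states.append(torch.is_grad_enabled())  # disabled inside the block
states.append(torch.is_grad_enabled())      # restored on exit
model.train()
states.append(torch.is_grad_enabled())      # train() does not touch it either

print(states)  # [True, True, False, True, True]
```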


Thanks for your explanation.
I am actually more interested in the usage of model.eval() and torch.no_grad()…

So during evaluation, is it enough to use:

for batch in val_loader:
    #some code

or do I need to use them as:

with torch.no_grad():
    for batch in val_loader:
        #some code



The first approach is enough to get valid results.
The second approach will additionally save some memory.
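Putting the pieces together, a validation loop that uses both might look like the sketch below. The model, val_loader, and criterion are hypothetical stand-ins built from random data so that the snippet runs on its own; it assumes model.eval() is called before the loop, since that is what switches dropout and batchnorm to evaluation behaviour.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for a real model, dataset, and loss.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 3))
dataset = TensorDataset(torch.randn(32, 4), torch.randint(0, 3, (32,)))
val_loader = DataLoader(dataset, batch_size=8)
criterion = nn.CrossEntropyLoss()

model.eval()  # layer behaviour: dropout off, batchnorm uses running stats
total_loss, n_samples = 0.0, 0
with torch.no_grad():  # autograd: no graph is built, which saves memory
    for inputs, targets in val_loader:
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        total_loss += loss.item() * inputs.size(0)
        n_samples += inputs.size(0)

val_loss = total_loss / n_samples
print(f"validation loss: {val_loss:.4f}")

model.train()  # switch layers back before the next training epoch
```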


Thanks! That helps a lot.


If I’m not calling loss.backward() in my eval loop, do I still need to use torch.no_grad()? Will it make any difference?


You don’t need to, but you can save memory and thus potentially increase the batch size, as no intermediate tensors will be stored.
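One way to see the difference, even without calling backward(), is to inspect grad_fn on the output: outside torch.no_grad() the output is attached to a computation graph that keeps intermediate tensors alive, while inside it is not. The model and input below are placeholders for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder model
x = torch.randn(8, 4)

out_tracked = model(x)        # graph is recorded even if backward() is never called
with torch.no_grad():
    out_untracked = model(x)  # same values, but no graph is recorded

print(out_tracked.grad_fn is not None)  # True
print(out_untracked.grad_fn is None)    # True
print(out_untracked.requires_grad)      # False
```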