You might want to modify your response, as it can easily confuse readers. Your comment says “batchnorm or dropout layers will work in eval model instead of training mode.” I think you meant eval mode, not eval model.
Thanks, I edited the answer above.
I understood that:
eval() changes the bn and dropout layers’ behaviour.
torch.no_grad() deals with the autograd engine and stops it from calculating the gradients, which is the recommended way of doing validation.
BUT, I didn’t understand the use of with torch.set_grad_enabled().
Can you please explain what its use is and where exactly it can be used?
torch.set_grad_enabled lets you enable or disable the gradient calculation using a boolean argument.
Have a look at the docs for example usage.
but torch.no_grad() does the same thing. is there any difference between these two?
torch.no_grad just disables the gradient calculation, while
torch.set_grad_enabled sets gradient calculation to on or off based on the passed argument.
are you saying that torch.no_grad and torch.set_grad_enabled(False) are the same ?
Yes, if you are using it as a context manager.
torch.set_grad_enabled can “globally” enable/disable the gradient computation, if you call it as a function.
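A minimal sketch of the difference (the tiny tensor here is just for illustration):

```python
import torch

x = torch.ones(2, requires_grad=True)

# As a context manager, torch.set_grad_enabled(False) behaves
# exactly like torch.no_grad()
with torch.set_grad_enabled(False):
    y = x * 2
with torch.no_grad():
    z = x * 2
print(y.requires_grad, z.requires_grad)  # False False

# Called as a plain function, it toggles the gradient calculation
# "globally" until it is called again
torch.set_grad_enabled(False)
w = x * 2
torch.set_grad_enabled(True)  # restore the default
v = x * 2
print(w.requires_grad, v.requires_grad)  # False True
```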
The method is called “inverted dropout”, and its purpose is to ensure that the expectation of the dropout layer’s output remains unchanged.
Btw, if “inverted dropout” is not applied (which means you don’t scale by 1/(1-p)), the dropout layer’s output keeps changing significantly (because it follows a Bernoulli distribution and you never know how many nodes are dropped out this time), so the output of the whole network CANNOT stay stable, which will disturb the backpropagation procedure.
Another perspective is What is inverted dropout?
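To make that concrete, here is a small sketch of the 1/(1-p) scaling (the large tensor size is an arbitrary choice so the averages are stable):

```python
import torch

torch.manual_seed(0)
p = 0.5
drop = torch.nn.Dropout(p=p)
x = torch.ones(1_000_000)

# In training mode, surviving activations are scaled by 1 / (1 - p),
# so the expected value of the output matches the input
drop.train()
out_train = drop(x)
print(out_train.mean())  # roughly 1.0 in expectation

# In eval mode, dropout is a no-op: the output equals the input
drop.eval()
out_eval = drop(x)
print(torch.equal(out_eval, x))
```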
Thanks for the awesome explanation, but I feel I’m missing one piece of the distinction. Why is it necessary to be able to backprop when doing .eval()?
It’s not “necessary” to be able to backprop when doing .eval(). It’s just that .eval() has nothing to do with the autograd engine and the backprop capabilities.
Why is the model’s forward pass slow while using torch.no_grad()?
I don’t see any mention of speed in this blog post.
Can you detail your question a bit more please?
Hi @ptrblck, is it required to re-enable gradients with torch.set_grad_enabled(True) after a torch.no_grad() block when switching back from model.eval() to model.train(), or will gradients be enabled automatically by model.train()? I just want to confirm; I believe they should be enabled automatically.
model.eval() does not change any behavior of the gradient calculations, but is used to set specific layers like dropout and batchnorm to evaluation mode (dropout won’t drop activations, batchnorm will use running estimates instead of batch statistics).
After the with torch.no_grad() block was executed, your gradient behavior will be the same as before entering the block.
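A quick check of both points (a toy linear layer, just for demonstration):

```python
import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

# model.eval() changes layer behavior only; gradients are still tracked
model.eval()
out = model(x)
print(out.requires_grad)  # True

# Inside the no_grad block, gradient calculation is disabled ...
with torch.no_grad():
    out_ng = model(x)
print(out_ng.requires_grad)  # False

# ... and after leaving the block the previous behavior is restored
out_after = model(x)
print(out_after.requires_grad)  # True
```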
Thanks for your explanation.
I am actually more interested in the usage of model.eval() and torch.no_grad()…
So that means during evaluation, it’s enough to use:

model.eval()
for batch in val_loader:
    # some code

or do I need to use them as:

model.eval()
with torch.no_grad():
    for batch in val_loader:
        # some code
The first approach is enough to get valid results.
The second approach will additionally save some memory.
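Putting the second approach together as a sketch (the model, criterion, and val_loader here are toy placeholders standing in for a real setup):

```python
import torch

# toy model and data standing in for a real validation setup
model = torch.nn.Linear(4, 2)
criterion = torch.nn.CrossEntropyLoss()
val_loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(3)]

model.eval()                    # dropout/batchnorm switch to eval behavior
total_loss = 0.0
with torch.no_grad():           # additionally skip storing intermediates
    for data, target in val_loader:
        output = model(data)
        total_loss += criterion(output, target).item()

print(total_loss / len(val_loader))
```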
Thanks! That helps a lot.
If I’m not using loss.backward() in my eval loop, do I still need to set torch.no_grad()? Will it make any difference?
You don’t need to, but you can save memory and thus potentially increase the batch size, as no intermediate tensors will be stored.
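You can see the graph being skipped via grad_fn (a toy two-layer model, just for illustration; no graph means the intermediate activations are freed instead of kept for backward):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.ReLU())
x = torch.randn(2, 10)

out = model(x)
print(out.grad_fn is not None)  # True: graph (and intermediates) are kept

with torch.no_grad():
    out_ng = model(x)
print(out_ng.grad_fn is None)   # True: no graph is built
```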