What is the purpose of torch.no_grad()?

I always see `with torch.no_grad():` used in the validation phase of the tutorials. I understand that this tells the NN module that no change should be made, by temporarily turning off the gradient of each parameter. But I am just curious why this is necessary. Is it just a good habit, to prevent the NN from being changed after it is trained? I found that even without doing so, one can still predict the output with net(inputs).
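For context, a small illustration of that last observation (the Linear layer here is just a stand-in for any network): calling net(inputs) without no_grad does produce outputs, but autograd records the forward pass, which you can see from the grad_fn attribute of the result.

```python
import torch

# Stand-in network, just for illustration; any nn.Module behaves the same way.
net = torch.nn.Linear(4, 2)
inputs = torch.randn(3, 4)

out_plain = net(inputs)           # works fine, but autograd records the forward pass
print(out_plain.grad_fn)          # e.g. <AddmmBackward0 ...> -> a graph was built

with torch.no_grad():
    out_no_grad = net(inputs)     # same numbers, but nothing is recorded
print(out_no_grad.grad_fn)        # None
print(out_no_grad.requires_grad)  # False
```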

If you don’t track gradients, computation is a bit faster, and the forward pass doesn’t store the intermediate results required for backprop, which saves memory. When you evaluate, it makes a lot of sense to use no_grad, since the saved memory lets you work with larger batch sizes and thus evaluate faster.
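A minimal sketch of how it is typically used in a validation step; the model, data, and loss here are hypothetical placeholders, not from the tutorial:

```python
import torch

# Tiny hypothetical model and dummy data, just for illustration.
net = torch.nn.Linear(10, 2)
inputs = torch.randn(32, 10)
targets = torch.randint(0, 2, (32,))
criterion = torch.nn.CrossEntropyLoss()

net.eval()                     # switch layers like dropout/batchnorm to eval behaviour
with torch.no_grad():          # no computation graph is built inside this block
    outputs = net(inputs)      # forward pass only; activations are not kept for backward
    loss = criterion(outputs, targets)
    preds = outputs.argmax(dim=1)

print(loss.item(), (preds == targets).float().mean().item())
```

Note that net.eval() and torch.no_grad() are independent: eval() changes layer behaviour, while no_grad() only disables gradient tracking, so for validation you usually want both.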