'model.eval()' vs 'with torch.no_grad()'

Have a look at this post for an example of why we scale the activations.
Note that the p in my explanation refers to the keep probability, not the drop probability.
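
To make the scaling concrete, here is a minimal sketch of inverted dropout in plain Python, not the actual PyTorch implementation. The helper name `inverted_dropout` and the parameter `keep_prob` are made up for illustration. It assumes the usual scheme: during training each activation is kept with probability `keep_prob` and the kept ones are scaled by `1 / keep_prob`, while in eval mode nothing is dropped or scaled. Keep in mind that `torch.nn.Dropout(p)` takes the drop probability, so its `p` corresponds to `1 - keep_prob` here.

```python
import random


def inverted_dropout(activations, keep_prob, training=True, rng=None):
    """Hypothetical helper: inverted dropout on a list of floats.

    keep_prob is the probability of KEEPING a unit (not dropping it).
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training:
        # Eval mode: dropout is a no-op, activations pass through unchanged.
        return list(activations)
    rng = rng or random.Random()
    scale = 1.0 / keep_prob  # compensates for the units that get zeroed
    return [a * scale if rng.random() < keep_prob else 0.0 for a in activations]


keep_prob = 0.8
activations = [1.0] * 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

train_out = inverted_dropout(activations, keep_prob, training=True, rng=rng)
eval_out = inverted_dropout(activations, keep_prob, training=False)

train_mean = sum(train_out) / len(train_out)
eval_mean = sum(eval_out) / len(eval_out)
print(f"train mean: {train_mean:.3f}, eval mean: {eval_mean:.3f}")
```

Because of the scaling, the mean activation during training should come out close to the mean in eval mode, so the following layers see inputs of roughly the same magnitude in both modes. Without the `1 / keep_prob` factor, the training mean would be about `keep_prob` times the eval mean.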
