'model.eval()' vs 'with torch.no_grad()'

ptrblck · January 17, 2019, 9:14pm

Have a look at this post for an example of why we are scaling the activations.
Note that the p in my explanation refers to the keep probability, not the drop probability.
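The linked post isn't reproduced here. Below is a minimal pure-Python sketch of "inverted" dropout written in terms of the keep probability p, as used in the explanation above. The function name `inverted_dropout` and its parameters are hypothetical and for illustration only; this is not PyTorch's implementation. During training each activation is kept with probability p and scaled by 1/p, so the expected value of each output equals the input. In eval mode the layer is then just the identity and needs no extra scaling. Keep in mind that `torch.nn.Dropout(p)` takes p as the *drop* probability, so its training-time scale factor is 1/(1-p).

```python
import random

def inverted_dropout(activations, keep_prob, training=True, rng=None):
    """Hypothetical helper: inverted dropout parameterized by keep probability.

    keep_prob corresponds to the p in the post (probability of KEEPING a unit).
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training:
        # Eval mode: identity. Training already rescaled, so expectations match.
        return list(activations)
    rng = rng or random.Random()
    scale = 1.0 / keep_prob
    # Keep each unit with probability keep_prob and scale kept units by 1/keep_prob,
    # so E[output] = keep_prob * (a / keep_prob) + (1 - keep_prob) * 0 = a.
    return [a * scale if rng.random() < keep_prob else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    out = inverted_dropout([1.0] * 100_000, keep_prob=0.5, rng=rng)
    # Each entry is 0.0 or 2.0; the mean should be close to the original 1.0.
    print(sum(out) / len(out))
    # In eval mode the input passes through unchanged.
    print(inverted_dropout([1.0, 2.0], keep_prob=0.5, training=False))
```

Scaling during training (rather than multiplying by p at test time) is what lets the eval-mode forward pass stay a plain identity.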