- `model.train()` and `model.eval()` change the behavior of some layers. E.g. `nn.Dropout` won't drop anymore and `nn.BatchNorm` layers will use the running estimates instead of the batch statistics. The `torch.set_grad_enabled` line of code makes sure the intermediate values, which are needed to backpropagate during training, are not stored during evaluation, thus saving memory. It's comparable to the `with torch.no_grad()` statement, but takes a bool value.
- All new operations in the `torch.set_grad_enabled(False)` block won't require gradients. However, the model parameters will still require gradients.
- The `running_loss` will be "de-averaged" by multiplying it with `inputs.size(0)`. Therefore you should divide by the length of the whole dataset, not by the number of batches.
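The first point can be sketched with a small toy model (the `nn.Sequential` setup here is just an illustration, not from the original post): in `eval()` mode `nn.Dropout` is a no-op, so repeated forward passes agree, and inside a `torch.set_grad_enabled(False)` block no graph is built.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(1, 4)

# In eval mode, Dropout does nothing, so two forward passes match.
model.eval()
with torch.set_grad_enabled(False):
    out1 = model(x)
    out2 = model(x)
print(torch.allclose(out1, out2))  # True
print(out1.requires_grad)          # False: no intermediates stored

# In train mode, dropout is active and the graph is recorded again.
model.train()
out3 = model(x)
print(out3.requires_grad)          # True
```

`torch.set_grad_enabled` also works as a plain function call (e.g. `torch.set_grad_enabled(is_train)`), which is exactly the "takes a bool value" difference from `torch.no_grad()`.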
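The second point is easy to check directly: outputs computed inside the block don't require gradients, while the parameters keep their `requires_grad` flag (the `nn.Linear` here is just a placeholder model).

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 2)

with torch.set_grad_enabled(False):
    x = torch.randn(1, 2)
    out = model(x)
    # New results inside the block don't require gradients ...
    print(out.requires_grad)  # False
    # ... but the parameters still do.
    print(all(p.requires_grad for p in model.parameters()))  # True
```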
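For the third point, a minimal sketch (the dataset, model, and batch size are made up for illustration): since `nn.MSELoss` returns the per-batch mean, multiplying by `inputs.size(0)` turns it back into a per-sample sum, and dividing by `len(dataset)` gives the true epoch average even when the last batch is smaller.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# 10 samples with batch_size=4 gives an uneven last batch of 2.
dataset = TensorDataset(torch.randn(10, 3), torch.randn(10, 1))
loader = DataLoader(dataset, batch_size=4)
model = nn.Linear(3, 1)
criterion = nn.MSELoss()  # returns the *mean* loss over the batch

running_loss = 0.0
for inputs, targets in loader:
    loss = criterion(model(inputs), targets)
    # "De-average": weight each batch by its actual size.
    running_loss += loss.item() * inputs.size(0)

# Divide by the dataset length, not len(loader).
epoch_loss = running_loss / len(dataset)
```

Dividing by `len(loader)` instead would over-weight the samples in the smaller last batch.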