- `model.train()` and `model.eval()` change the behavior of some layers. E.g. `nn.Dropout` won't drop anymore and `nn.BatchNorm` layers will use the running estimates instead of the batch statistics. The `torch.set_grad_enabled` line makes sure the intermediate activations, which are only needed to backpropagate during training, are not stored during evaluation, thus saving memory. It's comparable to the `with torch.no_grad()` statement, but takes a bool value, so you can toggle the behavior for both phases with a single flag (see the first sketch below).
- All new operations in the `torch.set_grad_enabled(False)` block won't require gradients. However, the model parameters will still require gradients; only the tensors created inside the block are affected (see the second sketch below).
- The `running_loss` will be "de-averaged" by multiplying it with `inputs.size(0)`: since the criterion returns the mean loss over the batch, this recovers the summed loss, which also handles a smaller final batch correctly. Therefore you should divide by the length of the whole dataset, not the number of batches (see the last sketch below).
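To illustrate the first point, here is a minimal sketch (the toy `nn.Sequential` model and the input shape are made up for the example) showing how the two modes and the gradient flag are usually combined:

```python
import torch
import torch.nn as nn

# Hypothetical toy model, just to have Dropout and BatchNorm layers to toggle.
model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(p=0.5))
x = torch.randn(4, 10)

for phase in ["train", "val"]:
    if phase == "train":
        model.train()   # Dropout drops, BatchNorm uses batch statistics
    else:
        model.eval()    # Dropout is a no-op, BatchNorm uses running estimates

    # Track gradients (and store intermediate activations) only while training.
    with torch.set_grad_enabled(phase == "train"):
        out = model(x)
    print(phase, out.requires_grad)  # "train True", then "val False"
```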
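The second point can be checked directly on the `requires_grad` flags (the small `nn.Linear` is again just a placeholder):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

with torch.set_grad_enabled(False):
    out = model(torch.randn(4, 10))

print(out.requires_grad)           # False: the output was created inside the block
print(model.weight.requires_grad)  # True: the parameter flag is untouched
```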
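And a self-contained sketch of the loss averaging (dataset, model, and criterion are dummy placeholders; the last batch is deliberately smaller to show why dividing by the number of batches would be wrong):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup: 10 samples in batches of 4 -> the last batch has only 2.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()  # returns the *mean* loss over the batch
dataset = TensorDataset(torch.randn(10, 10), torch.randint(0, 2, (10,)))
loader = DataLoader(dataset, batch_size=4)

running_loss = 0.0
for inputs, labels in loader:
    loss = criterion(model(inputs), labels)       # mean over the current batch
    # "De-average": weight the mean by the batch size before accumulating.
    running_loss += loss.item() * inputs.size(0)

epoch_loss = running_loss / len(dataset)          # divide by the dataset length
print(epoch_loss)
```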