I am confused about the following snippet, taken from the transfer learning tutorial.
for phase in ['train', 'val']:
    if phase == 'train':
        scheduler.step()
        model.train()  # Set model to training mode
    else:
        model.eval()   # Set model to evaluate mode

    running_loss = 0.0
    running_corrects = 0

    # Iterate over data.
    for inputs, labels in dataloaders[phase]:
        inputs = inputs.to(device)
        labels = labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward
        # track history if only in train
        with torch.set_grad_enabled(phase == 'train'):
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)

            # backward + optimize only if in training phase
            if phase == 'train':
                loss.backward()
                optimizer.step()

        # statistics
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)

    epoch_loss = running_loss / dataset_sizes[phase]
    epoch_acc = running_corrects.double() / dataset_sizes[phase]
- What does the line

      with torch.set_grad_enabled(phase == 'train'):

  do? I thought that model.eval() and model.train() were enough to switch the model between the states in which backprop information is and is not recorded (see the first sketch below).
- How does it compare to requires_grad? Do the two fulfill the same purpose (second sketch below)?
- Why is epoch_loss divided by the size of the dataset? According to the docs the loss is by default averaged over the batch (reduction='mean'), so why would I divide by the entire dataset size at the end (third sketch below)?
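
First sketch: this is the behaviour that confuses me in the first question. It uses a made-up nn.Linear model rather than the tutorial's model, so it is only meant to illustrate what I observe:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # hypothetical stand-in for the tutorial's model
x = torch.randn(4, 10)

# eval() only switches layer behaviour (dropout, batchnorm, ...);
# autograd still records the forward pass
model.eval()
out = model(x)
print(out.requires_grad)   # True - a graph is still being built

# the context manager is what actually turns gradient tracking off
with torch.set_grad_enabled(False):
    out = model(x)
print(out.requires_grad)   # False - no history is saved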
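
Second sketch: for the second question, my current understanding is that requires_grad is a per-tensor flag while torch.set_grad_enabled is a context-wide switch, but I am not sure this is right. The tensors here are made up for illustration:

import torch

w = torch.randn(3, requires_grad=True)   # per-tensor flag set at creation
x = torch.randn(3)                       # requires_grad is False by default

y = (w * x).sum()
print(y.requires_grad)                   # True - at least one input requires grad

w.requires_grad_(False)                  # freeze this particular tensor in place
y = (w * x).sum()
print(y.requires_grad)                   # False

w.requires_grad_(True)
with torch.set_grad_enabled(False):      # disables tracking for the whole block,
    y = (w * x).sum()                    # regardless of the per-tensor flags
print(y.requires_grad)                   # False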
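
Third sketch: to make the last question concrete, this is the arithmetic I have in mind, with toy numbers that are not from the tutorial:

# a dataset of 5 samples split into batches of 2, 2 and 1;
# criterion(...) with reduction='mean' returns the per-batch mean loss
batch_means = [2.0, 2.0, 4.0]
batch_sizes = [2, 2, 1]

running_loss = sum(m * n for m, n in zip(batch_means, batch_sizes))  # 2*2 + 2*2 + 4*1 = 12.0
epoch_loss = running_loss / sum(batch_sizes)                         # 12.0 / 5 = 2.4

# simply averaging the three batch means would give (2.0 + 2.0 + 4.0) / 3 ≈ 2.67 instead
print(epoch_loss)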