I have been looking at the pytorch example for training a classifier here: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
The training code is as follows:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
This might be a dumb question, but nowhere do we actually check whether the training loss has decreased or not (except to print the statistics). Is this happening in the background, so that the model is only updated if the training loss is actually going down?
If yes, when are the weights updated? Do the weights get updated if the training loss decreases for each mini-batch, or do they get updated only if the average loss across all the mini-batches goes down (i.e. once per epoch)?
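To try to see for myself when the weights actually change, I was thinking of comparing a parameter before and after a single optimizer.step(), roughly like this (just a sketch I came up with, not from the tutorial; I'm assuming net has a conv1 layer as in the tutorial's Net class, and I'm reusing inputs, labels, criterion and optimizer from the training loop above):

import torch

# snapshot one weight tensor before the update
before = net.conv1.weight.detach().clone()

optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

# compare against the same tensor after the update
after = net.conv1.weight.detach().clone()
print('weights changed this mini-batch:', not torch.equal(before, after))

If I ran something like that for a few mini-batches, I suppose I could tell whether the update happens every mini-batch or only once per epoch, but I'd still like to understand what is actually going on.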
EDIT
So, it seems that the optimizer.step() function is the one that updates the parameters. But looking at how the optimizer is used, I see the declaration:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Then at some point during the training we have:
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
The optimizer itself does not seem to have any access to the loss function. So how can it know whether a gradient-descent step is actually making things better or worse?
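To make my confusion concrete: my current mental model is that a plain SGD step (ignoring the momentum term for a moment) just moves each parameter along its stored gradient, something like the sketch below. This is only what I assume happens inside optimizer.step(), not PyTorch's actual implementation:

# my rough mental model of optimizer.step() for plain SGD (momentum ignored)
with torch.no_grad():
    for param in net.parameters():
        if param.grad is not None:
            param -= 0.001 * param.grad   # lr = 0.001, same as passed to optim.SGD above

If that is really all that happens, the loss value itself never enters the update, which is exactly why I don't see how the optimizer could "know" whether the step made things better.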