Why does the CIFAR example divide by 2000?

code:

    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        inputs, labels = Variable(inputs), Variable(labels)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'monitoring during training: epoch={epoch+1}, batch_index={i+1}, loss={running_loss/2000}')
            running_loss = 0.0

But I don't understand why it's dividing by 2000.

What does loss.data output?

loss.data outputs a single value, as you’d expect, but it’s running_loss that’s being divided by 2000. By the time you reach that print statement, running_loss has added together 2000 loss values, so dividing by 2000 gives the average. Then running_loss is reset to zero and the process starts again.
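
Here is a minimal sketch of that bookkeeping, with a hypothetical batch_losses list standing in for the training loop:

    # Hypothetical per-batch loss values, standing in for the training loop.
    batch_losses = [0.9, 1.1, 1.0]

    running_loss = 0.0
    for loss_value in batch_losses:
        running_loss += loss_value           # plays the role of running_loss += loss.data[0]

    # Dividing by the number of accumulated batches gives the mean batch loss;
    # the tutorial accumulates 2000 batches, hence the division by 2000.
    print(running_loss / len(batch_losses))  # -> 1.0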


This is actually a little bit misleading. The printed value is probably an upper bound on the true training error at epoch X, because the model is being updated during those iterations until the epoch finishes (i.e. the model is changing while the losses are accumulated). But I guess it's faster than finishing the updates first and then computing the error over the whole training set.

There is another subtle point: even if SGD weren't changing the model (which already stops this from being exactly the training-set loss), the estimate is still off when the batches are not all the same size, because dividing by the number of batches implicitly assumes every batch averages over the same number of samples. A weighted version is sketched below.
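
A rough sketch of that correction, reusing the names from the tutorial snippet (net, trainloader, optimizer, criterion, epoch), assuming the criterion returns a per-batch mean, and using the newer loss.item() accessor in place of loss.data[0]:

    running_loss, running_samples = 0.0, 0
    for i, (inputs, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)          # per-batch mean by default
        loss.backward()
        optimizer.step()

        batch_size = labels.size(0)
        running_loss += loss.item() * batch_size   # undo the per-batch mean
        running_samples += batch_size

        if i % 2000 == 1999:
            # exact per-sample average, even if the batches differ in size
            print(f'epoch={epoch+1}, batch_index={i+1}, loss={running_loss/running_samples}')
            running_loss, running_samples = 0.0, 0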

That would only be the case if the default value of size_average were set to False for the criterion (whatever the criterion might be).
Otherwise it's the average loss over the last 2000 iterations, as @swibe said.
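
For reference, newer PyTorch versions replace size_average with a reduction argument; a quick sketch of how the two modes relate:

    import torch
    import torch.nn as nn

    logits = torch.randn(4, 10)                # 4 samples, 10 classes
    targets = torch.randint(0, 10, (4,))

    # reduction='mean' corresponds to size_average=True (the old default),
    # reduction='sum' corresponds to size_average=False.
    mean_loss = nn.CrossEntropyLoss(reduction='mean')(logits, targets)
    sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, targets)

    print(torch.allclose(sum_loss, mean_loss * 4))  # True: sum = mean * batch size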

What's the point of tracking this? I would have thought the goal was to track the true train error at the end of each epoch, so I assumed this was trying to approximate that.

I don't know in which context this code was produced, or whether 2000 iterations correspond to one epoch.

How would you define the “true” train error at the end of each epoch?
Would you just use the last batch loss, the average over all batches, or, in the worst case, iterate the training set in eval mode and print the loss (sketched below)?

My guess is that the code runs quite fast, the user didn't want the terminal to be flooded with output (printing can also be quite expensive at high frequencies), and thus only prints the average of the last 2000 batches.
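
For completeness, a rough sketch of that last option (iterating the training set in eval mode), assuming a reasonably recent PyTorch with torch.no_grad() and loss.item(), and a criterion that returns a per-batch mean:

    import torch

    def full_train_loss(net, trainloader, criterion):
        """Average per-sample loss over the whole training set, with the model held fixed."""
        net.eval()
        total_loss, total_samples = 0.0, 0
        with torch.no_grad():
            for inputs, labels in trainloader:
                outputs = net(inputs)
                loss = criterion(outputs, labels)           # per-batch mean
                total_loss += loss.item() * labels.size(0)  # undo the per-batch mean
                total_samples += labels.size(0)
        net.train()
        return total_loss / total_samples

    # e.g. at the end of each epoch:
    # print(f'epoch={epoch+1}, train_loss={full_train_loss(net, trainloader, criterion)}')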

By the true train error I mean the train error of the model as it stands at the end of epoch T. I assumed the code was approximating that by taking the average loss of the last 2000 iterations.