Validation and training loss per batch and epoch

over9k · January 9, 2021, 12:15am

Hi,

I am currently keeping track of training and validation loss per epoch, which is pretty standard. However, what is the best way of going about keeping track of training and validation loss per batch/iteration?

For the training loss, I could just keep a list of the loss after each training iteration. But the validation loss is calculated after a whole epoch, so I’m not sure how to get the validation loss per batch. The only thing I can think of is to run the whole validation step after each training batch and keep track of those values, but that seems like overkill and a lot of computation.

For example, the training is like this:

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # accumulate the batch loss for the epoch statistics
        running_loss += loss.item()

And for validation loss:

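net.eval()  # evaluation mode: disables dropout, uses running batch-norm statistics
total, correct, loss_test = 0, 0, 0.0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        # validation loss (assuming the same criterion as in the training loop)
        batch_loss = criterion(outputs.float(), labels.long()).item()
        loss_test += batch_loss
    loss_test /= len(testloader)  # mean of the per-batch losses
net.train()  # switch back to training mode before the next epoch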

The validation loss/test part is done per epoch. I’m looking for a way to get the validation loss per batch, which is my point above.

Any tips?

ptrblck · January 18, 2021, 7:17am

That’s right, and it is the reason why the validation loss and metrics are usually calculated only once per epoch.
The idea of calculating the validation loss is to get a signal of the model’s performance on “unseen” data. In the best case, the validation loss should closely match the final test loss (note that you shouldn’t touch the test dataset until your training is done, as you would otherwise leak test data information into the training). For this purpose, it’s usually not necessary to track the validation loss after every batch/iteration.
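
If a finer-grained validation curve is still wanted, one possible middle ground (not something proposed in the thread) is to record the training loss after every batch and run validation only every N batches. The sketch below is framework-agnostic so it runs on its own: `fit_with_periodic_validation`, `train_step`, `validate`, and `eval_every` are hypothetical names, and the toy one-parameter model only stands in for a real network. In a PyTorch script, `train_step` would presumably wrap the `zero_grad` / forward / `backward` / `step` sequence and return `loss.item()`, while `validate` would wrap the `torch.no_grad()` loop.

```python
from typing import Callable, List, Sequence, Tuple


def fit_with_periodic_validation(
    batches: Sequence,
    train_step: Callable[[object], float],
    validate: Callable[[], float],
    eval_every: int,
    epochs: int = 1,
) -> Tuple[List[float], List[Tuple[int, float]]]:
    """Log the training loss every batch and the validation loss every `eval_every` batches."""
    if eval_every < 1:
        raise ValueError("eval_every must be >= 1")
    train_losses: List[float] = []
    val_losses: List[Tuple[int, float]] = []  # (global step, validation loss)
    step = 0
    for _ in range(epochs):
        for batch in batches:  # `batches` must be re-iterable (a list, a DataLoader, ...)
            train_losses.append(train_step(batch))
            step += 1
            if step % eval_every == 0:
                val_losses.append((step, validate()))
    return train_losses, val_losses


# Toy stand-in for a model (an assumption for illustration): fit y = w * x with SGD.
state = {"w": 0.0}
train_batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]] * 5  # 10 batches
val_data = [(5.0, 10.0), (6.0, 12.0)]


def mse(data):
    return sum((state["w"] * x - y) ** 2 for x, y in data) / len(data)


def train_step(batch):
    loss = mse(batch)  # loss before the update, like loss.item() in the training loop
    grad = sum(2 * (state["w"] * x - y) * x for x, y in batch) / len(batch)
    state["w"] -= 0.01 * grad  # plain SGD update with learning rate 0.01
    return loss


def validate():
    return mse(val_data)  # full pass over the (tiny) validation set


train_losses, val_losses = fit_with_periodic_validation(
    train_batches, train_step, validate, eval_every=5, epochs=2
)
```

Here `train_losses` holds one entry per batch, and `val_losses` holds one `(step, loss)` pair per validation run, so both curves can be plotted against the same global step axis. With a real network, `validate` should also switch the model to `net.eval()` and back to `net.train()`; evaluating on a fixed subset of the validation batches would reduce the cost further, at the price of a noisier estimate.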