How to draw loss per epoch

I want to draw loss per epoch from the example https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

The log file is

Out:

[1,  2000] loss: 2.173
[1,  4000] loss: 1.839
[1,  6000] loss: 1.659
[1,  8000] loss: 1.600
[1, 10000] loss: 1.533
[1, 12000] loss: 1.468
[2,  2000] loss: 1.395
[2,  4000] loss: 1.378
[2,  6000] loss: 1.368
[2,  8000] loss: 1.340
[2, 10000] loss: 1.316
[2, 12000] loss: 1.307
Finished Training

where the first column is the epoch number. So if I want to draw the loss per epoch, do I need to average the losses that have the same epoch number? That would be

Epoch   Loss
1          (2.173+1.839+1.659+1.600+1.533+1.468)/6
2         ...

Is there a simpler way to do this in PyTorch?
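If all you have is the printed log, one way to get those per-epoch averages is to parse it directly. A quick sketch, assuming the exact log format shown above (the `log` string here just reproduces it):

```python
import re
from collections import defaultdict

log = """\
[1,  2000] loss: 2.173
[1,  4000] loss: 1.839
[1,  6000] loss: 1.659
[1,  8000] loss: 1.600
[1, 10000] loss: 1.533
[1, 12000] loss: 1.468
[2,  2000] loss: 1.395
[2,  4000] loss: 1.378
[2,  6000] loss: 1.368
[2,  8000] loss: 1.340
[2, 10000] loss: 1.316
[2, 12000] loss: 1.307
"""

# Collect the printed losses, keyed by epoch number.
per_epoch = defaultdict(list)
for match in re.finditer(r'\[(\d+),\s*\d+\] loss: ([\d.]+)', log):
    per_epoch[int(match.group(1))].append(float(match.group(2)))

# Average them so there is one value per epoch.
epochs = sorted(per_epoch)
avg_loss = [sum(per_epoch[e]) / len(per_epoch[e]) for e in epochs]
print(epochs, avg_loss)
```

The resulting `epochs` and `avg_loss` lists can then be passed straight to `matplotlib.pyplot.plot`. Note this averages the printed running averages, which is only an approximation of the true epoch loss (the last partial window of each epoch is dropped by the print logic).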

for epoch in range(2):  # loop over the dataset multiple times
    epoch_loss = 0.0
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # loss is the batch mean (reduction='mean'), so multiply by the
        # batch size to accumulate the summed loss over the epoch
        epoch_loss += outputs.shape[0] * loss.item()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

    # report the average loss over the whole epoch
    print('epoch %d loss: %.3f' % (epoch + 1, epoch_loss / len(trainset)))

print('Finished Training')
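To actually draw the curve, the per-epoch values can be collected in a list inside the loop (e.g. appending `epoch_loss / len(trainset)` each epoch) and plotted afterwards. A sketch with matplotlib; the `epoch_losses` values here are placeholders standing in for whatever the loop produced:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Placeholder values: in practice, append epoch_loss / len(trainset)
# to this list at the end of each epoch in the training loop.
epoch_losses = [1.712, 1.351]

plt.plot(range(1, len(epoch_losses) + 1), epoch_losses, marker='o')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Training loss per epoch')
plt.savefig('loss_per_epoch.png')
```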

Thanks. So we have to modify the code. I think this may be a good option.

@klory
Why do you multiply loss.item() by the first dimension of the outputs tensor?
This seems odd to me.

epoch_loss += outputs.shape[0] * loss.item()

The loss is averaged over the batch, and the batch size is the first dimension of outputs


Where is the average here? Shouldn’t you divide instead of multiply?

Inside the definition of criterion
https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss

I really couldn’t understand this for a long time. I think what klory is trying to say is this:

  • If you look at most loss functions (e.g. CrossEntropyLoss), you will see that the default is reduction="mean". This means the loss is computed for each item in the batch, summed, and then divided by the batch size.
  • If you want the un-averaged (summed) loss, you need to multiply the mean loss returned by criterion() by the batch size, which is outputs.shape[0].
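This is easy to verify directly: with reduction="mean" the loss times the batch size should equal the loss computed with reduction="sum". A small sketch with made-up logits and labels:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
outputs = torch.randn(4, 10)            # batch of 4 samples, 10 classes
labels = torch.tensor([1, 0, 3, 9])     # arbitrary target classes

mean_loss = nn.CrossEntropyLoss(reduction='mean')(outputs, labels)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(outputs, labels)

# The default 'mean' reduction divides the summed loss by the batch size,
# so multiplying it back by outputs.shape[0] recovers the sum.
print(torch.allclose(mean_loss * outputs.shape[0], sum_loss))  # True
```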