Don't Understand Loss Calculation


In the evaluate method here the loss is calculated like so:

total_loss += len(data) * criterion(output_flat, targets).item()

Why is that? Why not just criterion(output_flat, targets).item()? Especially since during training the loss is calculated as:

loss = criterion(output.view(-1, ntokens), targets)

One might say that I’m at a bit of a loss :)

Thanks for the help!

This is sometimes done during evaluation or testing to get the exact loss when the last batch is smaller than the rest. Remember that by default each batch loss is already an average over the current batch. If all batches have the same size, you can add the batch losses together, divide the accumulated loss by the number of batches (len(test_loader)), and get the average loss for the epoch.

However, if the dataset length is not divisible by the batch size, the last batch will be smaller than the rest. Summing the batch averages and dividing by the number of batches then gives a slightly wrong answer, because each sample in the short batch counts for more than a sample in a full batch. To correct this, multiply each batch loss by its batch size (or use reduction='sum' for your criterion) and divide the accumulated loss by the number of samples (len(test_dataset)) to get the exact average test loss.
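To make the arithmetic concrete, here is a minimal sketch in plain Python. The per-sample loss values and the batch size are made up for illustration, and batch_mean is a hypothetical stand-in for what a criterion with the default mean reduction would return for one batch.

```python
# Hypothetical per-sample losses: 5 samples, batch size 2,
# so the last batch holds only one sample.
per_sample_losses = [1.0, 3.0, 2.0, 4.0, 10.0]
batch_size = 2

batches = [
    per_sample_losses[i:i + batch_size]
    for i in range(0, len(per_sample_losses), batch_size)
]


def batch_mean(batch):
    # Stand-in for criterion(output, target).item() with mean reduction.
    return sum(batch) / len(batch)


# Naive: average of the batch averages (over-weights the short last batch).
naive = sum(batch_mean(b) for b in batches) / len(batches)

# Weighted: scale each batch average by its size, divide by the sample count.
weighted = sum(len(b) * batch_mean(b) for b in batches) / len(per_sample_losses)

# Ground truth: plain average over all samples.
exact = sum(per_sample_losses) / len(per_sample_losses)

print(naive, weighted, exact)
```

With these numbers the naive average comes out at 5.0, while the weighted version matches the exact per-sample average of 4.0.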


It’s driving me nuts. It seems I’m doing everything right, including the loss calculation, but the model and training specified here still aren’t working…

Hi, I would like to extend your answer. Suppose:
running_loss += loss.item()
epoch_loss = running_loss / len(dataset)
I think this is also a way to calculate the loss, and we don’t have a problem with the batches. Am I right?

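If your loss is being reduced (e.g. averaged over the batch dimension, which is the default), epoch_loss won’t give you the true value: you would be summing per-batch averages but dividing by the number of samples. Dividing by len(dataset) is only correct if the criterion sums over the batch (reduction='sum').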
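A small sketch of the difference, again in plain Python with made-up per-sample losses. The two running totals imitate what loss.item() would contribute per batch under mean and sum reduction respectively; the variable names are illustrative only.

```python
# Hypothetical per-sample losses: 5 samples, batch size 2.
per_sample_losses = [1.0, 3.0, 2.0, 4.0, 10.0]
batch_size = 2

batches = [
    per_sample_losses[i:i + batch_size]
    for i in range(0, len(per_sample_losses), batch_size)
]

running_mean = 0.0  # accumulates batch averages (mean reduction)
running_sum = 0.0   # accumulates batch totals (sum reduction)
for batch in batches:
    running_mean += sum(batch) / len(batch)
    running_sum += sum(batch)

n_samples = len(per_sample_losses)
epoch_loss_mean_reduced = running_mean / n_samples  # too small
epoch_loss_sum_reduced = running_sum / n_samples    # true per-sample average

print(epoch_loss_mean_reduced, epoch_loss_sum_reduced)
```

Here the mean-reduced version gives 3.0 while the true average is 4.0, so running_loss / len(dataset) only works out if each batch loss is a sum rather than an average.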