Cross Entropy Loss per Example

I have a cross entropy loss defined as below:

self.loss_fn = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
masked_lm_loss = self.loss_fn(prediction_scores.view(-1, self.vocab_size), masked_lm_labels.view(-1))

where prediction_scores is 64x128x30000 and masked_lm_labels is 64x128 (64 is the batch size).
I also need the per-example loss, so I changed the reduction to 'none' to get the per-sample loss and used

torch.mean(masked_lm_loss_per_example.view(64, 128))

to get the total loss. However, the result is different from what I get with the reduction set to 'mean', and the per-example loss also doesn't seem to be correct. Is there anything wrong in my calculation?
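For reference, here is a self-contained sketch of the setup above. Random tensors and a reduced vocab size stand in for the real prediction_scores and masked_lm_labels, so only the shapes and the masking pattern are illustrative:

import torch

batch_size, seq_len, vocab_size = 64, 128, 1000  # vocab reduced from 30000 to keep this light
prediction_scores = torch.randn(batch_size, seq_len, vocab_size)
masked_lm_labels = torch.randint(0, vocab_size, (batch_size, seq_len))
masked_lm_labels[:, ::2] = -1  # some positions ignored, as in the MLM labels

loss_fn_mean = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
loss_fn_none = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')

loss_mean = loss_fn_mean(prediction_scores.view(-1, vocab_size), masked_lm_labels.view(-1))
loss_per_token = loss_fn_none(prediction_scores.view(-1, vocab_size), masked_lm_labels.view(-1))

loss_per_example = loss_per_token.view(batch_size, seq_len).mean(dim=1)  # per-example attempt
total_loss = torch.mean(loss_per_token.view(batch_size, seq_len))        # total-loss attempt
print(loss_mean.item(), total_loss.item())  # these two values differ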

Could you post “small” tensors that would reproduce these wrong results?
Here is a small example using different reduction settings, with and without ignore_index.
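A possible version of such an example is sketched below; the logits and targets are made up purely for illustration:

import torch

torch.manual_seed(0)
vocab_size = 5
logits = torch.randn(4, vocab_size)

# all targets valid: 'mean' and a manual mean over the 'none' output agree
valid_targets = torch.tensor([0, 2, 4, 1])
ce_mean = torch.nn.CrossEntropyLoss(reduction='mean')
ce_none = torch.nn.CrossEntropyLoss(reduction='none')
print(ce_mean(logits, valid_targets))         # averaged over 4 elements
print(ce_none(logits, valid_targets).mean())  # same value

# one target ignored: the denominators no longer match
targets = torch.tensor([0, 2, -1, 1])
ce_mean_ign = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
ce_none_ign = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')
print(ce_mean_ign(logits, targets))            # sum of 3 losses / 3
per_elem = ce_none_ign(logits, targets)        # ignored position gets a 0 loss
print(per_elem.mean())                         # sum of 3 losses / 4 -> smaller
print(per_elem.sum() / (targets != -1).sum())  # matches the 'mean' reduction again

With reduction='mean', ignored targets are excluded from the denominator, while torch.mean over the 'none' output divides by all elements (the ignored ones contribute zeros), so the two totals only agree when nothing is ignored.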