I have a cross-entropy loss defined as follows:
self.loss_fn = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
masked_lm_loss = self.loss_fn(prediction_scores.view(-1, self.vocab_size), masked_lm_labels.view(-1))
where prediction_scores has shape 64x128x30000 and masked_lm_labels has shape 64x128 (64 is the batch size, 128 the sequence length, and 30000 the vocabulary size).
I also need the per-example loss. So I changed the reduction to 'none' to get the per-sample loss and used
torch.mean(masked_lm_loss_per_example.view(64, 128))
to get the total loss. However, the result differs from what I get with reduction set to 'mean', and the per-example loss also doesn't look correct. Is there anything wrong with my calculation?
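
For reference, here is a minimal, self-contained sketch that reproduces the mismatch. The tensors are random stand-ins for my model's outputs and labels, and I shrank the shapes so it runs quickly (my real shapes are 64x128x30000); I mark some label positions with -1 the way my MLM masking does:

import torch

# Assumed small shapes for a quick repro; real shapes are 64 x 128 x 30000.
batch_size, seq_len, vocab_size = 4, 8, 10
prediction_scores = torch.randn(batch_size, seq_len, vocab_size)
masked_lm_labels = torch.randint(0, vocab_size, (batch_size, seq_len))
masked_lm_labels[:, ::2] = -1  # ignored positions, as produced by my masking

# Path 1: let CrossEntropyLoss do the averaging.
loss_fn_mean = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
loss_mean = loss_fn_mean(prediction_scores.view(-1, vocab_size),
                         masked_lm_labels.view(-1))

# Path 2: reduction='none', then average manually as described above.
loss_fn_none = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')
per_token_loss = loss_fn_none(prediction_scores.view(-1, vocab_size),
                              masked_lm_labels.view(-1))
loss_manual = torch.mean(per_token_loss.view(batch_size, seq_len))

print(loss_mean.item(), loss_manual.item())  # these two values differ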