Cross Entropy Loss per Example

I have a cross entropy loss defined as below:

self.loss_fn = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
masked_lm_loss = self.loss_fn(prediction_scores.view(-1, self.vocab_size), masked_lm_labels.view(-1))

where prediction_scores is 64x128x30000 and masked_lm_labels is 64x128 (64 is the batch size).
I also need the per-example loss, so I changed the reduction to 'none' to get the per-sample loss and used

torch.mean(masked_lm_loss_per_example.view(64, 128))

to get the total loss. However, the result is different from what I get with the reduction set to 'mean', and the per-example loss also doesn't seem to be correct. Is there anything wrong in my calculation?
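For reference, here is a self-contained sketch of the setup above. Random tensors and a reduced vocab size stand in for the real prediction_scores and masked_lm_labels, so only the shapes and the masking pattern are illustrative:

import torch

batch_size, seq_len, vocab_size = 64, 128, 1000  # vocab reduced from 30000 to keep this light
prediction_scores = torch.randn(batch_size, seq_len, vocab_size)
masked_lm_labels = torch.randint(0, vocab_size, (batch_size, seq_len))
masked_lm_labels[:, ::2] = -1  # some positions ignored, as in the MLM labels

loss_fn_mean = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
loss_fn_none = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')

loss_mean = loss_fn_mean(prediction_scores.view(-1, vocab_size), masked_lm_labels.view(-1))
loss_per_token = loss_fn_none(prediction_scores.view(-1, vocab_size), masked_lm_labels.view(-1))

loss_per_example = loss_per_token.view(batch_size, seq_len).mean(dim=1)  # per-example attempt
total_loss = torch.mean(loss_per_token.view(batch_size, seq_len))        # total-loss attempt
print(loss_mean.item(), total_loss.item())  # these two values differ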

Could you post “small” tensors that would reproduce these wrong results?
Here is a small example using different reduction settings, with and without ignore_index.
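A possible version of such an example is sketched below; the logits and targets are made up purely for illustration:

import torch

torch.manual_seed(0)
vocab_size = 5
logits = torch.randn(4, vocab_size)

# all targets valid: 'mean' and a manual mean over the 'none' output agree
valid_targets = torch.tensor([0, 2, 4, 1])
ce_mean = torch.nn.CrossEntropyLoss(reduction='mean')
ce_none = torch.nn.CrossEntropyLoss(reduction='none')
print(ce_mean(logits, valid_targets))         # averaged over 4 elements
print(ce_none(logits, valid_targets).mean())  # same value

# one target ignored: the denominators no longer match
targets = torch.tensor([0, 2, -1, 1])
ce_mean_ign = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
ce_none_ign = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='none')
print(ce_mean_ign(logits, targets))            # sum of 3 losses / 3
per_elem = ce_none_ign(logits, targets)        # ignored position gets a 0 loss
print(per_elem.mean())                         # sum of 3 losses / 4 -> smaller
print(per_elem.sum() / (targets != -1).sum())  # matches the 'mean' reduction again

With reduction='mean', ignored targets are excluded from the denominator, while torch.mean over the 'none' output divides by all elements (the ignored ones contribute zeros), so the two totals only agree when nothing is ignored.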