Compute CrossEntropyLoss per sentence in MLM task

Hello everyone. I am using a HuggingFace model, to which I pass a couple of sentences. I then take the logits and use PyTorch’s CrossEntropyLoss to compute the loss. The problem is as follows:

I want the loss of each sentence. If I have 3 sentences, each with 10 tokens, the logits have size [3, 10, V], where V is my vocab size. The labels have size [3, 10], basically the correct labels for each of the 10 tokens in each sentence.

How can I get the cross entropy of each sentence then? If I do reduction='mean', I am going to get the overall mean loss (1 number). If I use reduction='none', then I get one number for each token, so basically the loss of each single token. The code I am using is

loss_fct = nn.CrossEntropyLoss(reduction='mean')  # or 'none'
masked_lm_loss = loss_fct(outputs.logits.cpu().detach().view(-1, V), target_ids.view(-1))

Perhaps I need to define the views differently in the second line?
Thanks in advance.


You can use the multi-dimensional input form, with the class dimension second:

loss_fct = nn.CrossEntropyLoss(reduction='none')
masked_lm_loss = loss_fct(torch.transpose(outputs.logits.cpu().detach(), 1, 2), target_ids)

and then take the mean over the last dimension, e.g. masked_lm_loss.mean(dim=-1).


You should then have 3 positive losses, one for each sentence.
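For reference, here is a minimal sketch of that recipe on random data. The tensors `logits` and `target_ids` are stand-ins for `outputs.logits` and the real labels, and the sizes are toy values. The second half is an assumption about the MLM setup: if the labels use -100 for unmasked positions (the default `ignore_index` of CrossEntropyLoss), those positions come back as 0 with reduction='none', so a plain mean over the last dimension would divide by all 10 tokens rather than by the number of scored tokens.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, V = 3, 10, 50                       # sentences, tokens per sentence, vocab size (toy values)
logits = torch.randn(B, T, V)             # stand-in for outputs.logits
target_ids = torch.randint(0, V, (B, T))  # stand-in for the real labels

loss_fct = nn.CrossEntropyLoss(reduction="none")

# CrossEntropyLoss expects the class dimension second: [B, T, V] -> [B, V, T]
token_loss = loss_fct(logits.transpose(1, 2), target_ids)  # shape [B, T]
sentence_loss = token_loss.mean(dim=-1)                    # shape [B], one loss per sentence

# Assumed MLM-style labels: positions set to -100 are ignored (loss 0 there).
labels = target_ids.clone()
labels[:, 5:] = -100                      # pretend only the first 5 tokens are scored
masked_token_loss = loss_fct(logits.transpose(1, 2), labels)
mask = labels != -100
# Average only over the scored tokens of each sentence.
masked_sentence_loss = masked_token_loss.sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)

print(sentence_loss.shape, masked_sentence_loss.shape)
```

If every token in every sentence is scored, the plain `mean(dim=-1)` is enough and the masked variant gives the same numbers.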

Thanks. Can you please explain the idea behind this? Why does swapping the 1st and 2nd dimension get us what we want? We end up with the logits having shape [3, V, 10].

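CrossEntropyLoss accepts two shapes for its input and target.
The commonly used form is N×C logits with a length-N integer vector of class indices as the target.
The other option is N×C×··· logits (the softmax is taken over the second dimension, so the class probabilities sum to 1 along it) with an N×··· tensor of class indices as the target. That is why the logits are transposed from [3, 10, V] to [3, V, 10]: the vocabulary has to be the second dimension.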
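As a quick check that the two forms agree, here is a small sketch on random data (all names and sizes are made up for illustration). It also shows that the flattened form from the original question works too, as long as the per-token losses are reshaped back to [N, T] before averaging.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, T, C = 3, 10, 7                      # toy sizes: sentences, tokens, classes
logits = torch.randn(N, T, C)
targets = torch.randint(0, C, (N, T))
loss_fct = nn.CrossEntropyLoss(reduction="none")

# Form 1: [N*T, C] logits and [N*T] targets, then restore the [N, T] layout.
flat = loss_fct(logits.reshape(-1, C), targets.reshape(-1)).view(N, T)

# Form 2: [N, C, T] logits (class dimension second) and [N, T] targets.
multi = loss_fct(logits.permute(0, 2, 1), targets)

# Manual reference: negative log-softmax over the class dimension, picked at the target index.
manual = -F.log_softmax(logits, dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)

same = torch.allclose(flat, multi) and torch.allclose(flat, manual, atol=1e-6)
print(same)
```

Either way, `flat.mean(dim=-1)` or `multi.mean(dim=-1)` then gives one loss per sentence.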

Alright, I see, thank you!