Compute CrossEntropyLoss per sentence in MLM task

Hello everyone. I am using a Hugging Face model to which I pass a couple of sentences. Then I take the logits and use PyTorch’s CrossEntropyLoss to get the loss. The problem is as follows:

I want the loss of each sentence. If I have 3 sentences, each with 10 tokens, the logits have size [3, 10, V], where V is my vocab size. The labels have size [3, 10], basically the correct labels for each of the 10 tokens in each sentence.

How can I get the cross entropy of each sentence then? If I do reduction='mean', I am going to get the overall mean loss (1 number). If I use reduction='none', then I get one number for each token, so basically the loss of each single token. The code I am using is

loss_fct = nn.CrossEntropyLoss(reduction='mean')  # or 'none'
masked_lm_loss = loss_fct(outputs.logits.cpu().detach().view(-1, V), target_ids.view(-1))

Perhaps I need to define the views differently in the second line?
Thanks in advance.


Hi,

You can use this form instead:

loss_fct = nn.CrossEntropyLoss(reduction='none')
masked_lm_loss = loss_fct(torch.transpose(outputs.logits.cpu().detach(), 1, 2), target_ids)

and then take the mean over the last dim:

masked_lm_loss.mean(-1)

You should get 3 positive losses, one for each sentence.
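
For reference, a minimal self-contained sketch of this approach (the vocab size and the random logits/labels here are just stand-ins for outputs.logits and target_ids from your actual model):

import torch
import torch.nn as nn

V = 100                                     # assumed vocab size for the example
logits = torch.randn(3, 10, V)              # stands in for outputs.logits, shape [3, 10, V]
target_ids = torch.randint(0, V, (3, 10))   # stands in for the labels, shape [3, 10]

loss_fct = nn.CrossEntropyLoss(reduction='none')
# move the class (vocab) dimension to position 1, as CrossEntropyLoss expects [N, C, ...]
per_token_loss = loss_fct(logits.transpose(1, 2), target_ids)  # shape [3, 10]
per_sentence_loss = per_token_loss.mean(-1)                    # shape [3]
print(per_sentence_loss)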

Thanks. Can you please explain the idea behind this? Why does swapping dimensions 1 and 2 give us what we want? We end up with the logits having shape [3, V, 10].

CrossEntropyLoss accepts two shapes for its input and target.
The commonly used form is an N×C logit tensor with an N-length integer vector as target.
The other option is an N×C×··· logit tensor (the class dimension C has to be the second dimension) with an N×··· tensor as target.
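
To make the two forms concrete, here is a small sketch (dummy shapes, not your actual model outputs) showing that the flattened [N*T, V] form and the [N, V, T] form produce the same per-token losses:

import torch
import torch.nn as nn

N, T, V = 3, 10, 100                 # assumed sizes for illustration
logits = torch.randn(N, T, V)
targets = torch.randint(0, V, (N, T))

loss_fct = nn.CrossEntropyLoss(reduction='none')

# Form 1: flatten to [N*T, V] logits and [N*T] targets, then reshape back
loss_flat = loss_fct(logits.view(-1, V), targets.view(-1)).view(N, T)

# Form 2: keep the batch, put the class dim second -> [N, V, T] logits, [N, T] targets
loss_3d = loss_fct(logits.transpose(1, 2), targets)

print(torch.allclose(loss_flat, loss_3d))  # True
print(loss_flat.mean(-1))                  # per-sentence loss, shape [N]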

Alright, I see, thank you!