Cross entropy shape of input and label

I am trying to implement a neural PoS tagger using XLM-RoBERTa. It's my first go at pretrained models and neural networks in general. This is my code.

import torch.nn as nn
from transformers import XLMRobertaModel


class PosTagger(nn.Module):
    def __init__(self, output_size=18, lr=0.001):
        super().__init__()
        self.roberta = XLMRobertaModel.from_pretrained('xlm-roberta-base')
        self.linear = nn.Linear(768, output_size)
        self.lr = lr

    def forward(self, input_ids, attention_mask):
        # Forward pass through the encoder
        seq_h = self.roberta(input_ids, attention_mask)
        last_hidden_state = seq_h.last_hidden_state
        print(last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)

        # Project each token's hidden state to tag logits
        logits = self.linear(last_hidden_state)
        print(logits.shape)  # (batch_size, sequence_length, output_size)

        return logits

I am using CrossEntropyLoss and getting this runtime error: Expected target size [32, 18], got [32, 56].
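A standalone repro with the shapes taken from the error message (batch 32, sequence length 56, 18 tags; random tensors, not my real data) hits the same error:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(32, 56, 18)          # (batch_size, sequence_length, output_size)
labels = torch.randint(0, 18, (32, 56))   # (batch_size, sequence_length)

loss = criterion(logits, labels)
# RuntimeError: Expected target size [32, 18], got [32, 56]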

Transposing the logits seems to solve the problem:

logits = logits.transpose(1, 2)  # now (batch_size, output_size, sequence_length)

According to the documentation, CrossEntropyLoss expects input of shape [batch_size, num_classes] and targets of shape [batch_size]. In contrast, my input after transposing has shape [batch_size, num_classes, sequence_length] and my labels have shape [batch_size, sequence_length].

I don't understand whether this works properly, and if it does, why.

You can check the docs: CrossEntropyLoss does support the multidimensional case. I found some code for an RNN-based NER tagger containing the line

loss = criterion(outputs.permute(0,2,1), targets)

So this should work in general.
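A quick sanity check with random tensors (shapes chosen only for illustration, default 'mean' reduction) shows that the (N, C, L) call is accepted and gives the same value as flattening everything into the 2-D case the docs describe:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

batch_size, seq_len, num_classes = 32, 56, 18
outputs = torch.randn(batch_size, seq_len, num_classes)        # (N, L, C) as the model returns
targets = torch.randint(0, num_classes, (batch_size, seq_len))  # (N, L)

# K-dimensional form: input (N, C, L), target (N, L)
loss_multidim = criterion(outputs.permute(0, 2, 1), targets)

# Same thing computed as the plain 2-D case: one row per token
loss_flat = criterion(outputs.reshape(-1, num_classes), targets.reshape(-1))

print(torch.allclose(loss_multidim, loss_flat))  # True

So the transposed call is just the ordinary per-token cross entropy, averaged over every position in the batch.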

I wasn't sure whether the output would still be the same as in the lower-dimensional case.

Thank you!