I am trying to implement a neural POS tagger using XLM-RoBERTa. This is my first attempt at pretrained models and neural networks in general. Here is my code:
import torch
import torch.nn as nn
from transformers import XLMRobertaModel

class PosTagger(nn.Module):
    def __init__(self, output_size=18, lr=0.001):
        super().__init__()
        # Pretrained encoder; the hidden size of xlm-roberta-base is 768
        self.roberta = XLMRobertaModel.from_pretrained('xlm-roberta-base')
        # Token-level classification head: one logit per POS tag
        self.linear = nn.Linear(768, output_size)
        self.lr = lr

    def forward(self, input_ids, attention_mask):
        # Forward pass through the encoder
        seq_h = self.roberta(input_ids, attention_mask=attention_mask)
        last_hidden_state = seq_h.last_hidden_state
        print(last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
        logits = self.linear(last_hidden_state)
        print(logits.shape)  # (batch_size, sequence_length, output_size)
        return logits
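For context, this is roughly how I build a batch and call the model (a minimal sketch; I'm assuming the matching XLMRobertaTokenizer here, and the sentences are just placeholders):

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
model = PosTagger()

# Dummy batch, padded to a common length
batch = tokenizer(['I saw a cat', 'Dogs bark'], padding=True, return_tensors='pt')
logits = model(batch['input_ids'], batch['attention_mask'])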
I am using CrossEntropyLoss and getting this runtime error: Expected target size [32, 18], got [32, 56].
Transposing the logits seems to solve the problem:

logits = logits.transpose(1, 2)  # now (batch_size, output_size, sequence_length)
According to the documentation, CrossEntropyLoss expects an input of shape [batch_size, num_classes] and a target of shape [batch_size]. In contrast, after transposing, my input has shape [batch_size, num_classes, sequence_length] and my labels have shape [batch_size, sequence_length].
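To sanity-check, here is a minimal standalone repro with dummy tensors whose sizes match my batch (32 sentences, 56 tokens, 18 tags); it runs without raising the error:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(32, 18, 56)         # (batch_size, num_classes, sequence_length)
labels = torch.randint(0, 18, (32, 56))  # (batch_size, sequence_length)
loss = criterion(logits, labels)         # one loss term per token, averaged by default
print(loss)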
I don't understand whether the loss is actually being computed correctly with these shapes, and if so, why.