I’m going crazy trying to use nn.CrossEntropyLoss with a batch of sequences in a setting similar to seq2seq.

I have an input tensor of shape [50,24,10000]: 50 is the batch size, 24 is the sequence length, and 10000 is my vocabulary size. These are the raw logits from a network (no softmax applied).

I also have a target tensor of shape [50,24], i.e., batch size x sequence length, containing the true id of each element of the sequences.

When I pass these two tensors to CrossEntropyLoss, I get the error: Expected target size (50, 10000), got torch.Size([50, 24]).
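
Here is a minimal sketch that should reproduce the problem, assuming random tensors in place of my real logits and targets. The names `logits`, `target`, and `criterion` are placeholders rather than my actual code, and the exact wording and exception type of the error may differ between PyTorch versions.

```python
import torch
import torch.nn as nn

# Shapes from the description above; the values are random stand-ins.
batch_size, seq_len, vocab_size = 50, 24, 10000

# Raw network outputs (no softmax applied): [batch, sequence, vocabulary]
logits = torch.randn(batch_size, seq_len, vocab_size)

# True ids for each position in each sequence: [batch, sequence]
target = torch.randint(0, vocab_size, (batch_size, seq_len))

criterion = nn.CrossEntropyLoss()

error_message = None
try:
    # Passing the tensors as they are triggers the shape complaint.
    loss = criterion(logits, target)
except (RuntimeError, ValueError) as exc:
    # Older releases raise ValueError, newer ones RuntimeError.
    error_message = str(exc)

print(error_message)
```

Judging by the message, the loss seems to be reading the second dimension of the input (24) as the number of classes and the third (10000) as an extra dimension that the target should match.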

I’m afraid I’m missing something from the docs. Is the batching the problem? Do I have to apply argmax to the input tensor?