How do I use nn.CrossEntropyLoss() for seq2seq where my prediction is of size (BS, seq_len, vocab_size) and the truth is of size (BS, seq_len)

How do I use nn.CrossEntropyLoss() for seq2seq where my prediction is of size (BS, seq_len, vocab_size) and the truth is of size (BS, seq_len)? For example:

import torch

prediction = torch.randn(2, 3, 5, requires_grad=True)    # (BS, seq_len, vocab_size)
target = torch.empty(2, 3, dtype=torch.long).random_(5)  # (BS, seq_len)
prediction: # size = (2, 3, 5)
tensor([[[-1.3824, -1.4598, -0.3210, -0.2991,  0.2965],
         [ 0.2591, -0.5094, -0.7029,  0.2963, -1.8912],
         [ 2.0020, -1.1158,  1.1687, -0.5815, -0.4416]],

        [[ 2.9818,  0.4093,  1.9568,  0.0664, -0.3604],
         [-0.6369, -0.3365, -1.3922, -0.6929, -0.1229],
         [ 0.6589, -1.3124, -2.0313, -1.4866, -1.8163]]], requires_grad=True)
target: # size = (2, 3)
tensor([[4, 3, 3],
        [3, 2, 1]])

i.e. [-1.3824, -1.4598, -0.3210, -0.2991, 0.2965] are the scores over the words in my vocabulary for the first word of the first sequence in the batch; the highest score is at index 4, so it predicts the word with label (or index) 4 in the vocabulary, which matches the ground truth.
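
For reference, a quick way to check that interpretation is to take the argmax over the vocab dimension (this matches the example values printed above; with freshly sampled random logits the argmax will generally differ):

# Argmax over the vocab dimension gives the predicted word index for each position.
predicted_indices = prediction.argmax(dim=-1)  # shape (BS, seq_len)
print(predicted_indices[0, 0])                 # 4 for the printed values, matching target[0, 0]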

Try permuting the dimensions of your prediction tensor to [batch_size, nb_classes, seq_len], i.e. [2, 5, 3], and it should work.
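
A minimal sketch with the tensors above: nn.CrossEntropyLoss expects the class (vocab) dimension in position 1, so permute before passing the logits in. The flattening variant at the end is an equivalent alternative I'm adding for comparison, not part of the original answer.

import torch
import torch.nn as nn

prediction = torch.randn(2, 3, 5, requires_grad=True)    # (BS, seq_len, vocab_size)
target = torch.empty(2, 3, dtype=torch.long).random_(5)  # (BS, seq_len)

criterion = nn.CrossEntropyLoss()

# Move the vocab (class) dimension to position 1: (BS, vocab_size, seq_len)
loss = criterion(prediction.permute(0, 2, 1), target)
loss.backward()

# Equivalent alternative (my addition): flatten batch and sequence into one dimension,
# giving logits of shape (BS * seq_len, vocab_size) and targets of shape (BS * seq_len,).
loss_flat = criterion(prediction.reshape(-1, 5), target.reshape(-1))

Both versions compute the same mean cross-entropy over all BS * seq_len token positions.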