How do I use nn.CrossEntropyLoss() for seq2seq where my prediction is of size (BS, seq_len, vocab_size) and truth of size (BS, seq_len)?

How do I use nn.CrossEntropyLoss() for seq2seq where my prediction is of size (BS, seq_len, vocab_size) and the ground truth is of size (BS, seq_len)? For example:

```
prediction = torch.randn(2, 3, 5, requires_grad=True)   # (BS, seq_len, vocab_size)
target = torch.empty(2, 3, dtype=torch.long).random_(5)  # (BS, seq_len)
```
```
prediction:  # size = (2, 3, 5)
tensor([[[-1.3824, -1.4598, -0.3210, -0.2991,  0.2965],
[ 0.2591, -0.5094, -0.7029,  0.2963, -1.8912],
[ 2.0020, -1.1158,  1.1687, -0.5815, -0.4416]],

[[ 2.9818,  0.4093,  1.9568,  0.0664, -0.3604],
[-0.6369, -0.3365, -1.3922, -0.6929, -0.1229],
[ 0.6589, -1.3124, -2.0313, -1.4866, -1.8163]]], requires_grad=True)
```

```
target:  # size = (2, 3)
tensor([[4, 3, 3],
[3, 2, 1]])
```

i.e. `[-1.3824, -1.4598, -0.3210, -0.2991, 0.2965]` are the unnormalized scores (logits, not probabilities) over my vocabulary for the first word of the first sequence in the batch. The largest score is at index 4, so the model predicts the word with label (index) 4 in the vocabulary, which matches the ground truth.

Try permuting the dimensions of your `prediction` tensor to `[batch_size, nb_classes, seq_len]`, i.e. `[2, 5, 3]`, and it should work: for K-dimensional inputs, `nn.CrossEntropyLoss` expects the class scores in dimension 1.
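Putting it together, a minimal sketch using the shapes from the question (variable names are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Shapes from the question: (BS, seq_len, vocab_size) and (BS, seq_len)
prediction = torch.randn(2, 3, 5, requires_grad=True)
target = torch.empty(2, 3, dtype=torch.long).random_(5)

criterion = nn.CrossEntropyLoss()

# CrossEntropyLoss wants class scores in dim 1, so move vocab_size there:
# (BS, seq_len, vocab_size) -> (BS, vocab_size, seq_len)
loss = criterion(prediction.permute(0, 2, 1), target)
loss.backward()

print(loss.item())                 # a scalar loss
print(prediction.grad.shape)       # torch.Size([2, 3, 5])
```

An equivalent alternative is to flatten the batch and sequence dimensions, `criterion(prediction.reshape(-1, 5), target.reshape(-1))`; with the default `reduction='mean'` this yields the same value, since the mean is taken over all `BS * seq_len` token positions either way.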