Cross-entropy loss for sequence of elements

I have a sequence labeling task.

As input, I have a sequence of elements with shape [batch_size, sequence_length], and I need to assign a class to each element of the sequence.

As the loss function, I use cross-entropy.

How should I correctly use it?
My variable target_predictions has shape [batch_size, sequence_length, number_of_classes], and target has shape [batch_size, sequence_length].
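For concreteness, here is a minimal sketch of my setup (the sizes 4, 10, and 7 are made up just for illustration):

```python
import torch
import torch.nn as nn

# made-up sizes, only for illustration
batch_size, sequence_length, number_of_classes = 4, 10, 7

# logits for every element of every sequence: [batch_size, sequence_length, number_of_classes]
target_predictions = torch.randn(batch_size, sequence_length, number_of_classes)
# one class index per element: [batch_size, sequence_length]
target = torch.randint(number_of_classes, (batch_size, sequence_length))

criterion = nn.CrossEntropyLoss()
# CrossEntropyLoss expects the class dimension right after the batch dimension,
# i.e. [batch_size, number_of_classes, sequence_length], hence the permute
loss = criterion(target_predictions.permute(0, 2, 1), target)
```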
The documentation says:

[screenshot of the torch.nn.CrossEntropyLoss documentation describing the K-dimensional case, where the input has shape (N, C, d_1, d_2, ..., d_K) and the target has shape (N, d_1, d_2, ..., d_K)]

I know that if I call CrossEntropyLoss()(target_predictions.permute(0, 2, 1), target), everything works fine. But I have concerns that, in the K-dimensional case, torch will interpret my sequence_length as the extra dimension d_1 shown in the screenshot.
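To make the concern concrete, here is a small self-contained check (same made-up sizes as above; logits and labels are just placeholder names) comparing the permuted K-dimensional call against flattening the batch and sequence dimensions into one:

```python
import torch
import torch.nn.functional as F

batch_size, sequence_length, number_of_classes = 4, 10, 7  # made-up sizes

logits = torch.randn(batch_size, sequence_length, number_of_classes)
labels = torch.randint(number_of_classes, (batch_size, sequence_length))

# Option 1: treat sequence_length as the extra dimension d_1 of the K-dimensional loss
loss_kdim = F.cross_entropy(logits.permute(0, 2, 1), labels)

# Option 2: flatten batch and sequence into a single "sample" dimension
loss_flat = F.cross_entropy(logits.reshape(-1, number_of_classes), labels.reshape(-1))

# With the default reduction="mean" (and no class weights or ignore_index),
# both average over every element, so I would expect the two values to match
print(torch.allclose(loss_kdim, loss_flat))
```

If the two values agree, that would suggest that treating sequence_length as d_1 is fine, but I'd like to be sure.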

How should I do this correctly?

And if the permute approach is wrong, why?