ValueError: Expected target size (128, 44), got torch.Size([128, 100]), LSTM Pytorch

I want to build a model that predicts the next character based on the previous characters.
I have split the text into sequences of integers of length 100 (using a Dataset and DataLoader).
I want to predict a character at each timestep.

The dimensions of my input and target tensors are:

inputs: (batch_size, sequence_length), in my case (128, 100)
targets: (batch_size, sequence_length), in my case (128, 100)

After the forward pass, my predictions have shape (batch_size, sequence_length, vocabulary_size), which in my case is (128, 100, 44).

But when I calculate the loss using nn.CrossEntropyLoss():

import torch
import torch.nn as nn

batch_size = 128
sequence_length = 100
number_of_classes = 44
# random tensor with the same shape as the model output: (N, L, C)
output = torch.rand(batch_size, sequence_length, number_of_classes)
# random integer class targets with shape (N, L)
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()

# define the loss function and calculate the loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)

I get an error:

ValueError: Expected target size (128, 44), got torch.Size([128, 100])

My question is: how should I calculate the loss for many-to-many LSTM prediction? I want to predict a character at each timestep. According to the nn.CrossEntropyLoss docs, the input must have shape (N, C, d1, d2, ..., dK), where N is the batch size and C is the number of classes.

As described in the docs, the class dimension should be dim 1, so this would work:

output = torch.rand(batch_size, number_of_classes, sequence_length)
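
To see where the (128, 44) in the error message comes from, here is a small sketch. The helper expected_target_shape is a hypothetical name made up for illustration (it is not part of PyTorch); it only mirrors the rule that, for class-index targets, the target shape is the input shape with dim 1 removed.

```python
import torch
import torch.nn as nn


def expected_target_shape(input_shape):
    # Hypothetical helper (not a PyTorch API): for class-index targets,
    # the target shape is the input shape with dim 1 (classes) dropped.
    return tuple(input_shape[:1]) + tuple(input_shape[2:])


criterion = nn.CrossEntropyLoss()
target = torch.randint(44, (128, 100))  # (N, L) class indices

# (N, L, C): dim 1 (100) is read as the class dimension,
# so a target of shape (128, 44) is expected
print(expected_target_shape((128, 100, 44)))  # (128, 44)

# (N, C, L): the class dimension is dim 1, matching the (128, 100) target
good = torch.rand(128, 44, 100)
print(expected_target_shape(good.shape))  # (128, 100)

loss = criterion(good, target)  # scalar tensor, no error
```

With the original (128, 100, 44) layout, the sequence length is treated as the number of classes, which is why the message asks for a (128, 44) target.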

In your training script, you could permute the model output to match these dimensions, e.g. output.permute(0, 2, 1).
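
A minimal sketch of the permute, using a random tensor as a stand-in for the real model output (the variable names follow the question, and the seed is arbitrary). It also shows a second option that should give the same value with the default mean reduction: flattening the batch and time dimensions together.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, sequence_length, number_of_classes = 128, 100, 44

# Stand-in for the model output, shape (N, L, C) as in the question
output = torch.rand(batch_size, sequence_length, number_of_classes)
target = torch.randint(number_of_classes, (batch_size, sequence_length))

criterion = nn.CrossEntropyLoss()

# Option 1: move the class dimension to dim 1 -> (N, C, L)
loss_permuted = criterion(output.permute(0, 2, 1), target)

# Option 2: merge batch and time -> input (N*L, C), target (N*L,)
loss_flat = criterion(
    output.reshape(-1, number_of_classes),
    target.reshape(-1),
)

print(loss_permuted.item(), loss_flat.item())
```

Both variants average the per-character loss over all N*L positions, so which one to use is mostly a matter of taste.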