I want to build a model that predicts the next character based on the previous characters.

I have split the text into sequences of integers of length 100 (using a Dataset and DataLoader).

I want to predict a character at each timestep.

Dimensions of my input and target variables are:

```
inputs dimension:  (batch_size, sequence_length). In my case (128, 100)
targets dimension: (batch_size, sequence_length). In my case (128, 100)
```

After the forward pass, my predictions have dimension (batch_size, sequence_length, vocabulary_size), which in my case is (128, 100, 44).

But when I calculate the loss using `nn.CrossEntropyLoss()`:

```
import torch
import torch.nn as nn

batch_size = 128
sequence_length = 100
number_of_classes = 44

# create a random tensor with the shape of my model's output
output = torch.rand(batch_size, sequence_length, number_of_classes)
# create a tensor with random targets
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()

# define the loss function and calculate the loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
```

I get an error:

```
ValueError: Expected target size (128, 44), got torch.Size([128, 100])
```

The question is: how should I handle the loss calculation for many-to-many LSTM prediction? I want to predict a character at each timestep. According to the `nn.CrossEntropyLoss` documentation, the input must have shape (N, C, d1, d2, …, dK), where N is the batch size and C is the number of classes.
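For context, here is a sketch of how the tensors from the snippet above could be reshaped to match the (N, C, d1) layout that the documentation describes — either by permuting the class dimension into position 1, or by flattening batch and time into a single dimension. This is an illustration of the shape requirement, not necessarily the only way to do it:

```python
import torch
import torch.nn as nn

batch_size, sequence_length, number_of_classes = 128, 100, 44
output = torch.rand(batch_size, sequence_length, number_of_classes)
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()

criterion = nn.CrossEntropyLoss()

# Option 1: move the class dimension to position 1 -> (N, C, d1) = (128, 44, 100)
loss_permuted = criterion(output.permute(0, 2, 1), target)

# Option 2: merge batch and time into one dimension -> (N*d1, C) = (12800, 44)
loss_flat = criterion(output.reshape(-1, number_of_classes), target.reshape(-1))

# With the default reduction='mean', both give the same scalar loss
print(loss_permuted, loss_flat)
```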