How to prepare a tensor correctly for loss computation?

I am new to NNs and PyTorch, so some things are not obvious to me. I would really appreciate help here. I wanted to use nn.CrossEntropyLoss on a tensor of shape [3, 14, 136], where 3 is the batch_size, 14 is the sequence_length, and 136 is the number of tokens. The tensor is one-hot encoded along the number_of_tokens dimension.
I passed this tensor to nn.CrossEntropyLoss together with a target tensor of shape [3, 14], containing the correct token indices in the second dimension. However, I got an error:
ValueError: Expected target size (3, 136), got torch.Size([3, 14])
Based on the documentation I thought my dimensions were correct, so what am I doing wrong to get this error?

Based on the docs, the output of the model should have the shape [batch_size, nb_classes, *], while the target should have the shape [batch_size, *], so you would need to permute the output to [batch_size=3, nb_classes=nb_tokens=136, seq_len=14] (assuming 136 equals the number of classes). :wink:
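For reference, a minimal sketch of that permute with the shapes from the question (the random tensors here are just placeholders for your model output and targets):

```python
import torch
import torch.nn as nn

# Shapes from the question (assuming 136 is the number of classes/tokens).
batch_size, seq_len, nb_classes = 3, 14, 136

criterion = nn.CrossEntropyLoss()

output = torch.randn(batch_size, seq_len, nb_classes)         # model output: [3, 14, 136]
target = torch.randint(0, nb_classes, (batch_size, seq_len))  # token indices: [3, 14]

# nn.CrossEntropyLoss expects the class dimension in dim1, so permute
# [batch_size, seq_len, nb_classes] -> [batch_size, nb_classes, seq_len].
loss = criterion(output.permute(0, 2, 1), target)
print(loss)
```

Also note that the output should contain raw logits (no softmax), since nn.CrossEntropyLoss applies log_softmax internally.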


Thank you for your help!
I did that before, but my loss stays very high (4.9) and the model generates rubbish, repeating the same letter over and over again :slight_smile: Could you advise me what to pay attention to in order to fix my problem?

I would recommend trying to overfit a small data sample (e.g. just 10 samples) and making sure your current model is able to do so, e.g. by playing around with some hyperparameters (see the sketch below). Once this works, you could try to scale up the use case again.
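A rough sketch of such an overfitting check; the toy model, data, learning rate, and epoch count are placeholder assumptions, and in practice you would plug in your own model and a fixed subset of real samples:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: replace the toy model and tensors with your own
# network and a fixed subset of ~10 real samples.
nb_classes, seq_len, n_samples = 136, 14, 10
model = nn.Sequential(
    nn.Embedding(nb_classes, 64),
    nn.Linear(64, nb_classes),
)
small_input = torch.randint(0, nb_classes, (n_samples, seq_len))
small_target = small_input.clone()  # toy task: predict the input token back

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(300):
    optimizer.zero_grad()
    output = model(small_input)                              # [10, 14, 136]
    loss = criterion(output.permute(0, 2, 1), small_target)  # class dim to dim1
    loss.backward()
    optimizer.step()
    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss {loss.item():.4f}")
```

If the model can overfit these samples, the loss should drop close to zero; a loss that stays near ln(136) ≈ 4.91 (the value for uniformly random predictions over 136 classes) suggests the model isn't learning at all.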
