nn.CrossEntropyLoss and batching

I’m trying to use cross entropy loss on sequential batched data.
I have a many-to-many language model.
My net's output size is torch.Size([20, 54, 9999]), and the labels are torch.Size([20, 54]).
Batch size is 20, sentence length is 54 and vocab size is 9999.
When I pass this to cross entropy, it seems to treat the net output as (20, 9999) and therefore expects the target to be of that size as well.
How can I use cross entropy loss in this case?


I think one option would be to add a final linear layer (nn.Linear(9999, 54)) to get predictions of shape [20, 54], matching the size of the labels.

The 54 is just given as an example, I have varying sequence lengths.
Using transpose fixes the shape mismatch, but I think it's wrong, and now I'm getting this error:
RuntimeError: CUDA error: device-side assert triggered
Any suggestions?
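
For reference, here is a minimal sketch of the shapes nn.CrossEntropyLoss accepts for sequence data. The tensors are random stand-ins for the output and labels described above, and the variable names are made up for illustration. The loss expects the class dimension at position 1, so a (batch, seq_len, vocab) output can either be permuted to (batch, vocab, seq_len) or flattened to (batch * seq_len, vocab). The last line checks the target range, since a target index outside [0, vocab_size - 1] is one common cause of a device-side assert with this loss. I can't tell from the post whether that is what happens here.

```python
import torch
import torch.nn as nn

batch_size, seq_len, vocab_size = 20, 54, 9999

# Random stand-ins for the model output and the labels from the question.
logits = torch.randn(batch_size, seq_len, vocab_size)          # (N, L, C), float
targets = torch.randint(0, vocab_size, (batch_size, seq_len))  # (N, L), int64

criterion = nn.CrossEntropyLoss()

# Option 1: move the class dimension to position 1, giving (N, C, L).
loss_permuted = criterion(logits.permute(0, 2, 1), targets)

# Option 2: merge batch and time, giving (N*L, C) against (N*L,).
loss_flat = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))

# Both layouts average over the same N*L token positions.
same = torch.allclose(loss_permuted, loss_flat, atol=1e-5)

# Every target must be a valid class index in [0, vocab_size - 1].
in_range = bool((targets.min() >= 0) and (targets.max() < vocab_size))
print(loss_permuted.item(), loss_flat.item(), same, in_range)
```

Since the sequence length varies, neither option hard-codes 54. If the assert persists, running the same batch on the CPU, or setting the environment variable CUDA_LAUNCH_BLOCKING=1, usually gives a more specific error message.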

Logically, I tried taking the max over the vocab dimension to get the actual predictions, but it seems that cross entropy loss should ‘take care’ of that (and it gave another error about not supporting long).
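
A small sketch of why the max step is not needed before the loss, using a hypothetical tiny vocabulary and random tensors. nn.CrossEntropyLoss applies log-softmax internally, so it takes the raw float scores as input and the class indices (dtype long) as the target. Taking argmax over the vocab dimension turns the scores into a long tensor with no gradient, which is probably where the error about long came from, though that is an assumption on my part.

```python
import torch
import torch.nn as nn

vocab_size = 50  # hypothetical small vocabulary, for illustration only
logits = torch.randn(4, 7, vocab_size, requires_grad=True)  # (N, L, C), float
targets = torch.randint(0, vocab_size, (4, 7))              # (N, L), int64

# argmax yields integer token ids: useful for inspection, not for the loss.
predicted_ids = logits.argmax(dim=-1)
print(predicted_ids.dtype, predicted_ids.requires_grad)

# The loss takes the raw scores; gradients flow back into logits.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()

# The argmax predictions can still be used for a metric such as accuracy.
accuracy = (predicted_ids == targets).float().mean()
print(loss.item(), accuracy.item())
```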