CrossEntropy loss for RNN output

Hi everyone,

I have an RNN model that takes as input 64 (batch size) x 100 (time steps) x 3 (3 labels to be predicted; 2 of them have 64 classes and the 3rd has 2 classes).
The model output has the same shape.

I tried the CrossEntropyLoss loss function, but it gave an error saying it only accepts 2-D or 4-D tensors. What am I doing wrong?

What is the right loss function to use for my problem?

Hi @osm3000,

the cross-entropy loss doesn’t know about timesteps or multiple labels per sample. Last time I needed it for a single label over many timesteps, I used

loss = loss_fn(scores.view(-1, num_classes), labels.contiguous().view(-1))

(the .contiguous() was needed because .view() failed without it due to how the minibatches were prepared; you could try leaving it out).
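A minimal runnable sketch of that pattern for a single label, assuming the scores come out of the RNN as (batch, time, classes); all names and sizes here are made up:

    import torch
    import torch.nn as nn

    batch_size, time_steps, num_classes = 64, 100, 64  # illustrative sizes

    scores = torch.randn(batch_size, time_steps, num_classes)         # raw logits from the RNN
    labels = torch.randint(0, num_classes, (batch_size, time_steps))  # one class index per step

    loss_fn = nn.CrossEntropyLoss()
    # Flatten (batch, time) into one big batch dimension so the input is 2-D.
    loss = loss_fn(scores.view(-1, num_classes), labels.contiguous().view(-1))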
If you have three labels, you might just hand back three score tensors and add up three cross-entropy losses.
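For the three-label case, one option is a shared RNN feature followed by three linear heads; this is a sketch under that assumption (hidden_size and the head names are mine, not from the original post):

    import torch
    import torch.nn as nn

    batch_size, time_steps, hidden_size = 64, 100, 128
    features = torch.randn(batch_size, time_steps, hidden_size)  # stand-in for RNN output

    head_a = nn.Linear(hidden_size, 64)  # label 1: 64 classes
    head_b = nn.Linear(hidden_size, 64)  # label 2: 64 classes
    head_c = nn.Linear(hidden_size, 2)   # label 3: 2 classes

    targets_a = torch.randint(0, 64, (batch_size, time_steps))
    targets_b = torch.randint(0, 64, (batch_size, time_steps))
    targets_c = torch.randint(0, 2, (batch_size, time_steps))

    loss_fn = nn.CrossEntropyLoss()
    # One cross-entropy term per label, flattened as above, then summed.
    loss = (loss_fn(head_a(features).view(-1, 64), targets_a.view(-1))
            + loss_fn(head_b(features).view(-1, 64), targets_b.view(-1))
            + loss_fn(head_c(features).view(-1, 2), targets_c.view(-1)))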

But this is only one way to do it, and you might look at what best fits your purpose. For example, Sean Robertson
just adds up the losses over the sequence steps in his RNN-for-Shakespeare tutorial (the notebook is an excellent read, too, but it is harder to link to specific lines), probably because the outputs are generated one step at a time anyway.
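That per-step pattern looks roughly like the following sketch, built around an nn.RNNCell with made-up sizes rather than the tutorial’s exact code:

    import torch
    import torch.nn as nn

    batch_size, time_steps, input_size, hidden_size, num_classes = 64, 100, 32, 128, 64

    cell = nn.RNNCell(input_size, hidden_size)
    head = nn.Linear(hidden_size, num_classes)
    loss_fn = nn.CrossEntropyLoss()

    inputs = torch.randn(time_steps, batch_size, input_size)
    targets = torch.randint(0, num_classes, (time_steps, batch_size))

    hidden = torch.zeros(batch_size, hidden_size)
    loss = 0.0
    for t in range(time_steps):
        hidden = cell(inputs[t], hidden)                 # advance one timestep
        loss = loss + loss_fn(head(hidden), targets[t])  # accumulate per-step loss
    loss.backward()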

Best regards

Thomas


Thank you @tom for your reply.

I really don’t know; I tried everything with it.
It only worked when I used LogSoftmax instead of Softmax, together with NLLLoss instead of CrossEntropyLoss.

If you use CrossEntropyLoss, you don’t need to put a Softmax or LogSoftmax at the end of your model: it already applies LogSoftmax (followed by NLLLoss) internally.
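You can check the equivalence on made-up data:

    import torch
    import torch.nn as nn

    logits = torch.randn(8, 5)             # (batch, classes), random data
    targets = torch.randint(0, 5, (8,))

    ce = nn.CrossEntropyLoss()(logits, targets)
    nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
    print(torch.allclose(ce, nll))         # True: CrossEntropy == LogSoftmax + NLLLoss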
