I have an RNN model that takes as input 64 (batch size) × 100 (time steps) × 3 (3 labels to be predicted; 2 of them have 64 classes, and the 3rd has 2 classes).
The model output has the same shape.
I tried the
CrossEntropyLoss loss function, but it gave an error saying it accepts only 2-D or 4-D tensors. What am I doing wrong?
Which loss function should I use for my problem?
The cross entropy loss doesn’t know about time steps or multiple labels. Last time I needed that for a single label, I used
loss = lossfn(scores.view(batch_size * time_steps, -1), labels.contiguous().view(-1))
(the contiguous() was needed because view failed without it, due to how the minibatch was prepared; you could try to do without it).
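A minimal sketch of that reshaping, assuming the scores come out of the RNN as (batch, time, classes) and the labels as (batch, time) — tensor names and shapes here are illustrative, not from the original post:

```python
import torch
import torch.nn as nn

batch_size, time_steps, num_classes = 64, 100, 64

# Hypothetical RNN outputs: one raw score per class at each time step.
scores = torch.randn(batch_size, time_steps, num_classes)
labels = torch.randint(0, num_classes, (batch_size, time_steps))

loss_fn = nn.CrossEntropyLoss()
# CrossEntropyLoss wants a 2-D input (N, C) and a 1-D target (N),
# so fold batch and time into a single dimension N = batch * time.
loss = loss_fn(scores.view(batch_size * time_steps, num_classes),
               labels.contiguous().view(-1))
```

The key point is that classes stay as the second dimension after the reshape; batch and time are merged into one "sample" axis.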
If you have three labels, you might just return three score tensors and add the three cross entropy losses.
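For the three-label setup from the question (two labels with 64 classes, one with 2), that could look like the following sketch; the list-of-heads layout is an assumption about how the model returns its scores:

```python
import torch
import torch.nn as nn

batch_size, time_steps = 64, 100
# Class counts per label, as described in the question: 64, 64, and 2.
num_classes = [64, 64, 2]

# Hypothetical outputs: one score tensor per label, each (batch, time, classes).
scores = [torch.randn(batch_size, time_steps, c) for c in num_classes]
labels = [torch.randint(0, c, (batch_size, time_steps)) for c in num_classes]

loss_fn = nn.CrossEntropyLoss()
# Flatten each head to (N, C) / (N,) and sum the three losses.
total_loss = sum(
    loss_fn(s.view(-1, c), l.view(-1))
    for s, l, c in zip(scores, labels, num_classes)
)
```

Summing the per-label losses lets a single `total_loss.backward()` train all three heads at once.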
But this is only one way to do it, and you might look at what best fits your purpose. For example, Sean Robertson
just adds the losses over the sequence steps in his RNN-for-Shakespeare tutorial (the notebook is an excellent read, too, but it is harder to link to specific lines), probably because the outputs are generated one by one anyway.
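That step-by-step style can be sketched like this; the tiny GRU, the linear head, and all shapes are stand-ins I made up for illustration, not the tutorial's actual code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, hidden_size, seq_len = 64, 32, 10

# Hypothetical one-step-at-a-time model: GRU plus a linear output head.
rnn = nn.GRU(num_classes, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, num_classes)
loss_fn = nn.CrossEntropyLoss()

# One-hot inputs and integer targets for a short sequence, batch of 1.
inputs = torch.eye(num_classes)[torch.randint(0, num_classes, (1, seq_len))]
targets = torch.randint(0, num_classes, (1, seq_len))

hidden = None
loss = 0.0
for t in range(seq_len):
    # Feed one time step, keep the hidden state, add this step's loss.
    out, hidden = rnn(inputs[:, t:t + 1], hidden)
    loss = loss + loss_fn(head(out[:, 0]), targets[:, t])
loss = loss / seq_len  # averaging over steps; plain summing works too
```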
Thank you @tom for your reply.
I really don’t know; I tried everything with it.
It only worked when I used
LogSoftmax instead of
Softmax, and
NLLLoss instead of CrossEntropyLoss.
If you use CrossEntropyLoss, you don’t need to put
LogSoftmax at the end.
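This is because `nn.CrossEntropyLoss` already combines `LogSoftmax` and `NLLLoss` internally, so the two setups compute the same value — a quick check on random scores:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
scores = torch.randn(8, 5)           # raw (unnormalized) scores, 5 classes
labels = torch.randint(0, 5, (8,))

# Option 1: CrossEntropyLoss on raw scores (applies LogSoftmax internally).
ce = nn.CrossEntropyLoss()(scores, labels)

# Option 2: explicit LogSoftmax followed by NLLLoss.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(scores), labels)

assert torch.allclose(ce, nll)
```

So feed raw scores to CrossEntropyLoss, or log-probabilities to NLLLoss, but don't stack LogSoftmax in front of CrossEntropyLoss.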