Currently I’m working on a char-RNN with an LSTM. Everything goes well until I try to compute the NLLLoss; this is where I’m a bit lost and confused.
As an example, I have a batch size of 2, a sequence length of 5, and an embedding dimension of, say, 10 target characters:
so my input shape, before passing through the one-hot embedding layer, is [5, 2], i.e.
tensor([[ 10, 62], [ 49, 61], [ 34, 69], [ 42, 51], [ 2, 2]])
After passing the mini-batch through the embedding layer, the shape is now [5, 2, 10], and it stays that way as I pass it through the LSTM and softmax layers.
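Here’s a minimal sketch of my pipeline with dummy data (the layer sizes are hypothetical, and hidden_size=vocab is just to keep the repro small), showing where the [5, 2, 10] shape comes from:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, batch, vocab = 5, 2, 10

# mini-batch of character indices, shape [5, 2]
inp = torch.randint(0, vocab, (seq_len, batch))

# one-hot "embedding" of the indices -> [5, 2, 10]
x = F.one_hot(inp, num_classes=vocab).float()

# LSTM keeps the [seq_len, batch, features] layout
lstm = nn.LSTM(input_size=vocab, hidden_size=vocab)
out, _ = lstm(x)                    # [5, 2, 10]

pred = F.log_softmax(out, dim=-1)   # still [5, 2, 10]
print(pred.shape)                   # torch.Size([5, 2, 10])
```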
When I go to compute the loss, it errors out and says I have a size mismatch:
loss = F.nll_loss(pred,input)
so effectively the call is F.nll_loss with shapes [5, 2, 10] and [5, 2].
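To make the mismatch concrete, here’s a tiny repro with random tensors of those exact shapes (dummy data, just to trigger the same error I’m seeing):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(5, 2, 10).log_softmax(dim=-1)  # [seq, batch, vocab]
target = torch.randint(0, 10, (5, 2))             # [seq, batch]

try:
    loss = F.nll_loss(pred, target)
except RuntimeError as e:
    # nll_loss treats dim 1 of a 3D input as the class dim, so it complains
    print(e)
```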
I read that NLLLoss does not want one-hot encoding for the targets, only the indices of the categories. So this is the part where I don’t know how to structure the prediction and target so the NLLLoss is calculated correctly.
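For reference, my current guess is to flatten the sequence and batch dimensions so the prediction is [N, C] and the target is [N], but I’m not sure this is the intended way to pair them up:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(5, 2, 10).log_softmax(dim=-1)  # [seq, batch, vocab]
target = torch.randint(0, 10, (5, 2))             # [seq, batch]

# my guess: merge seq and batch dims -> pred [10, 10], target [10]
loss = F.nll_loss(pred.reshape(-1, 10), target.reshape(-1))
print(loss)  # a scalar loss
```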
Thanks for the help and guidance, cheers!