How does NLLLoss handle predicted minibatch with target minibatch


Currently I’m working on a char-rnn network with an LSTM. Everything goes well until I try to compute the NLLLoss - this is where I’m a bit lost and confused.

As an example, I have a batch of 2 with a sequence length of 5 and an embedding dimension of, say, 10 (one per target character):

so my input shape before passing through the one-hot embedding layer is [5, 2], i.e.

tensor([[ 10,  62],
        [ 49,  61],
        [ 34,  69],
        [ 42,  51],
        [  2,   2]])

after passing the mini-batch input into the embedding layer, my shape is now [5, 2, 10], and it stays that way as I pass it through the LSTM and softmax layers.
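Just to make the shapes above concrete, here is a minimal sketch of that pipeline. The layer sizes and the random index tensor are hypothetical stand-ins (I'm using `nn.Embedding` with `hidden_size == vocab` in place of the actual one-hot embedding and output projection), but the shapes match the ones described:

```python
import torch
import torch.nn as nn

seq_len, batch, vocab = 5, 2, 10  # numbers from the post

# stand-in for the real character-index batch, shape [5, 2]
tokens = torch.randint(0, vocab, (seq_len, batch))

embed = nn.Embedding(vocab, vocab)                   # embedding dim 10, as above
lstm = nn.LSTM(input_size=vocab, hidden_size=vocab)  # hypothetical sizes

x = embed(tokens)                            # [5, 2, 10]
out, _ = lstm(x)                             # [5, 2, 10]
log_probs = torch.log_softmax(out, dim=-1)   # per-character log-probabilities
```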

when I go to perform the loss function, it errors out and says I have a size mismatch:
loss = F.nll_loss(pred,input)

obviously, the shapes being passed in are F.nll_loss([5,2,10], [5,2])

I read that NLLLoss does not want one-hot encoding for the target, only the indices of the categories. So this is the part where I don’t know how to structure the prediction and target so that the NLLLoss is computed correctly.

Thanks for the help and guidance, cheers!


F.nll_loss expects input and target to be 2-dimensional and 1-dimensional respectively, i.e. (N, C) and (N,), or (N, C, d_1, d_2, …, d_K) and (N, d_1, d_2, …, d_K) in the higher-dimensional case (see the source code).
So, in your case, reshaping the input tensor to (5 * 2, 10) and the target tensor to (5 * 2,) will make it work.
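A minimal sketch of that reshape, using random tensors in place of the real model output and target batch (the shapes are the ones from the post):

```python
import torch
import torch.nn.functional as F

seq_len, batch, vocab = 5, 2, 10

# hypothetical stand-ins for the model's log-probabilities and the target indices
log_probs = torch.log_softmax(torch.randn(seq_len, batch, vocab), dim=-1)  # [5, 2, 10]
target = torch.randint(0, vocab, (seq_len, batch))                         # [5, 2]

# flatten so nll_loss sees a (N, C) input and a (N,) target
loss = F.nll_loss(log_probs.view(seq_len * batch, vocab),
                  target.view(seq_len * batch))
```

Equivalently, you could keep the sequence dimension and use the (N, C, d_1) form by permuting the class dimension into second place: `F.nll_loss(log_probs.permute(1, 2, 0), target.t())` gives the same mean loss.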


Thank you! Reshaping it worked 🙂