NLLLoss not correctly ignoring padding token?

Hi!

I am trying to train an LSTM-based network on variable-length inputs using padded sequences.
So that training can still be done in batches, I flatten the padded output and the padded target before feeding them to NLLLoss(), which is supposed to ignore target values of -100 by default (according to the docs).
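
To make the setup concrete, here is a minimal sketch of the flatten-then-NLLLoss step as I understand it from the docs. The tensors, the `lengths` values and the shortened sequence length are made up for illustration; it assumes the network output is log-probabilities (e.g. from `log_softmax`) and that padded target positions hold -100.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes: seq_len is shortened from 1631 to keep the example small.
batch_size, seq_len, num_classes = 4, 6, 99
PAD = -100  # default ignore_index of nn.NLLLoss

# Hypothetical network output, converted to log-probabilities over the classes.
logits = torch.randn(batch_size, seq_len, num_classes)
log_probs = F.log_softmax(logits, dim=2)

# Hypothetical targets: class indices, with padded positions set to -100.
lengths = torch.tensor([6, 4, 3, 2])  # made-up true sequence lengths
target = torch.randint(0, num_classes, (batch_size, seq_len))
mask = torch.arange(seq_len).unsqueeze(0) < lengths.unsqueeze(1)  # True on real tokens
target = target.masked_fill(~mask, PAD)

# Flatten so that every time step of every sequence becomes one row.
flat_input = log_probs.view(-1, num_classes)   # (batch * seq_len, num_classes)
flat_target = target.view(-1)                  # (batch * seq_len,)

crit = nn.NLLLoss()  # reduction='mean', ignore_index=-100
loss = crit(flat_input, flat_target)

# Manual check: average -log_prob of the target class over non-padded rows only.
flat_mask = mask.view(-1)
kept_rows = flat_input[flat_mask]
kept_targets = flat_target[flat_mask].unsqueeze(1)
manual = -kept_rows.gather(1, kept_targets).squeeze(1).mean()
```

If the padding is ignored as documented, `loss` and `manual` should agree, and both should be positive because log-probabilities are negative.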

As a test, I fed the padded input and the corresponding padded target to NLLLoss, expecting to get 0 since the two tensors contain the same information, but the value I get is -0.0514.
Is this a numerical error or a bug in my code?
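
For reference, going by the documented definition, NLLLoss with the default mean reduction returns the negative mean of `input[i, target[i]]` over the non-ignored rows, so it is not a distance between two tensors. The sketch below uses small made-up tensors (3 classes, 4 rows, the last row padded) to show which value to expect in two cases; it is only an illustration and not my actual data.

```python
import torch
import torch.nn as nn

crit = nn.NLLLoss()  # reduction='mean', ignore_index=-100

# Made-up targets: three real class indices and one padded position.
target = torch.tensor([2, 0, 1, -100])

# Case 1: the input is a one-hot encoding of the target (not log-probabilities).
one_hot = torch.zeros(4, 3)
one_hot[0, 2] = 1.0
one_hot[1, 0] = 1.0
one_hot[2, 1] = 1.0
loss_one_hot = crit(one_hot, target)  # -(1 + 1 + 1) / 3

# Case 2: the input holds log-probabilities with (almost) all mass on the
# target class, i.e. log(1) = 0 there and a very negative value elsewhere.
perfect = torch.full((4, 3), -1e4)
perfect[0, 2] = 0.0
perfect[1, 0] = 0.0
perfect[2, 1] = 0.0
loss_perfect = crit(perfect, target)  # -(0 + 0 + 0) / 3
```

In this sketch the one-hot input gives -1 and only the log-probability input gives 0, which suggests that a negative loss such as the one above can come from an input that is not made of log-probabilities.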

Here is my test:

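import torch.nn as nn

crit = nn.NLLLoss()  # default ignore_index is -100
# Flatten (batch, seq_len, num_classes) to (batch * seq_len, num_classes)
input = inputTensor.view(-1, inputTensor.size(2))
# Flatten (batch, seq_len) to (batch * seq_len,)
target = target.view(-1)

crit(input, target)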
Out[91]: tensor(-0.0514)

With a batch size of 4, the shapes of input and target before flattening are:

Out[87]: torch.Size([4, 1631, 99])
Out[88]: torch.Size([4, 1631])

and after flattening:

Out[89]: torch.Size([6524, 99])
Out[90]: torch.Size([6524])