Hi!
I am trying to train an LSTM-based network on variable-length inputs using padded sequences. So that training can still be batched, I flatten the padded output as well as the padded target before feeding them to NLLLoss(), which is supposed to ignore entries whose target value is -100 by default (according to the docs).
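For reference, here is a minimal sketch of how I understand that ignore_index behavior: rows whose target is -100 should be excluded both from the sum and from the averaging denominator. The tensors here are random stand-ins, not my actual data.

```python
import torch
import torch.nn as nn

crit = nn.NLLLoss()  # default ignore_index=-100, reduction='mean'

# 6 "time steps", 5 classes; log_softmax gives valid log-probabilities
log_probs = torch.log_softmax(torch.randn(6, 5), dim=1)
# two padded positions marked with -100
target = torch.tensor([1, 2, -100, 0, -100, 3])

# Loss over all rows, padding ignored:
full = crit(log_probs, target)

# Should match the loss computed over only the non-padded rows:
mask = target != -100
manual = crit(log_probs[mask], target[mask])
```

If my understanding is right, `full` and `manual` are equal, i.e. the -100 rows contribute nothing.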
As a test, I fed the padded input and the corresponding padded target to NLLLoss, expecting to obtain 0 since these two tensors contain the same information, but the value I get is -0.0514.
Is this a numerical error or a bug in my code?
Here is my test:
crit = nn.NLLLoss()
input = inputTensor.view(-1, inputTensor.size(2))
target = target.view(-1)
crit(input, target)
Out[91]: tensor(-0.0514)
With a batch size of 4, input and target have the following sizes before flattening:
input.size()
Out[87]: torch.Size([4, 1631, 99])
target.size()
Out[88]: torch.Size([4, 1631])
and after flattening:
input.size()
Out[89]: torch.Size([6524, 99])
target.size()
Out[90]: torch.Size([6524])
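To make the shapes above reproducible, here is the flattening step in isolation with the same dimensions (batch=4, seq_len=1631, num_classes=99); random log-probabilities and random class indices stand in for my actual tensors.

```python
import torch
import torch.nn as nn

batch, seq_len, n_classes = 4, 1631, 99

# Stand-ins for my network output (log-probabilities) and target
inputTensor = torch.log_softmax(torch.randn(batch, seq_len, n_classes), dim=2)
target = torch.randint(0, n_classes, (batch, seq_len))

# Flatten batch and time into one dimension, as in my test above
flat_input = inputTensor.view(-1, inputTensor.size(2))  # [6524, 99]
flat_target = target.view(-1)                           # [6524]

loss = nn.NLLLoss()(flat_input, flat_target)
```

With valid log-probabilities (log p ≤ 0 everywhere), the NLL should come out non-negative, which is part of why the -0.0514 above surprises me.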