I am trying to train an LSTM-based network with variable-length inputs using padded sequences. So that training can also be batch-processed, I flatten the padded output as well as the padded target before feeding them to NLLLoss(), which is supposed to ignore target values of -100 by default (according to the docs).
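For reference, here is a minimal standalone sketch of that default behavior (shapes and values made up for illustration), where -100 marks the padded time steps:

import torch
import torch.nn as nn

# Toy shapes: batch of 2 sequences padded to length 3, with 5 classes.
log_probs = torch.randn(2, 3, 5).log_softmax(dim=-1)  # (batch, seq_len, n_classes)
target = torch.tensor([[1, 4, -100],      # last step is padding
                       [0, -100, -100]])  # last two steps are padding

crit = nn.NLLLoss()  # ignore_index defaults to -100
loss = crit(log_probs.view(-1, 5), target.view(-1))  # flatten batch and time
# Positions where target == -100 contribute neither to the sum nor to the mean.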
I ran a test where I fed the padded input and the corresponding padded target to NLLLoss, expecting to obtain 0 since these two tensors contain the same information, but the value I get is -0.0514. Is this a numerical error or a bug in my code?
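For comparison, here is a small sanity check I put together of a case where NLLLoss does return exactly 0. NLLLoss expects log-probabilities, so a "perfect" prediction has log-probability 0 (i.e. probability 1) at the target class:

import torch
import torch.nn as nn

target = torch.tensor([2, 0, 1])
perfect = torch.full((3, 4), -1e9)       # 3 samples, 4 classes, log p ~ -inf elsewhere
perfect[torch.arange(3), target] = 0.0   # log(1) = 0 at each target class
print(nn.NLLLoss()(perfect, target))     # tensor(0.)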
Here is my test:

crit = nn.NLLLoss()
input = inputTensor.view(-1, inputTensor.size(2))
target = target.view(-1)
crit(input, target)
Out: tensor(-0.0514)
With 4 being my batch size, input and target have the following sizes before flattening:
input.size()
Out: torch.Size([4, 1631, 99])
target.size()
Out: torch.Size([4, 1631])
and after flattening:

input.size()
Out: torch.Size([6524, 99])
target.size()
Out: torch.Size([6524])
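For completeness, the flattening can be reproduced with dummy tensors of the same shapes (names follow my code above) to confirm that the logits and targets stay aligned:

import torch
inputTensor = torch.randn(4, 1631, 99)
target = torch.randint(0, 99, (4, 1631))
flat_input = inputTensor.view(-1, inputTensor.size(2))  # torch.Size([6524, 99])
flat_target = target.view(-1)                           # torch.Size([6524])
assert flat_input.size(0) == flat_target.size(0)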