NLLLoss for framewise classification using LSTM not working with a batch dimension

I’m trying to do frame-by-frame classification of some skeleton data using torch.nn.NLLLoss().

The input is of size [batch_size, frames, features], and the network outputs log-softmax values for each of the 12 classes for each frame, so the output is [batch_size, frames, classes=12].

I then want to compute the negative log-likelihood loss for this output based on the ground-truth class of each frame. At the moment the labels tensor is of size [batch_size, frames], where the second dimension contains the integer class indices.

When I try to compute the loss I get the following error:

ValueError: Expected target size (50, 12), got torch.Size([50, 175])

For a single example from a batch the loss seems to work but when I introduce the batch dimension it throws the above.

Any ideas on how to fix this would be greatly appreciated.


If you want to compute the loss for all frames, the output should have the shape [batch_size=50, nb_classes=12, nb_frames=175], since nn.NLLLoss expects the class dimension in dim 1. The target shape is already correct in this case.
I’m not sure how your model processes the input, or whether you expect to get all frame outputs in a single tensor. If not, you would have to calculate the loss per frame by indexing the target in dim 1.
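
To make the shape convention concrete, here is a minimal sketch. Random tensors stand in for the model output and the labels, with sizes taken from the error message (50 sequences, 175 frames, 12 classes); the variable names are illustrative only. It shows the permuted call and, for comparison, the per-frame alternative mentioned above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, nb_frames, nb_classes = 50, 175, 12

# Stand-ins for the model output and labels (random, for shape checking only).
log_probs = torch.log_softmax(torch.randn(batch_size, nb_frames, nb_classes), dim=-1)
target = torch.randint(0, nb_classes, (batch_size, nb_frames))

criterion = nn.NLLLoss()

# [batch, frames, classes] -> [batch, classes, frames]; the class dim must be dim 1.
loss_all = criterion(log_probs.permute(0, 2, 1), target)

# Alternative: one loss per frame, indexing output and target along the frame dim.
per_frame = [criterion(log_probs[:, t], target[:, t]) for t in range(nb_frames)]
loss_loop = torch.stack(per_frame).mean()
```

With the default mean reduction and no class weights, both variants should give the same value up to floating-point error, so the single permuted call is usually the simpler choice.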

Thanks @ptrblck. Just to be sure I’m not making any errors, the model is as follows:

import torch.nn as nn

class SeqLSTM(nn.Module):
    def __init__(self, input_size, num_classes, num_layers, dropout):
        super(SeqLSTM, self).__init__()

        self.lstm_one = nn.LSTM(input_size=input_size, hidden_size=int(input_size * 1.5),
                                num_layers=num_layers, dropout=dropout, batch_first=True)
        self.linear = nn.Linear(in_features=int(input_size * 1.5), out_features=num_classes)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input):
        x, (hidden, cell) = self.lstm_one(input)  # x: [batch_size, frames, hidden_size]
        x = self.linear(x)                        # x: [batch_size, frames, num_classes]
        x = self.softmax(x)                       # log softmax over dim=1 (the frames axis here)
        return x

The aim is for the output tensor to hold, for each frame, the log-softmax probability of each class. I’m guessing that I’ll just have to permute the output to get the [batch_size=50, nb_classes=12, nb_frames=175] order that you suggested?

Thanks again

Yes, you could permute it, and you should also check the output shape of self.lstm_one.
Based on the config you are using, I assume the output has the shape [batch_size, seq_len, nb_features]. If that’s the case, you can feed it into the linear layer and get an output of shape [batch_size, seq_len, out_features=num_classes]. Permute this output of the linear layer to [batch_size, num_classes, seq_len], then either pass it to the log softmax with dim=1 (if you are using nn.NLLLoss) or remove the log softmax and pass it directly to nn.CrossEntropyLoss.
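
Putting the suggestion together, a forward pass could look roughly like the sketch below. It reuses the SeqLSTM layout posted above, but it is only an illustration: the return_logits flag is a hypothetical addition, and input_size=30, num_layers=2 and dropout=0.0 are made-up values for the shape check. The log softmax is applied over dim=1 after the permute, so it normalizes across classes rather than across frames.

```python
import torch
import torch.nn as nn


class SeqLSTM(nn.Module):
    def __init__(self, input_size, num_classes, num_layers, dropout):
        super().__init__()
        hidden_size = int(input_size * 1.5)
        self.lstm_one = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                                num_layers=num_layers, dropout=dropout, batch_first=True)
        self.linear = nn.Linear(in_features=hidden_size, out_features=num_classes)
        # After the permute in forward(), the classes live in dim 1.
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, return_logits=False):
        # return_logits is a hypothetical flag, added here only for the comparison below.
        x, _ = self.lstm_one(input)   # [batch, frames, hidden]
        x = self.linear(x)            # [batch, frames, classes]
        x = x.permute(0, 2, 1)        # [batch, classes, frames]
        return x if return_logits else self.log_softmax(x)


torch.manual_seed(0)
model = SeqLSTM(input_size=30, num_classes=12, num_layers=2, dropout=0.0)
model.eval()
frames = torch.randn(50, 175, 30)          # made-up input: 30 features per frame
labels = torch.randint(0, 12, (50, 175))   # one class index per frame

nll = nn.NLLLoss()(model(frames), labels)
# Alternative: skip the log softmax and let nn.CrossEntropyLoss apply it internally.
ce = nn.CrossEntropyLoss()(model(frames, return_logits=True), labels)
```

Both losses should agree up to floating-point error, so the choice between them is mostly a matter of whether the log-probabilities are needed elsewhere.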