LSTM + FC training stagnation

Hello,

I’m trying to train an LSTM network with a fully connected layer on top of it, but I’m running into some issues: I’m not sure whether my model is written correctly or my training procedure is wrong.

The task is binary classification on sequential data of variable length. A batch is a tensor of size torch.Size([32, 58735, 49]), for example, where 32 is the batch size, 58735 is the length of the longest sequence in the batch, and 49 is the number of features at each time step t. Because the sequences have variable length, I also pass the model a tensor of lengths of size torch.Size([32]) (one entry per example in the batch), which I use with the pack_padded_sequence function. My model is written as:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class LSTM(nn.Module):

    def __init__(self, input_size, hidden_size):
        super(LSTM, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(self.input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(self.hidden_size, 1)

    def forward(self, x, lengths):
        bs = x.size(0)
        # Zero initial hidden and cell states for this batch
        self.ho = torch.zeros((1, bs, self.hidden_size), device=x.device)
        self.co = torch.zeros((1, bs, self.hidden_size), device=x.device)
        # Sort sequences by length (descending), as pack_padded_sequence expects
        input_lengths, perm_idx = lengths.sort(0, descending=True)
        # Reorder the batch and trim padding beyond the longest sequence
        x = x[perm_idx][:, :input_lengths.max()]
        x = pack_padded_sequence(x, input_lengths, batch_first=True)
        lstm_out, (self.ho, self.co) = self.lstm(x, (self.ho, self.co))
        lstm_out, lengths = pad_packed_sequence(lstm_out)
        # Classify from the final hidden state of each sequence
        x = self.fc(self.ho.squeeze())
        return x.squeeze()
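
For reference, this is roughly how the model gets called on a padded batch (a minimal sketch with made-up shapes; padded_batch and seq_lengths are just illustrative names):

import torch

# Illustrative shapes: 32 padded sequences, 49 features per time step
batch_size, max_len, n_features, hidden_size = 32, 100, 49, 64
model = LSTM(input_size=n_features, hidden_size=hidden_size)

padded_batch = torch.randn(batch_size, max_len, n_features)
# One true length per example (each <= max_len), used by pack_padded_sequence
seq_lengths = torch.randint(1, max_len + 1, (batch_size,))

logits = model(padded_batch, seq_lengths)
print(logits.shape)  # torch.Size([32]), one raw logit per example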

The training procedure can be seen in this gist.

The problem is that, training with exactly this procedure and this LSTM model, the training accuracy and the validation accuracy stagnate from beginning to end at 0.7730 and 0.2699 respectively. Since this is the first time I’m using nn.LSTM and pack_padded_sequence, I’m not sure where the problem could lie.

Check the outputs of the LSTM and see what it is predicting, to understand why the accuracies don’t change.
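
For example, something along these lines to look at the predicted class distribution (a minimal sketch; model, val_loader, and the (batch, lengths, labels) format are placeholders for your own code):

import torch

model.eval()
all_preds = []
with torch.no_grad():
    for batch, lengths, labels in val_loader:  # placeholder loader
        logits = model(batch, lengths)
        # Threshold the sigmoid of the raw logits at 0.5 to get hard predictions
        preds = (torch.sigmoid(logits) > 0.5).long()
        all_preds.append(preds)
all_preds = torch.cat(all_preds)
# How many examples are predicted as class 0 vs class 1
print(torch.bincount(all_preds, minlength=2))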

I’m facing something similar here: MQRNN (Seq2Seq LSTM + MLP) model gives constant predictions

OK, I analyzed the output predictions from epoch 0: at first it predicts class 1 for some examples, but 1 or 2 epochs later it stops predicting class 1 and predicts almost all examples as class 0 from then on. Also, investigating my dataset, both the training and validation sets are very unbalanced, with 77% of the examples in class 0 and 23% in class 1. Maybe the LSTM is overfitting to the majority class and predicting only class 0?
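
For reference, this is roughly how I checked the label distribution (a small sketch; labels here stands for the 0/1 target tensor of one split):

import torch

# labels: 1-D tensor of 0/1 targets for one split (illustrative name)
counts = torch.bincount(labels.long(), minlength=2)
print(counts.float() / counts.sum())  # roughly tensor([0.77, 0.23]) in my case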

I’m going to try the pos_weight argument of BCEWithLogitsLoss and see what happens.
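
A minimal sketch of what I have in mind, assuming roughly 77% negatives and 23% positives (pos_weight is the ratio of negative to positive examples):

import torch
import torch.nn as nn

# pos_weight > 1 makes positive (class 1) examples weigh more in the loss.
# With ~77% negatives and ~23% positives, the ratio is about 77 / 23 ≈ 3.35.
pos_weight = torch.tensor([77.0 / 23.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)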