I’m (reasonably) sure that your forward() method is off. There are too many view() calls that shouldn’t be needed, but you have them to hammer the dimensions of the tensors to the required shape. While the network won’t complain, it won’t learn anything either. See one problem with view() here.
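For illustration (my own toy example, not necessarily the exact issue behind that link): view() only reinterprets the existing memory layout, so it does not reorder elements the way transpose() or permute() would:

import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
print(x.t())                        # real transpose:        [[0, 3], [1, 4], [2, 5]]
print(x.view(3, 2))                 # just re-chunks memory: [[0, 1], [2, 3], [4, 5]]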
lstm_out contains ALL the hidden states of the last LSTM layer, in both directions. I’m pretty sure that

lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

messes up your tensor. What’s the shape of lstm_out after that command? If the first dimension is not your batch size – which I doubt it is – lstm_out is messed up.
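A quick shape check (sizes made up here; assuming batch_first=True and a bidirectional LSTM) shows what that view() actually produces:

import torch

batch_size, seq_len, hidden_dim = 4, 10, 8
# lstm_out of a bidirectional LSTM has shape (batch_size, seq_len, 2 * hidden_dim)
lstm_out = torch.randn(batch_size, seq_len, 2 * hidden_dim)

flat = lstm_out.contiguous().view(-1, hidden_dim)
print(flat.shape)  # torch.Size([80, 8]) -- 80 = batch_size * seq_len * 2, not batch_size

Every row is now just the forward or the backward half of one time step, not one sequence, so whatever comes after the view() no longer operates per sample.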
While you can use lstm_out, I would first go with the hidden state h to make it a bit simpler. You might want to try the following changes to your forward() method:
def forward(...):
    ...
    lstm_out, h = self.lstm(embedded_words)
    # Get the last hidden state of the last LSTM layer (see LSTM docs for the shape of h_n)
    # Note the h[0] since h is a tuple (h_n, c_n) for LSTMs
    # The [-1] picks the last layer after separating layers and directions
    last_h = h[0].view(self.n_layers, 2, batch_size, self.hidden_dim)[-1]
    # Combine the hidden states of the forward and backward LSTM
    final_h = last_h[0] + last_h[1]
    # final_h should have the shape (batch_size, hidden_dim)
    fc_out = self.fc(final_h)
    # sigmoid_out = sigmoid_out.view(batch_size, -1)  # Should not be needed anymore
    ...
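For reference, a minimal self-contained sketch of how the whole module could look with that forward() (class name, hyperparameters and the final sigmoid are assumptions on my side – adapt them to your actual model):

import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers, output_size=1):
        super().__init__()
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        embedded_words = self.embedding(x)
        lstm_out, h = self.lstm(embedded_words)
        # h[0] is h_n with shape (n_layers * 2, batch_size, hidden_dim)
        last_h = h[0].view(self.n_layers, 2, batch_size, self.hidden_dim)[-1]
        final_h = last_h[0] + last_h[1]           # (batch_size, hidden_dim)
        return torch.sigmoid(self.fc(final_h))    # (batch_size, output_size)

# Quick check with dummy data
model = SentimentLSTM(vocab_size=1000, embedding_dim=32, hidden_dim=64, n_layers=2)
out = model(torch.randint(0, 1000, (4, 12)))
print(out.shape)  # torch.Size([4, 1])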