I’m (reasonably) sure that your forward() method is off. There are too many view() calls that shouldn’t be needed, but you have them to hammer the dimensions of the tensors to the required shape. While the network won’t complain, it won’t learn anything either. See one problem with view() here.
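For illustration (my own toy example, not necessarily the exact issue behind that link): view() only reinterprets the existing memory layout, so it does not reorder elements the way transpose() or permute() would:

import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
print(x.t())                        # real transpose:        [[0, 3], [1, 4], [2, 5]]
print(x.view(3, 2))                 # just re-chunks memory: [[0, 1], [2, 3], [4, 5]]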
lstm_out contains ALL the hidden states of the last LSTM layer, in both directions. I’m pretty sure that

lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

messes up your tensor. What’s the shape of lstm_out after that command? If the first dimension is not your batch size – which I doubt it is – lstm_out is messed up.
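A quick shape check (sizes made up here; assuming batch_first=True and a bidirectional LSTM) shows what that view() actually produces:

import torch

batch_size, seq_len, hidden_dim = 4, 10, 8
# lstm_out of a bidirectional LSTM has shape (batch_size, seq_len, 2 * hidden_dim)
lstm_out = torch.randn(batch_size, seq_len, 2 * hidden_dim)

flat = lstm_out.contiguous().view(-1, hidden_dim)
print(flat.shape)  # torch.Size([80, 8]) -- 80 = batch_size * seq_len * 2, not batch_size

Every row is now just the forward or the backward half of one time step, not one sequence, so whatever comes after the view() no longer operates per sample.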
While you can use lstm_out, I would first go with the hidden state h to make it a bit simpler. You might want to try the following changes to your forward() method:
def forward(...):
    ...
    lstm_out, h = self.lstm(embedded_words)
    # Get the last hidden state of the last LSTM layer (see LSTM docs for the shape of h_n)
    # Note the h[0] since h is a tuple (h_n, c_n) for LSTMs
    # The [-1] picks the last layer after separating layers and directions
    last_h = h[0].view(self.n_layers, 2, batch_size, self.hidden_dim)[-1]
    # Combine the hidden states of the forward and backward LSTM
    final_h = last_h[0] + last_h[1]
    # final_h should have the shape (batch_size, hidden_dim)
    fc_out = self.fc(final_h)
    # sigmoid_out = sigmoid_out.view(batch_size, -1)  # Should not be needed anymore
    ...
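For reference, a minimal self-contained sketch of how the whole module could look with that forward() (class name, hyperparameters and the final sigmoid are assumptions on my side – adapt them to your actual model):

import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers, output_size=1):
        super().__init__()
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        embedded_words = self.embedding(x)
        lstm_out, h = self.lstm(embedded_words)
        # h[0] is h_n with shape (n_layers * 2, batch_size, hidden_dim)
        last_h = h[0].view(self.n_layers, 2, batch_size, self.hidden_dim)[-1]
        final_h = last_h[0] + last_h[1]           # (batch_size, hidden_dim)
        return torch.sigmoid(self.fc(final_h))    # (batch_size, output_size)

# Quick check with dummy data
model = SentimentLSTM(vocab_size=1000, embedding_dim=32, hidden_dim=64, n_layers=2)
out = model(torch.randint(0, 1000, (4, 12)))
print(out.shape)  # torch.Size([4, 1])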