Bi-LSTM accuracy doesn't improve over 65%

I can't be sure what the exact issue is, but here are some things I noticed after looking at your code:

  • You use a Bi-LSTM but then use lstm_out for further processing. For a Bi-LSTM, the last step of the forward pass is in lstm_out[:,-1,:hidden_dim], while the last step of the backward pass is in lstm_out[:,0,hidden_dim:]. Not sure if that's what you want. For a basic classification task it is more convenient to call lstm_out, (h_n, c_n) = self.lstm1(embeds, (h1, c1)) and build the final representation from h_n; see the first sketch after this list.

  • I'm actually not quite sure how, or even why, your current setup works. The shape of lstm_out is (batch_size, seq_len, num_directions*hidden_dim), which you then feed to the linear layers. Usually the linear layers should get (batch_size, hidden_dim), or (batch_size, 2*hidden_dim) in your case, since num_directions=2, but you still have the seq_len dimension. What is the shape of out just before you return it? The second sketch after this list illustrates the issue.

  • Fake news detection is not a trivial task. In fact, your network will never learn what TRUE or FALSE is; it only learns what fake news most likely looks like. So don't expect an accuracy close to 100%.

  • Can you overfit your model? That is, can you get the training loss down to 0 using a very small training dataset (e.g., just 5 genuine and 5 fake news articles)? This is a basic sanity check that your model is set up correctly; the last sketch below shows the idea.
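
For the first point, here is a minimal sketch of taking the last hidden state of each direction from h_n instead of indexing lstm_out. All sizes here are made up for illustration, not taken from your code:

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim, batch_size, seq_len = 100, 64, 8, 20
lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)

embeds = torch.randn(batch_size, seq_len, embedding_dim)
lstm_out, (h_n, c_n) = lstm(embeds)

# h_n has shape (num_layers*num_directions, batch_size, hidden_dim);
# h_n[-2] is the last layer's forward direction, h_n[-1] its backward direction.
final = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch_size, 2*hidden_dim)
```

final is then a fixed-size vector per sequence that you can feed straight into your linear layers.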
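For the second point, a self-contained snippet (again with assumed sizes) showing that a linear layer applied to the full lstm_out keeps the seq_len dimension around:

```python
import torch
import torch.nn as nn

hidden_dim, batch_size, seq_len = 64, 8, 20
lstm_out = torch.randn(batch_size, seq_len, 2 * hidden_dim)  # Bi-LSTM output shape

fc = nn.Linear(2 * hidden_dim, 2)
out = fc(lstm_out)
print(out.shape)  # torch.Size([8, 20, 2]): one prediction per time step,
                  # not one per sequence, which is probably not what you want
```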
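And for the last point, a rough sketch of the overfitting sanity check. The toy model and random data below are stand-ins; swap in your own network and a 10-example slice of your dataset (5 genuine + 5 fake):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embedding_dim, hidden_dim, seq_len = 1000, 50, 32, 20

class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):
        _, (h_n, _) = self.lstm(self.embed(x))
        return self.fc(torch.cat([h_n[-2], h_n[-1]], dim=1)).squeeze(1)

model = TinyClassifier()
x = torch.randint(0, vocab_size, (10, seq_len))  # 10 tiny "articles" (random ids)
y = torch.tensor([0.0] * 5 + [1.0] * 5)          # 5 genuine, 5 fake

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.4f}")  # should be close to 0
```

If the loss does not go to (almost) 0 here, something in the model or training loop is broken, and no amount of data will fix it.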