Bi-LSTM acc doesn't improve over 65%

I’m using a Bi-LSTM for fake news detection, but no matter what I do, my training accuracy does not improve beyond 65%. In pre-processing I have removed stopwords and used lemmatization and 4-grams, but nothing seems to improve the model. What can I do to improve the training accuracy?

import torch
import torch.nn as nn

class RnnBiLSTM(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size, drop_prob=0.5):
        super(RnnBiLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.emb_size = emb_size
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.lstm1 = nn.LSTM(emb_size, hidden_size, num_layers=3, bidirectional=True)
        self.dropout = nn.Dropout(0.3)
        self.lin1 = nn.Linear(hidden_size * 2, hidden_size // 2)
        self.lin2 = nn.Linear(hidden_size // 2, 1)

    def forward(self, x):
        x_size = x.shape[0]
        embeds = self.emb(x)
        # Initial hidden and cell states: 6 = num_layers (3) * num_directions (2)
        (h1, c1) = (torch.zeros(6, x_size, self.hidden_size), torch.zeros(6, x_size, self.hidden_size))
        lstm_out, _ = self.lstm1(embeds, (h1, c1))
        out = self.dropout(lstm_out)
        out = torch.relu(self.lin1(out))
        out = torch.sigmoid(self.lin2(out))
        return out

I can’t really be sure what the issue might be, but here are some things to note after looking at your code:

  • You use a Bi-LSTM but pass lstm_out on for further processing. For a Bi-LSTM (assuming batch_first=True), the last step of the forward pass is in lstm_out[:, -1, :hidden_size], while the last step of the backward pass is in lstm_out[:, 0, hidden_size:]. Not sure if that’s what you want. For a basic classification task, using lstm_out, hidden = self.lstm1(embeds, (h1, c1)) and then working with hidden, which is the tuple (h_n, c_n) of final hidden and cell states, is more convenient.

  • I’m actually not quite sure how or even why your current setup works. The shape of lstm_out is (batch_size, seq_len, num_directions*hidden_dim) which you then give to the linear layers. Usually, the linear layers should get (batch_size, hidden_dim). I can see why it’s hidden_size*2 since for you, num_directions=2. But you still have the seq_len dimension. What is the shape of out before you return it?

  • Fake news detection is not a trivial task. In fact, your network will never learn what TRUE or FALSE is, only what fake news most likely looks like. So don’t expect an accuracy close to 100%.

  • Can you overfit your model? That is, can you get the training loss down to 0 using a very small training dataset (e.g., just 5 genuine and 5 fake news items)? This is a basic sanity check of whether your model is set up correctly.
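
To make the points about using hidden, the input shape of the linear layers, and the overfitting check more concrete, here is a minimal sketch, not a definitive fix. It assumes batch_first=True (inputs of shape (batch_size, seq_len)), classifies from the final hidden states of the last layer instead of lstm_out, and returns raw logits paired with nn.BCEWithLogitsLoss rather than a sigmoid output. The names BiLSTMClassifier and overfit_tiny_batch, the random token ids, and all sizes and hyperparameters are made up for illustration.

```python
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    """Hypothetical variant that classifies from the final hidden states."""

    def __init__(self, vocab_size, emb_size, hidden_size, num_layers=3, drop_prob=0.3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_size)
        # Assumption: batch_first=True, so inputs are (batch_size, seq_len).
        self.lstm = nn.LSTM(emb_size, hidden_size, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.dropout = nn.Dropout(drop_prob)
        self.lin1 = nn.Linear(hidden_size * 2, hidden_size // 2)
        self.lin2 = nn.Linear(hidden_size // 2, 1)

    def forward(self, x):
        embeds = self.emb(x)                          # (batch, seq_len, emb_size)
        # Leaving out (h0, c0) lets PyTorch initialise both with zeros.
        _, (h_n, _) = self.lstm(embeds)               # h_n: (num_layers * 2, batch, hidden)
        # h_n[-2]: final forward state of the last layer,
        # h_n[-1]: final backward state of the last layer.
        final = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, hidden * 2)
        out = torch.relu(self.lin1(self.dropout(final)))
        return self.lin2(out).squeeze(1)              # raw logits, shape (batch,)


def overfit_tiny_batch(model, x, y, steps=200, lr=1e-2):
    """Train repeatedly on one tiny batch; return (first_loss, last_loss)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
    model.train()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]


torch.manual_seed(0)
# Random token ids stand in for 10 tokenized articles (5 genuine, 5 fake).
x = torch.randint(0, 100, (10, 20))
y = torch.tensor([0.0] * 5 + [1.0] * 5)

# Dropout is switched off so nothing gets in the way of memorising the batch.
model = BiLSTMClassifier(vocab_size=100, emb_size=16, hidden_size=32, drop_prob=0.0)
print(model(x).shape)  # torch.Size([10])

first, last = overfit_tiny_batch(model, x, y)
print(f"loss: {first:.4f} -> {last:.4f}")
```

If the loss on such a tiny batch does not fall towards 0, the problem is more likely in the model or the training loop than in the pre-processing.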