I can’t really be sure what the issue is, but here are some things I noticed after looking at your code:
- You use a Bi-LSTM but use `lstm_out` for further processing. In the case of a Bi-LSTM, the last step of the forward pass is in `lstm_out[:, -1]` while the last step of the backward pass is in `lstm_out[:, 0]`; see here. Not sure if that’s what you want. For a basic classification task, using `lstm_out, hidden = self.lstm1(embeds, (h1, c1))` and then working with `hidden` is more convenient (see the first sketch after this list).
- I’m actually not quite sure how or even why your current setup works. The shape of `lstm_out` is `(batch_size, seq_len, num_directions*hidden_dim)`, which you then give to the linear layers. Usually, the linear layers should get `(batch_size, hidden_dim)`. I can see why it’s `hidden_size*2`, since for you `num_directions=2`, but you still have the `seq_len` dimension. What is the shape of `out` before you return it?
- Fake news detection is not a trivial task. In fact, your network will never learn what TRUE or FALSE is, only what fake news most likely looks like. So don’t expect an accuracy close to 100%.
- Can you overfit your model, i.e., can you get the training loss down to 0 using a very small training dataset (e.g., just 5 genuine and 5 fake news items)? This is a basic sanity check that your model is set up correctly (see the second sketch below).
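Here’s a minimal sketch of what I mean by the first two points, assuming a single-layer Bi-LSTM classifier (the class name `BiLstmClassifier` and all hyperparameters are placeholders, not taken from your code): instead of pushing the whole `lstm_out` sequence into the linear layer, concatenate the last forward and backward hidden states so the linear layer sees `(batch_size, 2*hidden_dim)`.

```python
import torch
import torch.nn as nn

class BiLstmClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # num_directions = 2, so the concatenated hidden state has size hidden_dim * 2
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):
        embeds = self.embedding(x)                     # (batch, seq_len, embed_dim)
        lstm_out, (h_n, c_n) = self.lstm(embeds)
        # lstm_out: (batch, seq_len, 2 * hidden_dim) -- the full sequence
        # h_n:      (num_layers * 2, batch, hidden_dim)
        # h_n[-2] is the last forward hidden state, h_n[-1] the last backward one.
        hidden = torch.cat((h_n[-2], h_n[-1]), dim=1)  # (batch, 2 * hidden_dim)
        return self.fc(hidden)                         # (batch, num_classes)

# quick shape check with random token ids
model = BiLstmClassifier()
dummy = torch.randint(0, 10000, (4, 20))   # batch of 4 sequences of length 20
print(model(dummy).shape)                  # torch.Size([4, 2])
```

This way there is no `seq_len` dimension left by the time the tensor reaches the linear layer.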
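And here’s a sketch of the overfitting sanity check; it reuses the `BiLstmClassifier` from above, and the random tensors are only stand-ins for 5 genuine + 5 fake articles taken from your real training set.

```python
import torch
import torch.nn as nn

tiny_x = torch.randint(0, 10000, (10, 20))  # 10 token-id sequences of length 20 (stand-in data)
tiny_y = torch.tensor([0] * 5 + [1] * 5)    # 5 genuine (0), 5 fake (1)

model = BiLstmClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(tiny_x), tiny_y)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.4f}")  # should get close to 0
```

If the loss does not go towards 0 even on 10 samples, the problem is most likely in the model or the training loop, not in the data.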