I can’t really be sure what the issue is, but here are some things I noticed after looking at your code:
- You use a Bi-LSTM but then use `lstm_out` for further processing. For a Bi-LSTM, the last step of the forward pass is in `lstm_out[:, -1]` while the last step of the backward pass is in `lstm_out[:, 0]`; see here. Not sure if that’s what you want. For a basic classification task, using `lstm_out, hidden = self.lstm1(embeds, (h1, c1))` and then working with `hidden` is more convenient.
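To make the indexing concrete, here is a minimal sketch (with made-up sizes, not taken from your code) showing where the forward and backward last steps live in a bidirectional `nn.LSTM` output, and how the final hidden state `h_n` gives you both directly:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
batch_size, seq_len, embed_dim, hidden_dim = 4, 10, 8, 16

lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
embeds = torch.randn(batch_size, seq_len, embed_dim)

lstm_out, (h_n, c_n) = lstm(embeds)

# lstm_out has shape (batch_size, seq_len, 2*hidden_dim).
# Forward direction: its last step is at t = -1, in the first hidden_dim channels.
# Backward direction: its last step is at t = 0, in the second hidden_dim channels.
forward_last = lstm_out[:, -1, :hidden_dim]
backward_last = lstm_out[:, 0, hidden_dim:]

# h_n has shape (num_layers*num_directions, batch_size, hidden_dim);
# for the last layer, h_n[-2] is the forward and h_n[-1] the backward final state.
assert torch.allclose(forward_last, h_n[-2])
assert torch.allclose(backward_last, h_n[-1])

# A convenient input for a classifier head:
features = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch_size, 2*hidden_dim)
```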
- I’m actually not quite sure how or even why your current setup works. The shape of `lstm_out` is `(batch_size, seq_len, num_directions*hidden_dim)`, which you then give to the linear layers. Usually, the linear layers should get `(batch_size, hidden_dim)`. I can see why it’s `hidden_size*2`, since for you `num_directions=2` — but you still have the `seq_len` dimension. What is the shape of `out` before you return it?
- Fake news detection is not a trivial task. In fact, your network will never learn what TRUE or FALSE is, only what fake news most likely looks like. So don’t expect an accuracy close to 100%.
- Can you overfit your model — that is, can you get the training loss down to 0 using a very small training dataset (e.g., just 5 genuine news items and 5 fake ones)? This is a basic sanity check that your model is set up correctly.
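A minimal sketch of such an overfitting check, using a dummy model and random data as stand-ins for your network and your 10-sample dataset:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Dummy stand-ins; replace with your own model and 5+5 real samples.
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(10, 20)             # 5 "genuine" + 5 "fake" samples
y = torch.tensor([0] * 5 + [1] * 5)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

for step in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

# A correctly wired model should drive the loss to ~0 on 10 samples.
print(loss.item())
```

If the loss plateaus well above 0 even on 10 samples, something in the wiring (shapes, loss function, label encoding) is off, independent of how hard the real task is.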