Lstm binary text classification model same prediction

One more thing you can try; that is applying nonlinear activations over linear layers. Here is the discussion which can be helpful. :slight_smile:

Link: Loss does not change and weights remain zero - #6 by ptrblck