Predicted labels for LSTM are all the same


I’ve created a two-stage, LSTM-based classification model trained with mini-batches.
The two stages are in separate code files.

File 1:
I have checked the gradients, and they are updating. The model seems to work correctly on one dataset, where I merged two of the labels into label 0 and made the third label label 1, giving a binary classification. Here both the training and validation losses go down steadily, and after a point the validation loss stabilises or increases as the model starts to overfit. I believe this pattern is expected and that this classifier is working correctly. Please let me know if it isn’t.
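For reference, the stage-1 label merging described above could look like the minimal sketch below. The label names `"A"`, `"B"` and `"C"` and the helper `to_stage1_label` are hypothetical, since the post does not say what the three original classes are.

```python
# Hypothetical original label names; the post does not name the three classes.
MERGED_INTO_0 = {"A", "B"}   # the two labels merged into stage-1 class 0
STAGE1_POSITIVE = "C"        # the third label, mapped to stage-1 class 1


def to_stage1_label(original_label: str) -> int:
    """Map a three-way label to the binary stage-1 target."""
    if original_label in MERGED_INTO_0:
        return 0
    if original_label == STAGE1_POSITIVE:
        return 1
    raise ValueError(f"unexpected label: {original_label!r}")


labels = ["A", "C", "B", "B", "C"]
binary = [to_stage1_label(y) for y in labels]
print(binary)  # [0, 1, 0, 0, 1]
```

Raising on an unknown label, rather than defaulting to 0 or 1, makes a typo in the label column fail loudly instead of silently skewing the class balance.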

In a second code file:
I take a smaller part of the same test dataset: the samples that the first classifier predicted as label 0. In the actual dataset this label is made up of two labels, and I now try to classify this data into those two labels.
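One detail worth checking in this step: any stage-1 mistakes end up in the stage-2 subset, and a sample whose true label is the third class has no valid stage-2 target. The sketch below separates those samples instead of silently mislabelling them. The label names and the function `build_stage2_subset` are hypothetical.

```python
# Hypothetical label names: "A" and "B" were merged into stage-1 class 0,
# "C" is stage-1 class 1.
STAGE2_CLASSES = {"A": 0, "B": 1}


def build_stage2_subset(texts, true_labels, stage1_preds):
    """Keep the samples stage 1 predicted as 0 and attach their stage-2 targets.

    Samples routed to stage 2 whose true label is "C" (stage-1 errors) have no
    valid stage-2 target, so they are returned separately.
    """
    kept, misrouted = [], []
    for text, label, pred in zip(texts, true_labels, stage1_preds):
        if pred != 0:
            continue  # stage 1 sent this sample to class 1; stage 2 never sees it
        if label in STAGE2_CLASSES:
            kept.append((text, STAGE2_CLASSES[label]))
        else:
            misrouted.append((text, label))
    return kept, misrouted


texts = ["s1", "s2", "s3", "s4"]
true_labels = ["A", "C", "B", "C"]
stage1_preds = [0, 0, 0, 1]
kept, misrouted = build_stage2_subset(texts, true_labels, stage1_preds)
print(kept)       # [('s1', 0), ('s3', 1)]
print(misrouted)  # [('s2', 'C')]
```

If `misrouted` is large relative to `kept`, stage 2 is being evaluated on data it cannot classify correctly, regardless of how well it is trained.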

The problem I am facing is that, after a few epochs, everything gets classified into the same class. I thought this was due to unbalanced data, but I have used class weights in the loss function to counter that. I have also used dropout and weight decay to prevent overfitting. Nothing seems to help much: adding the regularisation parameters and decreasing the learning rate only slow the process down, and the model still ends up predicting a single class.
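A common way to derive the loss weights mentioned above is inverse class frequency. The sketch below assumes integer labels `0..num_classes-1`; both helper names are hypothetical. In PyTorch, the resulting list could be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`. The second helper measures how concentrated the predictions are, so logging it on the validation set each epoch would show when the collapse onto one class begins.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: n_samples / (num_classes * count[c]).

    Same formula as scikit-learn's class_weight="balanced". A class with no
    samples would cause division by zero, so it is rejected explicitly.
    """
    counts = Counter(labels)
    n = len(labels)
    missing = [c for c in range(num_classes) if counts[c] == 0]
    if missing:
        raise ValueError(f"classes with no samples: {missing}")
    return [n / (num_classes * counts[c]) for c in range(num_classes)]


def majority_fraction(preds):
    """Share of predictions that fall in the single most common class.

    A value of 1.0 means the model has collapsed onto one class.
    """
    most_common_count = Counter(preds).most_common(1)[0][1]
    return most_common_count / len(preds)


train_labels = [0] * 300 + [1] * 100
print(balanced_class_weights(train_labels, 2))  # [0.6666666666666666, 2.0]
print(majority_fraction([1, 1, 1, 1]))          # 1.0
```

Weights should be computed from the training labels only; computing them from the data being evaluated would leak information about that data into the loss.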

Any suggestions on what might be going wrong here? I can share the code if needed.

The dataset is fairly small (3k training sentences, 1.3k test sentences); I hope that is not the issue.

(Marcin Elantkowski) #2

I guess it’ll be hard to say without seeing the code or data. Also, I’m not sure an LSTM is the right tool given such a small amount of data.

Either way, to clarify: you train an LSTM on the training data, predict 0/1 labels for the test data, and then train a new, separate classifier on the test data and its predicted labels?

Is this also an LSTM? Did you check the number of predicted 0s and 1s? And how many of those 0s belong to merged class 1, and how many to merged class 2?
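These checks can be done with a couple of counters over the stage-1 output. The sketch below assumes the true three-way labels and the stage-1 predictions are available as parallel lists; the function name `stage1_breakdown` and the label names are hypothetical.

```python
from collections import Counter


def stage1_breakdown(true_labels, stage1_preds):
    """Count predicted 0s and 1s, and break the predicted 0s down by their
    true original class."""
    pred_counts = Counter(stage1_preds)
    zeros_by_true = Counter(
        label for label, pred in zip(true_labels, stage1_preds) if pred == 0
    )
    return pred_counts, zeros_by_true


# Hypothetical labels: "A" and "B" are the two merged classes, "C" is the third.
true_labels = ["A", "B", "B", "C", "A", "C"]
stage1_preds = [0, 0, 0, 0, 1, 1]
pred_counts, zeros_by_true = stage1_breakdown(true_labels, stage1_preds)
print(dict(pred_counts))    # {0: 4, 1: 2}
print(dict(zeros_by_true))  # {'A': 1, 'B': 2, 'C': 1}
```

If `zeros_by_true` is dominated by one of the merged classes, a stage-2 model that always predicts that class already scores well, which would match the behaviour described in the question.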

As a side note, you should not train anything on the test data.
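One way to follow this advice in a two-stage setup is to select the stage-2 training data from the training split by its true label, and use the test split only to evaluate the full cascade. The sketch below assumes each trained stage is exposed as a function from a sentence to a 0/1 prediction; the function names, label names, and toy predictors are hypothetical stand-ins for the real models.

```python
# Hypothetical label names: "A" and "B" form stage-1 class 0, "C" is class 1.
STAGE2_CLASSES = {"A": 0, "B": 1}


def stage2_training_set(train_texts, train_labels):
    """Select stage-2 training data from the *training* split by true label,
    so no test sample is ever used for fitting."""
    return [(t, STAGE2_CLASSES[y]) for t, y in zip(train_texts, train_labels)
            if y in STAGE2_CLASSES]


def cascade_predict(text, stage1_predict, stage2_predict):
    """Route one test sample through both trained stages; return a 3-way label."""
    if stage1_predict(text) == 1:
        return "C"
    return "A" if stage2_predict(text) == 0 else "B"


# Toy stand-ins for the two trained models, for illustration only.
def toy_stage1(text):
    return 1 if text.startswith("c") else 0


def toy_stage2(text):
    return 0 if text.startswith("a") else 1


train = stage2_training_set(["s1", "s2", "s3"], ["A", "C", "B"])
print(train)  # [('s1', 0), ('s3', 1)]
print([cascade_predict(t, toy_stage1, toy_stage2) for t in ["a-1", "b-2", "c-3"]])
# ['A', 'B', 'C']
```

Evaluating the cascade as a whole on the untouched test split gives a three-way accuracy that accounts for errors in both stages.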