I was trying to adapt the example given in “Sequence Models and Long-Short Term Memory Networks” to classification, and I noticed a small bug/omission: after training, the model's hidden state should be zeroed before using the model for prediction. Otherwise, the model still carries over the hidden state from the last training example.
model.hidden = model.init_hidden() # add this line
model.zero_grad() # not sure if this is necessary too.
inputs = prepare_sequence(training_data[0][0], word_to_ix)
tag_scores = model(inputs)
print(tag_scores)
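For reference, model.init_hidden() in the tutorial just builds fresh zero tensors for the hidden and cell state. Below is a standalone sketch of that idea; the function is a hypothetical helper written for illustration, not the tutorial's exact code.

import torch

def init_hidden(hidden_dim, num_layers=1, batch_size=1):
    # Fresh (h_0, c_0) tensors used to reset an LSTM's state; the tutorial's
    # model.init_hidden() does essentially this with num_layers = batch_size = 1.
    return (torch.zeros(num_layers, batch_size, hidden_dim),
            torch.zeros(num_layers, batch_size, hidden_dim))

print(init_hidden(6)[0].shape)  # torch.Size([1, 1, 6])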
Hi smth, I am wondering why you consider this bug minor.
I implemented my own LSTM, following the structure of the tutorial mentioned above. I noticed that if I do not zero the hidden state and cell state, I sometimes get incorrect tags. It is strange, though, that the original tutorial always seems to give the correct tagging even though the states are not cleared before prediction.
After zeroing the states, I always get the correct answer. I also monitored the loss, and it decreases reasonably during training.
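To illustrate why the carried-over state matters, here is a small self-contained sketch using nn.LSTM directly (the layer sizes and input tensors are made up for the example). Omitting the state argument makes PyTorch default to a zero (h_0, c_0), which is what re-initializing the hidden state before prediction achieves.

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3)   # hypothetical toy sizes
seq = torch.randn(5, 1, 4)                    # (seq_len, batch, input_size)

with torch.no_grad():
    # Leftover state from some earlier sequence (e.g. the last training example).
    _, stale_state = lstm(torch.randn(7, 1, 4))

    out_stale, _ = lstm(seq, stale_state)  # prediction with the stale state carried over
    out_fresh, _ = lstm(seq)               # no state argument: (h_0, c_0) default to zeros

    print(torch.allclose(out_stale, out_fresh))  # False: the leftover state changes the output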