Proper way to combine linear layer after LSTM

I would recommend the following:

  1. Printing out the values of the out and target tensors to make sure that you are comparing the right values.
  2. Trying to overfit to one (or a few) training examples.

If you’re not able to overfit to a few examples, there’s probably something wrong with the code.
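Both checks can be sketched in a few lines. This is only a rough illustration, not your actual model: the class name LSTMRegressor, the layer sizes, the random data, the MSE loss, and the choice to feed only the last time step into the linear layer are all assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the sanity check repeatable


class LSTMRegressor(nn.Module):
    """Hypothetical model: an LSTM followed by a linear layer."""

    def __init__(self, input_size=8, hidden_size=32, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        lstm_out, _ = self.lstm(x)      # (batch, seq_len, hidden_size)
        last_step = lstm_out[:, -1, :]  # (batch, hidden_size)
        return self.linear(last_step)   # (batch, output_size)


# A tiny fixed batch (assumed shapes): 4 sequences, 10 steps, 8 features.
x = torch.randn(4, 10, 8)
target = torch.randn(4, 1)

model = LSTMRegressor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Check 1: print what is actually being compared.
out = model(x)
print("out shape:", tuple(out.shape), "target shape:", tuple(target.shape))
print("out:", out.detach().flatten().tolist())
print("target:", target.flatten().tolist())

initial_loss = criterion(out, target).item()

# Check 2: train on the same few examples over and over.
for step in range(300):
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
print("initial loss:", initial_loss, "final loss:", final_loss)
```

If out and target have different shapes, the loss may broadcast and silently compare the wrong values, which is why printing the shapes is worth doing first. If the final loss does not drop well below the initial loss on such a tiny batch, the problem is more likely in the model or training code than in the data.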