Same implementation different results between Keras and PyTorch - lstm

@tom Thanks for the tips!

  1. Interesting suggestion. Just to make sure I understand, so I can try it and learn from it: I should run the same training instance through both the PyTorch and Keras implementations and verify that, say, the embeddings are the same (fastText, so they should match), then compare the LSTM hidden layer (it won’t be identical because of the different initialization, of course, so I assume I’d look at the L2 distance between the two final hidden states?). Or should I compare all of them?
  2. Both models use the SAME dataset (a CSV), which was processed using pad_sequences with a maximum length, etc.

The full code isn’t publicly available on GitHub yet. It’s just me trying to make sense of it.
I see what you mean. I’d love some clarification on how to do it right (for the LSTM at least, and in general), and I’ll report back.
I’ll do the embedding layer in the meantime.
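
For the embedding check, this is roughly what I have in mind. It is only a sketch: `compare_layer_outputs` is a helper name I made up, and the two arrays are stand-ins for the real layer outputs (on the PyTorch side I would convert with `tensor.detach().cpu().numpy()`, on the Keras side the model output should already be a NumPy array). The tolerance is an arbitrary guess, too.

```python
import numpy as np


def compare_layer_outputs(a, b, atol=1e-6):
    """Compare two layer outputs (e.g. embeddings or final LSTM hidden states).

    Hypothetical helper: returns the L2 distance, the largest element-wise
    difference, and whether the arrays match within `atol`.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = (a - b).ravel()
    return {
        "l2_distance": float(np.linalg.norm(diff)),
        "max_abs_diff": float(np.max(np.abs(diff))),
        "allclose": bool(np.allclose(a, b, atol=atol)),
    }


# Stand-in data: in the real check these would be the embedding outputs
# of the Keras and PyTorch models for the same padded input sequence.
keras_emb = np.array([[0.1, 0.2, 0.3], [0.0, 0.0, 0.0]])
torch_emb = keras_emb.copy()

report = compare_layer_outputs(keras_emb, torch_emb)
print(report)
```

If the embeddings pass this, I would reuse the same helper on the final hidden states, where I expect a nonzero distance because the initial weights differ.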