Padding Pre Versus Post LSTM Performance

I am finding that, all else being equal, the performance of my many-to-one LSTM model differs dramatically depending on how the sequences are padded. With pre-padding, performance is good; with post-padding, it is garbage. Is this expected? I can post all the code, but first I wanted to check whether this is a known issue.
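
To be concrete about the two padding modes, here is a minimal pure-Python sketch. It is an illustration only, not the actual model code: the helper name `pad_sequence`, the pad value `0`, and the toy batch are all assumptions made for the example.

```python
def pad_sequence(seq, max_len, padding="pre", pad_value=0):
    """Pad one sequence to max_len (hypothetical helper, for illustration)."""
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    seq = list(seq)
    if len(seq) > max_len:
        raise ValueError("sequence is longer than max_len")
    fill = [pad_value] * (max_len - len(seq))
    # "pre" puts the pad values before the real data, "post" puts them after.
    return fill + seq if padding == "pre" else seq + fill


# Toy batch of variable-length sequences (made-up values).
batch = [[5, 3, 8], [7, 2], [4]]
max_len = max(len(s) for s in batch)

pre_padded = [pad_sequence(s, max_len, "pre") for s in batch]
post_padded = [pad_sequence(s, max_len, "post") for s in batch]

# Last time step of each sequence under the two modes.
last_pre = [row[-1] for row in pre_padded]
last_post = [row[-1] for row in post_padded]

print(pre_padded)
print(post_padded)
```

With pre-padding, the last time step of every sequence is real data. With post-padding, the shorter sequences end in pad values, so a many-to-one model that reads its output at the final time step processes the padding last unless those steps are masked.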
