The original LSTM cell uses dropout with a different mask at every time step, which is ad hoc and leads to unstable results. According to this paper, we should use the same dropout mask at every time step (variational RNN).
Here is a screenshot of what should ideally happen:
Keras supports this through its dropout and recurrent_dropout arguments.
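To make the mask sharing concrete, here is a minimal sketch of what I mean in PyTorch. It only covers dropout on a layer's inputs or outputs, not the recurrent (hidden-to-hidden) connections. The LockedDropout name and the (seq_len, batch, features) layout are my own assumptions for illustration; this is not something torch.nn provides.

```python
import torch
import torch.nn as nn


class LockedDropout(nn.Module):
    """Dropout that reuses one mask for every time step of a sequence.

    Hypothetical helper, not part of torch.nn. Assumes input shaped
    (seq_len, batch, features).
    """

    def __init__(self, p: float = 0.5):
        super().__init__()
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Behave like the identity at evaluation time, as nn.Dropout does.
        if not self.training or self.p == 0.0:
            return x
        keep = 1.0 - self.p
        # Sample one mask per sequence: size 1 along the time axis.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(keep)
        # Inverted-dropout scaling; broadcasting repeats the mask over time.
        return x * mask / keep


if __name__ == "__main__":
    torch.manual_seed(0)
    drop = LockedDropout(p=0.5)
    x = torch.ones(5, 2, 4)
    y = drop(x)
    # Every time step shares the same pattern of dropped units.
    print(bool((y == y[0]).all()))
```

Wrapping the input and output of nn.LSTM with a module like this gives the shared mask on the non-recurrent connections. Sharing a mask on the hidden state as well would, as far as I can tell, need a hand-written loop over nn.LSTMCell.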
Is there a neat implementation of this in PyTorch? Thanks for helping.