I am looking for a PyTorch implementation of an RNN module with variational dropout (i.e., the SAME dropout mask reused at every timestep, applied to both the inputs and the recurrent connections), as proposed by Gal and Ghahramani in the paper A Theoretically Grounded Application of Dropout in Recurrent Neural Networks: https://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks.pdf
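For concreteness, here is a rough sketch of the scheme I mean, written with an explicit nn.LSTMCell loop (my own illustration, not a reference implementation; the class name VariationalLSTM is made up). One input mask and one hidden-state mask are sampled per sequence and reused at every timestep:

```python
import torch
import torch.nn as nn

class VariationalLSTM(nn.Module):
    """Single-layer LSTM where one dropout mask is sampled per sequence
    and reused at every timestep, on both the input and the hidden state."""
    def __init__(self, input_size, hidden_size, dropout=0.25):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.dropout = dropout

    def forward(self, x):  # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        if self.training and self.dropout > 0:
            # Sample the masks once; keep them fixed for the whole sequence
            # (inverted-dropout scaling so eval needs no rescaling).
            keep = 1 - self.dropout
            mask_x = x.new_empty(batch, x.size(2)).bernoulli_(keep) / keep
            mask_h = x.new_empty(batch, self.cell.hidden_size).bernoulli_(keep) / keep
        else:
            mask_x = mask_h = None
        outputs = []
        for t in range(seq_len):
            inp = x[t] * mask_x if mask_x is not None else x[t]
            h_in = h * mask_h if mask_h is not None else h
            h, c = self.cell(inp, (h_in, c))
            outputs.append(h)
        return torch.stack(outputs), (h, c)
```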
Would one of the approaches from this topic work, or are you looking for another implementation?
If it gives you any trouble, I’d be happy to try and help.
@ptrblck @Iridium_Blue thank you both for your replies. I saw the Better_LSTM_Pytorch code, and I was wondering whether it is indeed a correct implementation of recurrent dropout.
Its docstring says: “Applies the same dropout mask across the temporal dimension. See https://arxiv.org/abs/1512.05287 for more details. Note that this is not applied to the recurrent activations in the LSTM as in the above paper. Instead, it is applied to the inputs and outputs of the recurrent layer.”
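In code, that input/output variant boils down to something like the “locked dropout” idiom. Here is a minimal sketch (my own, assuming a (seq_len, batch, features) layout; the name LockedDropout is illustrative):

```python
import torch.nn as nn

class LockedDropout(nn.Module):
    """Samples a single dropout mask per sequence and broadcasts it over
    the time dimension, so every timestep is zeroed in the same places."""
    def forward(self, x, p=0.5):  # x: (seq_len, batch, features)
        if not self.training or p == 0:
            return x
        # One mask of shape (1, batch, features), reused across all timesteps.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - p) / (1 - p)
        return x * mask
```

You would apply it to the tensor going into the LSTM and to its output, but not inside the recurrence itself, which is exactly why the docstring says it differs from the paper.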
It’s not a literal application of the paper, no. Better_LSTM changes the paper’s dropout strategy in a natural way, which I have found works much better for NLP anyway.
For a faithful implementation of the paper, see AWD-LSTM. Here’s a great write-up on it I found: https://yashuseth.blog/2018/09/12/awd-lstm-explanation-understanding-language-model/
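The key ingredient there is DropConnect on the hidden-to-hidden weights, so the recurrent connections see the same (dropped) weight matrix at every timestep. Below is a simplified sketch of that idea, not the exact AWD-LSTM code; one weight mask is re-sampled per forward pass (i.e., per sequence):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDrop(nn.Module):
    """DropConnect on named weights of a wrapped RNN: a fresh mask is
    sampled over the hidden-to-hidden weight matrix once per forward pass."""
    def __init__(self, module, weights=('weight_hh_l0',), dropout=0.5):
        super().__init__()
        self.module = module
        self.weights = weights
        self.dropout = dropout
        for name in weights:
            # Re-register each target weight under <name>_raw; the dropped
            # version is recomputed from it on every forward pass.
            raw = getattr(module, name)
            del module._parameters[name]
            module.register_parameter(name + '_raw', nn.Parameter(raw.data))

    def forward(self, *args):
        for name in self.weights:
            raw = getattr(self.module, name + '_raw')
            # Zero entire weight entries, then run the wrapped RNN as usual.
            setattr(self.module, name,
                    F.dropout(raw, p=self.dropout, training=self.training))
        return self.module(*args)

# Usage: wrap a plain nn.LSTM; cuDNN may warn about non-contiguous weights.
lstm = WeightDrop(nn.LSTM(10, 20), weights=('weight_hh_l0',), dropout=0.3)
out, _ = lstm(torch.randn(5, 3, 10))
```

Because the mask lives on the weight matrix rather than on the activations, this matches the paper’s requirement that the same units are dropped at every timestep of the recurrence.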