Variational dropout RNN (Gal NeurIPS 2016)

I am looking for a PyTorch implementation of an RNN module with variational dropout (i.e. the SAME dropout mask at every timestep, applied to the recurrent connections as well) as proposed by Gal and Ghahramani in the paper "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks": https://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks.pdf
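
To be concrete, here is a rough sketch of the behaviour I am after (my own placeholder code with a made-up name `VariationalLSTM`, not an existing library): sample one dropout mask per sequence for the inputs and one for the hidden state, and reuse both at every timestep inside an `nn.LSTMCell` loop.

```python
import torch
import torch.nn as nn

class VariationalLSTM(nn.Module):
    """Sketch of Gal & Ghahramani-style variational dropout: one dropout mask
    per sequence, reused at every timestep, applied to both the input and the
    recurrent (hidden-to-hidden) connections."""

    def __init__(self, input_size, hidden_size, dropout_x=0.25, dropout_h=0.25):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.dropout_x = dropout_x
        self.dropout_h = dropout_h

    def forward(self, x):                      # x: (batch, seq_len, input_size)
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)

        if self.training:
            # Sample the masks ONCE per forward pass, not once per timestep.
            mask_x = torch.bernoulli(
                x.new_full((batch, x.size(2)), 1 - self.dropout_x)) / (1 - self.dropout_x)
            mask_h = torch.bernoulli(
                x.new_full((batch, self.hidden_size), 1 - self.dropout_h)) / (1 - self.dropout_h)

        outputs = []
        for t in range(seq_len):
            x_t = x[:, t]
            if self.training:
                x_t = x_t * mask_x             # same input mask at every step
                h = h * mask_h                 # same recurrent mask at every step
            h, c = self.cell(x_t, (h, c))
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)
```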


Would one of the approaches from this topic work, or are you looking for another implementation? 🙂

As @ptrblck indicated, I’ve been using the LSTM implementation from the Variational dropout? topic with much success for the last few months.

If it gives you any trouble I’d be happy to try and help.

@ptrblck @Iridium_Blue thank you both for your replies. I saw the Better_LSTM_Pytorch code, and I was wondering if it is indeed a correct implementation of recurrent dropout.
The VariationalDropout class docstring says: "Applies the same dropout mask across the temporal dimension. See https://arxiv.org/abs/1512.05287 for more details. Note that this is not applied to the recurrent activations in the LSTM as in the above paper. Instead, it is applied to the inputs and outputs of the recurrent layer."
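
If I read that docstring correctly, the idea is something like the following sketch (my own minimal version with a made-up name `LockedDropout`, not the actual Better_LSTM code): one mask per sample, broadcast over the time dimension, applied to the LSTM inputs and outputs rather than inside the recurrence.

```python
import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    """Sketch: same dropout mask for every timestep of a (batch, seq, features)
    tensor, applied to the LSTM inputs/outputs, not to the recurrent activations."""

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):                      # x: (batch, seq_len, features)
        if not self.training or self.p == 0.0:
            return x
        # One mask per (batch, feature) position, broadcast over the time axis.
        mask = torch.bernoulli(x.new_full((x.size(0), 1, x.size(2)), 1 - self.p))
        return x * mask / (1 - self.p)

# Usage: wrap a standard nn.LSTM (batch_first=True) between two LockedDropout layers.
lstm = nn.LSTM(128, 256, batch_first=True)
drop_in, drop_out = LockedDropout(0.25), LockedDropout(0.25)
x = torch.randn(8, 50, 128)
out, _ = lstm(drop_in(x))
out = drop_out(out)
```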

It’s not a literal application of the paper, no. Better_LSTM changes the paper’s dropout strategy in a natural way, which I have found works much better for NLP anyway.

For a faithful implementation of the paper, see AWD-LSTM. Here’s a great write-up on it I found: https://yashuseth.blog/2018/09/12/awd-lstm-explanation-understanding-language-model/
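
The core trick there (beyond the input/output dropout) is DropConnect on the hidden-to-hidden weights. Roughly, as a sketch of that idea rather than the actual AWD-LSTM code, and with a made-up name `WeightDropLSTM` and a manual cell for clarity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTM(nn.Module):
    """Sketch of DropConnect on the recurrent weights: the hidden-to-hidden
    weight matrix is dropped once per forward pass, so the same dropped
    connections are used at every timestep of the sequence."""

    def __init__(self, input_size, hidden_size, weight_p=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.weight_p = weight_p
        self.w_ih = nn.Linear(input_size, 4 * hidden_size)
        self.w_hh = nn.Linear(hidden_size, 4 * hidden_size, bias=False)

    def forward(self, x):                      # x: (batch, seq_len, input_size)
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)

        # DropConnect: drop individual recurrent weights, once for the whole sequence.
        w_hh = F.dropout(self.w_hh.weight, p=self.weight_p, training=self.training)

        outputs = []
        for t in range(seq_len):
            gates = self.w_ih(x[:, t]) + F.linear(h, w_hh)
            i, f, g, o = gates.chunk(4, dim=1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            c = f * c + i * torch.tanh(g)
            h = o * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)
```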

Thank you @Iridium_Blue. I started implementing it myself based on the AWD-LSTM code. You may find my implementation of the paper here.
