Hi guys! I moved from TF to PyTorch a few months ago. While I am enjoying the speed and flexibility, I am struggling to replicate the results of one of my previous TF projects in PyTorch. Specifically, I am talking about a seq2seq model (which I am now extending with attention, but let’s forget about that). I’ve fixed the “basic” discrepancy caused by the different weight initialization. My major concern is dropout. As you might know, TF implements two different (variational) dropouts:

- dropout - Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0.
- recurrent_dropout - Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0.
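
To make "variational" concrete: the idea is to sample one mask per sequence and reuse it at every time step, instead of drawing a fresh mask for each step. The following is only a minimal sketch of that idea in PyTorch. The module name `VariationalDropout` and the `(batch, seq_len, features)` layout are assumptions made for illustration, and this is not the TF code itself.

```python
import torch
import torch.nn as nn


class VariationalDropout(nn.Module):
    """Hypothetical helper: one dropout mask per sequence, shared by all time steps.

    Expects input of shape (batch, seq_len, features).
    """

    def __init__(self, p: float = 0.5):
        super().__init__()
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Like nn.Dropout, do nothing in eval mode.
        if not self.training or self.p == 0.0:
            return x
        keep = 1.0 - self.p
        # Mask has a time dimension of size 1, so it broadcasts over seq_len.
        probs = torch.full((x.size(0), 1, x.size(2)), keep, dtype=x.dtype, device=x.device)
        mask = torch.bernoulli(probs)
        # Inverted dropout: rescale so the expected activation is unchanged.
        return x * mask / keep
```

Applying `nn.Dropout` to the same tensor would instead zero a different set of features at every time step, which is the difference at stake here.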

The PyTorch implementation (the default `nn.LSTM`) is, as far as I understand, completely different. In fact, I am not able to reproduce the results I had in TF using just the standard PyTorch dropout (which should not be variational). Using the LSTM implementation in keitakurita/Better_LSTM_PyTorch (model.py on GitHub), as suggested in the “Variational dropout?” thread (post #9 by Iridium_Blue), I get a huge performance improvement, even though I am still struggling to replicate the results that I got in TF.
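
One concrete difference is easy to check: the `dropout` argument of `nn.LSTM` is applied to the outputs of each layer except the last, so it never touches the recurrent state, and with `num_layers=1` it has no effect at all. The sketch below illustrates this with arbitrary sizes and a fixed seed (both chosen only for the example) by running the same input twice in training mode.

```python
import warnings

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(3, 7, 4)  # (batch, seq_len, features)

with warnings.catch_warnings():
    # PyTorch warns that dropout > 0 is unused when num_layers == 1.
    warnings.simplefilter("ignore")
    single = nn.LSTM(4, 8, num_layers=1, dropout=0.5, batch_first=True)
stacked = nn.LSTM(4, 8, num_layers=2, dropout=0.5, batch_first=True)

single.train()
stacked.train()

# Single layer: no dropout is applied, so two passes agree.
out_a, _ = single(x)
out_b, _ = single(x)
single_layer_is_deterministic = torch.allclose(out_a, out_b)

# Two layers: dropout sits between layer 1 and layer 2, so passes differ.
out_c, _ = stacked(x)
out_d, _ = stacked(x)
stacked_is_stochastic = not torch.allclose(out_c, out_d)
```

So matching the `dropout` / `recurrent_dropout` pair from TF needs something beyond the built-in argument, whatever the number of layers.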

So, my question: Is anyone aware of an implementation of recurrent layers (so, not just LSTM) that is very close to the TF one?

I think it should not be too difficult to extend the implementation from the “Variational dropout?” thread (post #9 by Iridium_Blue) to GRU and RNN, but I am still interested in other implementations, if available.
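
As a rough idea of what such an extension could look like, here is a sketch built on `nn.GRUCell` that samples one input mask and one state mask per sequence. It is not the linked implementation and not a port of the TF layer: the class name `VariationalGRU`, the helper `_mask`, and the argument names (borrowed from the TF ones) are assumptions. It also simplifies by masking the whole hidden state before it enters the cell, which is not necessarily identical to masking only the recurrent linear transformation.

```python
import torch
import torch.nn as nn


def _mask(ref: torch.Tensor, size: int, p: float) -> torch.Tensor:
    """Inverted-dropout mask of shape (batch, size), scaled by 1 / (1 - p)."""
    keep = 1.0 - p
    return torch.bernoulli(ref.new_full((ref.size(0), size), keep)) / keep


class VariationalGRU(nn.Module):
    """Hypothetical single-layer GRU reusing one input mask and one state mask per sequence."""

    def __init__(self, input_size, hidden_size, dropout=0.0, recurrent_dropout=0.0):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.dropout = dropout
        self.recurrent_dropout = recurrent_dropout

    def forward(self, x, h=None):
        # x: (batch, seq_len, input_size)
        if h is None:
            h = x.new_zeros(x.size(0), self.hidden_size)
        in_mask = h_mask = None
        # Masks are sampled once here and reused for every time step below.
        if self.training and self.dropout > 0:
            in_mask = _mask(x, x.size(2), self.dropout)
        if self.training and self.recurrent_dropout > 0:
            h_mask = _mask(x, self.hidden_size, self.recurrent_dropout)
        outputs = []
        for t in range(x.size(1)):
            x_t = x[:, t] if in_mask is None else x[:, t] * in_mask
            h_in = h if h_mask is None else h * h_mask
            h = self.cell(x_t, h_in)
            outputs.append(h)
        return torch.stack(outputs, dim=1), h
```

Swapping `nn.GRUCell` for `nn.RNNCell` should give the plain RNN variant with the same loop. The Python-level loop will be slower than the fused `nn.GRU` kernel.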