Dropout in nn.RNN

Does `nn.RNN` use the same dropout mask for every timestep? If not, how can I make it work that way while still maintaining the same performance as `nn.RNN`?

This type of dropout (a mask held fixed across timesteps) performs better according to the following paper, and it is also the dropout used in Keras.


During testing, dropout is turned off, so inference results are not affected by the dropout.

No, it doesn’t: PyTorch’s RNNs are thin wrappers around cuDNN, which doesn’t support time-locked dropout masks. You can implement it yourself, though at the cost of reduced speed, since the optimized cuDNN kernel can no longer be used.
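A minimal sketch of how you might implement this yourself: a module (here called `LockedDropout`, a hypothetical name) that samples one Bernoulli mask per forward pass and broadcasts it across the time dimension, applied to RNN inputs or between stacked RNN layers. It assumes inputs of shape `(seq_len, batch, features)`.

```python
import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    """Dropout with the same mask at every timestep (time-locked dropout).

    Assumes input of shape (seq_len, batch, features). The mask is sampled
    once per forward pass and broadcast over the time dimension, unlike
    standard nn.Dropout, which resamples independently per element.
    """
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # Identity at eval time: dropout is turned off during testing.
        if not self.training or self.p == 0.0:
            return x
        # One mask of shape (1, batch, features), broadcast across seq_len.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        # Inverted-dropout scaling so expected activations are unchanged.
        return x * mask / (1 - self.p)
```

Because the mask is shared across timesteps, you would apply it to the full sequence tensor before (or between) RNN layers, rather than inside the per-step loop.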