Is there a reason not to use dropout with GRU models?

  • I have seen many examples of networks (without RNN/LSTM/GRU) which use dropout layers in order to reduce overfitting.
  • In contrast, I have hardly seen any examples of networks with RNN/LSTM/GRU layers that use dropout
    (i.e. nn.GRU(dropout=, …)).
  1. Is there a reason not to use dropout with recurrent layers (GRU/LSTM)?

  2. What are the advantages and disadvantages of using dropout in RNN networks?
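
To make the question concrete, here is a minimal sketch of the kind of setup I mean, assuming PyTorch's torch.nn.GRU. The layer sizes, the batch shape and the dropout rate of 0.5 are hypothetical values chosen only for illustration, not taken from a real model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, for illustration only.
gru = nn.GRU(input_size=8, hidden_size=16, num_layers=2,
             dropout=0.5, batch_first=True)
x = torch.randn(4, 10, 8)  # (batch, sequence length, features)

# Training mode: dropout is active between the two stacked layers,
# so two forward passes on the same input use different random masks.
gru.train()
out_a, _ = gru(x)
out_b, _ = gru(x)
train_outputs_differ = not torch.allclose(out_a, out_b)

# Evaluation mode: dropout is disabled, so the output is deterministic.
gru.eval()
with torch.no_grad():
    out_c, _ = gru(x)
    out_d, _ = gru(x)
eval_outputs_match = torch.allclose(out_c, out_d)
```

As far as I understand from the PyTorch documentation, this argument applies dropout to the outputs of every GRU layer except the last one, so it only has an effect when num_layers is greater than 1. It does not drop the recurrent (hidden-to-hidden) connections.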