- I have seen many examples of networks (without RNN/LSTM/GRU) which use dropout layers in order to reduce overfitting.
- In contrast, I have hardly seen any examples of networks with RNN/LSTM/GRU layers that use dropout
(i.e. nn.GRU(dropout=, …)).
- Is there a reason not to use dropout when using RNN layers (GRU/LSTM)?
- What are the advantages and disadvantages of using dropout in RNN networks, compared with leaving it out?
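
To make the comparison concrete, here is a minimal sketch of the two setups I have in mind, assuming PyTorch. The layer sizes, the dropout probability of 0.3, and the variable names are placeholders chosen only for illustration. As far as I understand, the dropout argument of nn.GRU applies dropout to the outputs of every stacked layer except the last one, so it has no effect with a single layer, and it is only active in training mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size, num_layers = 8, 16, 2

# Case 1: feed-forward network with an explicit dropout layer.
mlp = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Dropout(p=0.3),  # assumed dropout probability
    nn.Linear(hidden_size, 1),
)

# Case 2: stacked GRU using the built-in dropout argument.
# Dropout is applied between the stacked layers, not on the last layer's output,
# so num_layers must be greater than 1 for it to do anything.
gru = nn.GRU(
    input_size,
    hidden_size,
    num_layers=num_layers,
    dropout=0.3,  # assumed dropout probability
    batch_first=True,
)

x = torch.randn(4, 10, input_size)  # (batch, time steps, features)

# Training mode: dropout is active, so repeated calls can differ.
gru.train()
out_train, h_train = gru(x)

# Evaluation mode: dropout is disabled, so the output is deterministic.
gru.eval()
with torch.no_grad():
    out_eval, h_eval = gru(x)
    out_eval_again, _ = gru(x)

mlp.eval()
with torch.no_grad():
    mlp_out = mlp(x)  # Linear layers act on the last dimension

print(out_train.shape, h_train.shape, mlp_out.shape)
```

If dropout on the final GRU output is wanted as well, a separate nn.Dropout layer after the GRU would be one way to add it.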