Why is there no dropout on the last layer of an RNN?

What is the reason behind this restriction?

The documentation for all recurrent layers states:

dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer

But why? Is it an implementation issue? Or is there research on this topic?

With only one LSTM layer I cannot use the built-in dropout at all, yet dropout (implemented manually) improves performance on my time series forecasting problem.

Thank you very much

Dropout on the last layer would operate on the output of the RNN itself.
That means you can apply it yourself to the output if needed, an option you don't have for the inner layers.
Note that the dropout implemented by the RNN is not the variant that uses a single random draw for all timesteps.
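
A minimal sketch of what "do it yourself on the output" could look like for a single-layer LSTM. The class name `LSTMForecaster`, the helper `shared_mask_dropout`, the layer sizes, and the choice to forecast from the last timestep are all illustrative assumptions, not something taken from the original model. The helper shows one possible way to reuse a single mask across all timesteps, in case that variant is what you want.

```python
import torch
import torch.nn as nn


class LSTMForecaster(nn.Module):
    """Hypothetical single-layer LSTM forecaster with dropout on the LSTM output."""

    def __init__(self, n_features=1, hidden_size=32, p=0.2):
        super().__init__()
        # With num_layers=1 there is no inner layer for the built-in
        # `dropout` argument to act on, so it is left at its default of 0.
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers=1, batch_first=True)
        # Applied manually to the output of the last (and only) LSTM layer.
        self.drop = nn.Dropout(p)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)         # (batch, seq_len, hidden_size)
        out = self.drop(out)          # independent draw per element and timestep
        return self.head(out[:, -1])  # forecast from the last timestep


def shared_mask_dropout(out, p, training=True):
    """Dropout reusing one mask across all timesteps of a (batch, seq, hidden) tensor."""
    if not training or p == 0.0:
        return out
    keep = 1.0 - p
    # One Bernoulli draw per (batch, hidden) unit, broadcast over the time axis.
    probs = torch.full((out.size(0), 1, out.size(2)), keep,
                       dtype=out.dtype, device=out.device)
    mask = torch.bernoulli(probs)
    return out * mask / keep
```

For example, `LSTMForecaster()(torch.randn(4, 10, 1))` returns a tensor of shape `(4, 1)`. Call `model.eval()` at inference time so that `nn.Dropout` is disabled.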

Best regards


Thanks, that makes sense!