Does RNN `dropout` arg need better documentation?

The documentation for the `dropout` argument of the LSTM module confused me. Here’s the quote:

> If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer

GRU uses the same documentation. Initially I assumed that supplying any value other than False would add a dropout layer with a default dropout probability.

Today I dug through the source code. I went through LSTM, RNN base, Module, the THNN backends, the Function backends, functions RNN, functions Autograd RNN, and finally functions Stacked RNN before I figured out that the argument is a float probability, not a boolean.
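To save the next reader that dig, here’s a minimal sketch of what the argument actually accepts (the sizes here are arbitrary, chosen just for illustration):

```python
import torch
import torch.nn as nn

# dropout is a float probability, not a boolean flag. With
# num_layers=2 and dropout=0.5, a Dropout(p=0.5) layer is applied
# to the outputs of the first stacked layer only (not the last).
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, dropout=0.5)

x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
out, (h, c) = lstm(x)
print(out.shape)  # torch.Size([5, 3, 20])
print(h.shape)    # torch.Size([2, 3, 20]) -- one hidden state per stacked layer
```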

I did some searches here. I’m not the first person to be confused (I’m out of links, so I can’t reference them specifically; search for “dropout in LSTM”). It wasn’t until I read through those posts that I realized that by “layer” the docs meant the last stacked layer, not the last unrolled time step. That is, passing the argument when `num_layers` is 1 actually does nothing. Possibly worthy of a warning…
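A small sketch of that last point (sizes are arbitrary): with `num_layers=1` the `dropout` value is never used, so repeated forward passes in training mode are deterministic, whereas a two-layer stack with the same setting is not.

```python
import torch
import torch.nn as nn

# With num_layers=1, the dropout argument is silently ignored, so
# two forward passes in train() mode produce identical outputs.
single = nn.LSTM(input_size=4, hidden_size=8, num_layers=1, dropout=0.5)
single.train()

x = torch.randn(6, 2, 4)  # (seq_len, batch, input_size)
out1, _ = single(x)
out2, _ = single(x)
print(torch.equal(out1, out2))  # True: dropout never fired

# With num_layers=2, dropout IS applied between the stacked layers,
# so train-mode outputs differ from call to call.
stacked = nn.LSTM(input_size=4, hidden_size=8, num_layers=2, dropout=0.5)
stacked.train()
out3, _ = stacked(x)
out4, _ = stacked(x)
print(torch.equal(out3, out4))  # False (with overwhelming probability)
```

(Newer PyTorch versions do emit a UserWarning when constructing the single-layer case, which is roughly the warning suggested above.)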

I nearly opened an issue on GitHub, but I didn’t feel this was strictly a bug or a feature request. Does anyone agree that this argument needs clearer docs? Any guidance on what to do about it?

These are valid points. Feel free to open an issue or submit a PR on this :slight_smile: