The documentation for the dropout argument
of the LSTM module here confused me. Here's the quote:
"If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer"
GRU uses the same documentation. Initially I assumed that meant supplying any value other than False
would add a dropout layer with some default dropout probability.
Today I dug through the source code. I went through LSTM, the RNN base class, Module, the THNN backends, the function backends, and finally the RNN, AutogradRNN, and StackedRNN functions before figuring out that the argument is a float probability, not a boolean.
I did some searches here, and I'm not the first person to be confused (I'm out of links, so I can't reference them specifically; search for "dropout in LSTM"). It wasn't until I read through those posts that I realized that by "layer" the docs mean a stacked layer, not a step of the unrolled sequence. That is to say, setting the argument when num_layers
is 1 actually does nothing. Possibly worthy of a warning…
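To summarize what I eventually pieced together: the argument is a float probability passed like torch.nn.LSTM(10, 20, num_layers=2, dropout=0.5), and it is only applied to the outputs of layers below the top of the stack. Here's a minimal plain-Python sketch of that pattern. This is my own illustration of the logic, not PyTorch's actual code, and the identity "layers" are stand-ins for real recurrent layers:

```python
import random

def dropout_vec(vec, p, training=True):
    # Inverted dropout on a list of floats: zero each element with
    # probability p and scale survivors by 1/(1-p). p is a float in [0, 1].
    if not training or p == 0.0:
        return vec
    return [0.0 if random.random() < p else x / (1.0 - p) for x in vec]

def stacked_forward(x, layers, p):
    # Mirrors the stacked-RNN pattern: dropout is applied to the output
    # of every layer in the stack EXCEPT the last one.
    for i, layer in enumerate(layers):
        x = layer(x)
        if i < len(layers) - 1:  # no dropout after the top layer
            x = dropout_vec(x, p)
    return x
```

With a single layer, the i < len(layers) - 1 branch never fires, which is exactly why the argument silently does nothing when num_layers is 1.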
I nearly opened an issue on GitHub, but I didn't feel this was strictly a bug or a feature request. Does anyone agree that this argument needs clearer docs? Any guidance on what to do about it?