The documentation for the dropout argument
of the LSTM module here confused me. Here's the quote:
"If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer"
GRU uses the same documentation. Initially I assumed that meant supplying any value other than False
would add a dropout layer with some default dropout probability.
Today I dug through the source code. I went through LSTM, the RNN base class, Module, the THNN backends, the function backends, and finally the RNN, AutogradRNN, and StackedRNN functions before figuring out that the argument is a float probability, not a boolean.
I did some searches here, and I'm not the first person to be confused (I'm out of links, so I can't reference them specifically; search for "dropout in LSTM"). It wasn't until I read through those posts that I realized that by "layer" the docs mean a stacked layer, not a step of the unrolled sequence. That is to say, setting the argument when num_layers
is 1 actually does nothing. Possibly worthy of a warning…
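To summarize what I eventually pieced together: the argument is a float probability passed like torch.nn.LSTM(10, 20, num_layers=2, dropout=0.5), and it is only applied to the outputs of layers below the top of the stack. Here's a minimal plain-Python sketch of that pattern. This is my own illustration of the logic, not PyTorch's actual code, and the identity "layers" are stand-ins for real recurrent layers:

```python
import random

def dropout_vec(vec, p, training=True):
    # Inverted dropout on a list of floats: zero each element with
    # probability p and scale survivors by 1/(1-p). p is a float in [0, 1].
    if not training or p == 0.0:
        return vec
    return [0.0 if random.random() < p else x / (1.0 - p) for x in vec]

def stacked_forward(x, layers, p):
    # Mirrors the stacked-RNN pattern: dropout is applied to the output
    # of every layer in the stack EXCEPT the last one.
    for i, layer in enumerate(layers):
        x = layer(x)
        if i < len(layers) - 1:  # no dropout after the top layer
            x = dropout_vec(x, p)
    return x
```

With a single layer, the i < len(layers) - 1 branch never fires, which is exactly why the argument silently does nothing when num_layers is 1.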
I nearly opened an issue on GitHub, but I didn't feel this was strictly a bug or a feature request. Does anyone agree that this argument needs clearer docs? Any guidance on what to do about it?