Docs for LSTM and GRU

I think the docs for LSTM / GRU need to be modified:

I have noticed that students consistently misunderstand the definition of input_size: they think it means the length of the series, when it is actually the number of features at each time step. This confuses them, because they expect an LSTM to work with variable-length data.
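To make the distinction concrete, here is a minimal sketch of the kind of example I have in mind, assuming the docs in question are for PyTorch's torch.nn.LSTM and torch.nn.GRU. The sizes (3 features, hidden size 5, batch of 2) and the use of batch_first=True are arbitrary choices for illustration, not something taken from the current docs.

```python
import torch
import torch.nn as nn

# input_size is the number of features per time step, NOT the sequence length.
# Illustrative assumption: each time step carries 3 features.
lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)
gru = nn.GRU(input_size=3, hidden_size=5, batch_first=True)

# The same modules accept sequences of different lengths.
short = torch.randn(2, 4, 3)   # (batch=2, seq_len=4,  input_size=3)
long = torch.randn(2, 10, 3)   # (batch=2, seq_len=10, input_size=3)

out_short, (h_short, c_short) = lstm(short)
out_long, (h_long, c_long) = lstm(long)
gru_out, gru_h = gru(long)     # a GRU returns only a hidden state, no cell state

print(out_short.shape)  # torch.Size([2, 4, 5])
print(out_long.shape)   # torch.Size([2, 10, 5])
print(h_long.shape)     # torch.Size([1, 2, 5])
print(gru_out.shape)    # torch.Size([2, 10, 5])
```

The sequence length only shows up in the shape of the input tensor, never in the constructor, which is why one module handles both the 4-step and the 10-step batch.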

Additionally, the “example” at the bottom of the page is not easy to follow.

I’ve seen over 100 people go through these docs at this point, and I would estimate that fewer than 10% of them understand how to use the function after reading them.

Do you think a more detailed explanation of input_size would help, or would a change to the argument name be better? (If renaming it would break backwards compatibility, it might not be easy to land.)
In any case, would you be interested in improving the docs yourself? You seem to have a good idea of which improvements could be made.