Building vanilla RNN, GRU and LSTM from PyTorch blocks

Problem

I am trying to understand how RNN, GRU and LSTM work. I saw a LSTM implementation by just using Numpy here. But it is too time-consuming to go over all those details since I currently just want to understand the algorithmic workflow of those modules (not how we calculate gradient, etc). So I would like to build those modules from PyTorch blocks like nn.Linear().

This is a LSTM diagram, which looks simple.

However, as easy as it might sound, I am not sure

  • How should I carry the cell state and hidden state along different time step.
  • Whether the models constructed this way could backprop correctly.
  • Is there any standardized dataset I could use to check my implementation is correct.

Could anyone help me? Thank you in advance!

Looking at the PyTorch API for inspiration, you can keep it in a local variable and return it at the end.

  • Whether the models constructed this way could backprop correctly.

Yes.

  • Is there any standardized dataset I could use to check my implementation is correct.

The usual way is to feed random data through your module and then compare it to a reference implementation (e.g. PyTorch’s or the Numpy one you saw if you prefer).

Best regards

Thomas