Building vanilla RNN, GRU and LSTM from PyTorch blocks

MrRobot · November 6, 2019, 7:42am

Problem

I am trying to understand how RNN, GRU and LSTM work. I saw an LSTM implementation that uses only NumPy here. But going through all of those details is too time-consuming, since for now I just want to understand the algorithmic workflow of these modules (not how the gradients are computed, etc.). So I would like to build them from PyTorch building blocks such as nn.Linear().

This is an LSTM diagram, which looks simple.

However, as easy as it might sound, I am not sure about the following:

How should I carry the cell state and hidden state across time steps?
Would models constructed this way backprop correctly?
Is there a standardized dataset I could use to check that my implementation is correct?

Could anyone help me? Thank you in advance!

tom · November 6, 2019, 8:08am

Taking the PyTorch API as inspiration, you can keep the hidden state (and, for an LSTM, the cell state) in a local variable, update it at every time step, and return it at the end.
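As a rough illustration of that idea, here is a minimal sketch of a single-layer LSTM built from two nn.Linear layers. The class name SimpleLSTM and the attribute names x2h and h2h are made up for this example, and it assumes sequence-first input of shape (seq_len, batch, input_size). The gates are split in the order input, forget, cell, output, which is the order nn.LSTM uses for its weights. Unlike nn.LSTM, the returned state has no leading num_layers dimension.

```python
import torch
import torch.nn as nn


class SimpleLSTM(nn.Module):
    """Single-layer LSTM assembled from nn.Linear (hypothetical example class)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Each layer produces the pre-activations of all four gates at once.
        self.x2h = nn.Linear(input_size, 4 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state=None):
        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        if state is None:
            h = x.new_zeros(batch, self.hidden_size)
            c = x.new_zeros(batch, self.hidden_size)
        else:
            h, c = state
        outputs = []
        for t in range(seq_len):
            gates = self.x2h(x[t]) + self.h2h(h)
            i, f, g, o = gates.chunk(4, dim=1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            g = torch.tanh(g)
            # h and c are plain local variables, overwritten at every step.
            c = f * c + i * g
            h = o * torch.tanh(c)
            outputs.append(h)
        # Return all hidden states plus the final (h, c) pair.
        return torch.stack(outputs), (h, c)


torch.manual_seed(0)
lstm = SimpleLSTM(input_size=3, hidden_size=5)
x = torch.randn(7, 2, 3)
out, (h_n, c_n) = lstm(x)
```

Passing the returned (h_n, c_n) back in as state on the next call continues the sequence where the previous call left off.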

Would models constructed this way backprop correctly?

Yes.
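Autograd records every operation inside the Python loop over time steps, so nothing extra is needed for backpropagation through time. One quick way to convince yourself is the sketch below, which uses a vanilla RNN for brevity. The class name SimpleRNN and its attribute names are invented for this example. It takes a loss that depends only on the last hidden state and checks that gradients still reach every parameter and the very first input step.

```python
import torch
import torch.nn as nn


class SimpleRNN(nn.Module):
    """Vanilla tanh RNN built from nn.Linear (hypothetical example class)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.x2h = nn.Linear(input_size, hidden_size)
        self.h2h = nn.Linear(hidden_size, hidden_size)

    def forward(self, x, h=None):
        # x: (seq_len, batch, input_size)
        if h is None:
            h = x.new_zeros(x.shape[1], self.hidden_size)
        outputs = []
        for x_t in x:  # iterate over time steps
            h = torch.tanh(self.x2h(x_t) + self.h2h(h))
            outputs.append(h)
        return torch.stack(outputs), h


torch.manual_seed(0)
rnn = SimpleRNN(input_size=3, hidden_size=4)
x = torch.randn(6, 2, 3, requires_grad=True)
out, h_n = rnn(x)

# The loss only looks at the final hidden state ...
loss = h_n.sum()
loss.backward()

# ... yet every parameter receives a gradient,
grads_ok = all(p.grad is not None for p in rnn.parameters())
# and so does the input at the first time step (gradient flowed back through time).
first_step_grad = x.grad[0].abs().sum().item()
```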

Is there a standardized dataset I could use to check that my implementation is correct?

The usual way is to feed random data through your module and compare the output to that of a reference implementation (e.g. PyTorch’s, or the NumPy one you saw, if you prefer).
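Such a comparison might look like the sketch below, shown for a GRU. SimpleGRU and its attribute names are hypothetical. The test copies the weights of a single-layer nn.GRU into the hand-built module, which works because nn.GRU stores its gate weights stacked in the order reset, update, new. It then compares both the outputs and the input gradients on random data. The tolerance of 1e-5 is an arbitrary choice for float32.

```python
import torch
import torch.nn as nn


class SimpleGRU(nn.Module):
    """Single-layer GRU built from nn.Linear (hypothetical example class)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x, h=None):
        # x: (seq_len, batch, input_size)
        if h is None:
            h = x.new_zeros(x.shape[1], self.hidden_size)
        outputs = []
        for x_t in x:
            i_r, i_z, i_n = self.x2h(x_t).chunk(3, dim=1)
            h_r, h_z, h_n = self.h2h(h).chunk(3, dim=1)
            r = torch.sigmoid(i_r + h_r)   # reset gate
            z = torch.sigmoid(i_z + h_z)   # update gate
            n = torch.tanh(i_n + r * h_n)  # candidate state
            h = (1 - z) * n + z * h
            outputs.append(h)
        return torch.stack(outputs), h


torch.manual_seed(0)
ref = nn.GRU(input_size=3, hidden_size=4)  # reference implementation
mine = SimpleGRU(input_size=3, hidden_size=4)

# Give both modules identical parameters.
with torch.no_grad():
    mine.x2h.weight.copy_(ref.weight_ih_l0)
    mine.x2h.bias.copy_(ref.bias_ih_l0)
    mine.h2h.weight.copy_(ref.weight_hh_l0)
    mine.h2h.bias.copy_(ref.bias_hh_l0)

# Same random input for both, with separate gradient buffers.
x_mine = torch.randn(6, 2, 3, requires_grad=True)
x_ref = x_mine.detach().clone().requires_grad_()

out_mine, h_mine = mine(x_mine)
out_ref, h_ref = ref(x_ref)  # h_ref has shape (num_layers, batch, hidden)
out_mine.sum().backward()
out_ref.sum().backward()

outputs_match = torch.allclose(out_mine, out_ref, atol=1e-5)
state_match = torch.allclose(h_mine, h_ref[0], atol=1e-5)
grads_match = torch.allclose(x_mine.grad, x_ref.grad, atol=1e-5)
```

If all three flags are True, the forward pass and the backward pass agree with the reference on this input. Repeating the check with a few different seeds and shapes gives more confidence.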

Best regards

Thomas