for i in inputs:
    # Step through the sequence one element at a time.
    # After each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

To me, it seems like handling the LSTM this way breaks the computational graph, because hidden is overwritten on every iteration. Shouldn't all the hidden states be stored in a list so that the computational graph is preserved and backprop can flow through them?
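
One way to test this would be a small experiment like the sketch below. It is not from the tutorial: the names h0 and c0, the sizes, and the dummy loss are my own assumptions, chosen only for illustration. The idea is to keep a reference to the initial hidden state, run the same loop while reassigning hidden each step, and then check whether a gradient arrives at h0 after calling backward().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sizes are arbitrary, picked only for this sketch.
lstm = nn.LSTM(input_size=3, hidden_size=3)
inputs = [torch.randn(1, 3) for _ in range(5)]

# Initial hidden and cell states. requires_grad=True lets us see
# whether gradients travel all the way back to the first step.
h0 = torch.zeros(1, 1, 3, requires_grad=True)
c0 = torch.zeros(1, 1, 3, requires_grad=True)
hidden = (h0, c0)

for i in inputs:
    # The name `hidden` is rebound on every step, exactly as in the loop above.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# Dummy loss on the final output only.
loss = out.sum()
loss.backward()

# If rebinding `hidden` cut the graph, h0 would receive no gradient.
grad_reaches_start = h0.grad is not None and bool(h0.grad.abs().sum() > 0)
print(grad_reaches_start)
```

If this prints True, then reassigning hidden only rebinds the Python name, while each new tensor still references the previous step through its autograd history, so a separate list would not be needed for backprop.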