Trouble applying truncated BPTT to an LSTM model with overlapping sequences

Hey, I’m trying to do the following:
I have N input samples of dimension 1 and N corresponding output samples of dimension 1.
I want to train an LSTM model that is fed with overlapping sequences of, say, 10 samples, i.e.:
in iteration 1 we feed x[1:11] and perform BPTT and a parameter update,
in iteration 2 we feed x[2:12] and perform BPTT and a parameter update,
in iteration 3 we feed x[3:13] and perform BPTT and a parameter update, etc.
Now, I want the second hidden and cell states of each iteration to be fed as the initial hidden and cell states of the next iteration.
The second hidden state is available by calling:

output, (hn, cn) = self.lstm_layer(input, (self.hidden_state, self.cell_state))

Here ‘output’ holds the hidden states of all steps in the sequence, so the second hidden state is just output[1].
But there is no equivalent for the cell states: the LSTM only returns the final cell state cn, not the per-step ones.
Is there any way to overcome this? Or is the cell state not necessary for proper learning, so that zeroing it would do?
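
Roughly, the loop I have in mind looks like this (the hidden size, optimizer and loss below are arbitrary placeholders, not my real model):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=32)   # input dimension 1, as above
proj = nn.Linear(32, 1)                        # map hidden states to the 1-d output
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(proj.parameters()))
criterion = nn.MSELoss()

N = 100
x = torch.randn(N, 1, 1)   # (seq_len, batch, feature)
y = torch.randn(N, 1, 1)

h = torch.zeros(1, 1, 32)  # (num_layers, batch, hidden_size)
c = torch.zeros(1, 1, 32)
for i in range(N - 10):
    optimizer.zero_grad()
    output, (hn, cn) = lstm(x[i : i + 10], (h.detach(), c.detach()))
    # output is (10, 1, 32): the hidden state of every step, so output[1]
    # is the second hidden state, but no per-step cell state is returned.
    loss = criterion(proj(output), y[i : i + 10])
    loss.backward()
    optimizer.step()
    h, c = hn, cn  # wrong: these are the states after step 10, not step 2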

  • You could apply the first step separately, as in the first sketch below.
  • Most times I have seen this, people used non-overlapping samples and cached the states (see e.g. fast.ai’s lecture on language models); the second sketch below shows that pattern.
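
For the first suggestion, something like this (an untested sketch with the same arbitrary sizes as your loop): feeding just the first sample of the window in a separate forward pass gives you both states after step one, which are exactly the initial states for the next window.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=32)
proj = nn.Linear(32, 1)
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(proj.parameters()))
criterion = nn.MSELoss()

N = 100
x = torch.randn(N, 1, 1)
y = torch.randn(N, 1, 1)

h = torch.zeros(1, 1, 32)
c = torch.zeros(1, 1, 32)
for i in range(N - 10):
    optimizer.zero_grad()
    # Extra forward pass over the first sample only; no_grad keeps it
    # out of the autograd graph, so no detach is needed when carrying it.
    with torch.no_grad():
        _, (h_next, c_next) = lstm(x[i : i + 1], (h, c))
    output, _ = lstm(x[i : i + 10], (h, c))
    loss = criterion(proj(output), y[i : i + 10])
    loss.backward()
    optimizer.step()
    h, c = h_next, c_next  # the states from after the window's first step

You pay for computing the first step twice, but you get the cell state after step one, which output alone cannot give you.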
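For the second suggestion, the usual truncated-BPTT pattern steps through non-overlapping windows and caches the detached final states, which are then exactly the right initial states for the next window (reusing the setup from the sketch above):

h = torch.zeros(1, 1, 32)
c = torch.zeros(1, 1, 32)
for i in range(0, N, 10):  # step by the window size: no overlap
    optimizer.zero_grad()
    # detach() cuts the graph so we do not backpropagate into the
    # previous window, whose graph backward() has already freed.
    output, (h, c) = lstm(x[i : i + 10], (h.detach(), c.detach()))
    loss = criterion(proj(output), y[i : i + 10])
    loss.backward()
    optimizer.step()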

Best regards

Thomas
