I’m training an LSTM network on a long series by dividing it into subseries.
Say my series has shape (f, 100) (100 timesteps with f features in each step), and I divide it into 20 subseries of 5 steps each.
I want to look at two scenarios:
In the first one, I want to feed these subseries to an LSTM model 10 at a time (so the LSTM runs twice), with every subseries continuing from the previous one’s last timestep (i.e. [(s0,s1,s2,s3,s4), (s5,s6,s7,s8,s9), …]). The initial hidden state (h0, c0) will then have size (num_layers, batch_size, hidden_size), and I run the forward function on (input, (h0, c0)).
What confuses me is that I want (I think we all want) the hidden state to be time specific, so each subseries in the batch should start from the last hidden state of the subseries before it. But I don’t know whether that is what happens, because I do have to supply an initial value for every batch element. If it doesn’t happen automatically, is there a way to do it without looping and feeding one subseries at a time?
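To make the setup concrete, here is a minimal sketch of the first scenario (all sizes, e.g. f=3 and hidden_size=8, are made up for illustration). Consecutive-in-time chunks are fed through sequential forward calls, with the returned (h, c) passed back in as the next initial state:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration: f features, hidden_size units.
f, hidden_size, num_layers = 3, 8, 1
lstm = nn.LSTM(input_size=f, hidden_size=hidden_size, num_layers=num_layers)

series = torch.randn(100, 1, f)      # (timesteps, batch=1, features)
chunks = series.split(50, dim=0)     # two halves of 50 steps each

# start from zeros, then carry (h, c) across the two forward calls
h = torch.zeros(num_layers, 1, hidden_size)
c = torch.zeros(num_layers, 1, hidden_size)
outputs = []
for chunk in chunks:
    out, (h, c) = lstm(chunk, (h, c))
    # detach so backprop does not reach into the previous chunk
    # (truncated backpropagation through time)
    h, c = h.detach(), c.detach()
    outputs.append(out)

full = torch.cat(outputs, dim=0)     # same outputs as one 100-step pass
```

Note that PyTorch treats items along the batch dimension as independent sequences, so stacking consecutive subseries into one batch would not let the hidden state flow between them; time continuity only comes from sequential calls like the above.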
The second scenario is similar: suppose my subsequences now advance by one element each time (i.e. [(s1,s2,s3,s4,s5), (s2,s3,s4,s5,s6), (s3,s4,s5,s6,s7), …]). Is there a way of making the second subseries’ h0 be the first subseries’ h1 (taking “h” to mean the pair (h, c))?
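One way to state this scenario in code (again with made-up sizes): step the LSTM one timestep at a time and record every intermediate (h, c); the state recorded after timestep k is then the h0 for the window that starts at timestep k. This is a sketch, not necessarily the most efficient approach:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration.
f, hidden_size = 3, 8
lstm = nn.LSTM(input_size=f, hidden_size=hidden_size, num_layers=1)

series = torch.randn(10, 1, f)   # short series, (timesteps, batch=1, features)
h = torch.zeros(1, 1, hidden_size)
c = torch.zeros(1, 1, hidden_size)
states = [(h, c)]
# step through one timestep at a time and record every (h, c)
for t in range(series.size(0)):
    _, (h, c) = lstm(series[t:t + 1], (h, c))
    states.append((h, c))

# states[k] is the initial state for the window starting at timestep k
```

For a single-layer LSTM the per-step h values also appear in the `output` tensor of one full-sequence forward pass, but the per-step cell states c do not, which is why the loop records both.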
My third question is about whether I understand RNNs correctly:
In my view, the hidden state should be timestep specific (at least from the point where you want to start remembering), but I have seen code in which the hidden state was simply initialized to zero at every batch, and I haven’t found any documentation or questions about the choice of hidden initialization across batches.
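The two initialization choices do produce different outputs, which is easy to check. A small sketch (sizes made up) comparing zero-initializing each chunk, as much example code does, against carrying the state from the previous chunk:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical sizes for illustration.
f, hidden_size = 3, 8
lstm = nn.LSTM(input_size=f, hidden_size=hidden_size)

series = torch.randn(10, 1, f)
a, b = series[:5], series[5:]    # two consecutive 5-step chunks

# "stateless": (h0, c0) default to zeros for the second chunk
out_zero, _ = lstm(b)

# "stateful": carry the state left behind by the first chunk
_, state = lstm(a)
out_carried, _ = lstm(b, state)

# the two generally disagree, so the choice of initialization
# across batches does affect the result
```

Whether zero-initializing is acceptable usually depends on whether the batches really are independent sequences or consecutive chunks of one long sequence.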