For a data set with 600 time steps, this Stack Overflow answer proposes the following training scheme, where each line represents one sample with `sequence_length = 5` that the RNN model will be trained on:
```
        t=0  t=1  t=2  t=3  t=4  t=5  ...  t=598  t=599
sample  |-------------------|
sample       |-------------------|
sample            |------------------
...
sample                           ----------|
sample                                  ----------|
```
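In code, I imagine the overlapping scheme would look something like this (a toy sketch using `unfold`; the `series` tensor is my own illustration, not from the answer):

```python
import torch

# Toy sketch of the overlapping scheme: one window per start position
# (stride 1), so every length-5 window of the series appears exactly once.
series = torch.arange(600, dtype=torch.float32)  # 600 time steps
seq_len = 5
windows = series.unfold(0, seq_len, 1)           # shape: (596, 5)
print(windows.shape)  # torch.Size([596, 5])
print(windows[1])     # tensor([1., 2., 3., 4., 5.])
```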
I had naively assumed that this would be excessive (since the overlap means the model sees each data point around `sequence_length` times), and thought that the following would be sufficient (say the BPTT `sequence_length` is 3, for convenience):
```
        t=0  t=1  t=2  t=3  t=4  t=5  t=6  t=7  ...  t=598  t=599
sample  |---------|
sample                 |---------|
sample                                |--------
...
sample                                              --------|
```
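The non-overlapping version I had in mind would just use a stride equal to the window length (again a toy sketch, not from the answer):

```python
import torch

# Toy sketch of the non-overlapping scheme (stride = seq_len = 3). Boundary
# transitions such as t=2 -> t=3 never fall inside any single window, which
# is exactly the gap the overlapping scheme above avoids.
series = torch.arange(600, dtype=torch.float32)
chunks = series.unfold(0, 3, 3)  # shape: (200, 3)
print(chunks[:2])                # tensor([[0., 1., 2.], [3., 4., 5.]])
```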
The first scheme now makes sense to me: it seems to be the only way the model sees every transition between time steps at least once. If I read it correctly, `get_batch` also does this in the word_language_model example. I just wanted to verify that this is the way we should be training on sequential data.
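For reference, `get_batch` looks roughly like this (paraphrased from memory of pytorch/examples, so details may differ from the current version):

```python
# Paraphrased from word_language_model/main.py in pytorch/examples (may differ
# between versions). `source` has shape (num_steps, batch_size). Chunks advance
# in non-overlapping bptt-sized steps, but the target is the input shifted
# forward by one step, so every transition appears as a prediction target
# (and main.py carries the hidden state across consecutive chunks).
def get_batch(source, i, bptt=35):
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len].view(-1)
    return data, target
```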
Thanks