I am planning to use a GRU in the following real-time scenario, but I have doubts because I have not really understood the effect of the input sequence length.
The scenario:
During training, the GRU would learn offline from recorded sensor data.
At test time, the data samples from the sensor are coming in one after another at a high frequency.
Because I would like to make a prediction directly after each sample arrives, I would set the sequence length to 1 during both training and testing.
But I am not sure whether doing so would cause the GRU to lose all or part of its “memory” abilities.
Actually, I assume that the “memory” of the GRU does not depend on the sequence length but on the hidden state? The hidden state in turn depends on the learned weights, so would it make sense to use a sequence length of 1, as long as the hidden state is carried over from one sample to the next? If that is true, what is the actual benefit of a sequence length greater than 1?
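To make my assumption concrete, here is a minimal sketch of what I have in mind for test time. The layer sizes, the random input, and all variable names are made up for illustration and are not my real setup. It compares (a) feeding a whole sequence in one call, (b) feeding one sample per call while passing the hidden state along, and (c) feeding one sample per call without passing the hidden state along.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
n_features, hidden_size, seq_len = 3, 8, 20

gru = nn.GRU(input_size=n_features, hidden_size=hidden_size, batch_first=True)
gru.eval()

# Stand-in for one recorded sensor sequence: (batch=1, seq_len, n_features).
x = torch.randn(1, seq_len, n_features)

with torch.no_grad():
    # (a) Whole sequence in a single call.
    out_full, h_full = gru(x)

    # (b) One sample per call (sequence length 1), carrying the hidden state.
    h = None  # None makes nn.GRU start from a zero hidden state
    step_outputs = []
    for t in range(seq_len):
        out_t, h = gru(x[:, t:t + 1, :], h)
        step_outputs.append(out_t)
    out_stream = torch.cat(step_outputs, dim=1)

    # (c) One sample per call, but the hidden state is never passed on,
    #     so every call starts again from zeros.
    out_reset = torch.cat(
        [gru(x[:, t:t + 1, :])[0] for t in range(seq_len)], dim=1
    )

same_as_full = torch.allclose(out_full, out_stream, atol=1e-5)
reset_differs = not torch.allclose(out_full, out_reset, atol=1e-5)
print("streaming with carried state matches full sequence:", same_as_full)
print("streaming with reset state differs from full sequence:", reset_differs)
```

If my understanding is right, (a) and (b) should agree up to floating-point tolerance, while (c) should not, since it discards everything the GRU has seen before the current sample. This only covers inference, though. It does not tell me what happens to training when the sequence length is 1.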
I would really appreciate it if a more experienced PyTorch user could give me clarification on this.
Thanks in advance.