Does a sequence length of 1 for a GRU make sense?

fcvm · February 10, 2022, 5:45pm

I am planning to use a GRU in the following real-time scenario, but I have doubts because I have not really understood how the sequence length of the input data affects it.

The scenario:
At training time, the GRU would learn offline from recorded sensor data.
At test time, samples from the sensor arrive one after another at a high frequency.
Because I would like to make a prediction as soon as each sample arrives,
I would set the sequence length to 1 at both training and test time.
But I am not sure whether, by doing so, the GRU would lose all or part of its “memory”.
Actually, I assume that the GRU’s “memory” does not depend on the sequence length but on the hidden state? The hidden state in turn depends on the learned weights, so would it make sense to use a sequence length of 1? If so, what is the actual benefit of a sequence length > 1?

I would really appreciate it if a more experienced PyTorch user could clarify this for me.
Thanks in advance.

googlebot · February 10, 2022, 9:00pm

The hidden state is a fixed-size summary of a variable-length sequence; it is supposed to change gradually as information accumulates with each timestep. The usage you describe is valid: if you carry the updated hidden state over between evaluations, the outputs will be the same as with bulk multi-step evaluation, just available earlier.
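A minimal sketch of that usage in PyTorch, assuming a single-layer `nn.GRU` with its default `(seq_len, batch, features)` layout and that each sensor reading arrives as a `(batch, input_size)` tensor. `StreamingGRU` and its `step`/`reset` methods are hypothetical names chosen for illustration, not a PyTorch API. Each call feeds a length-1 sequence and keeps the returned hidden state, so the next call continues from it.

```python
import torch
import torch.nn as nn


class StreamingGRU:
    """Hypothetical helper: runs a GRU one sample at a time, carrying the hidden state."""

    def __init__(self, gru: nn.GRU):
        self.gru = gru
        self.h = None  # None makes nn.GRU start from an all-zero hidden state

    @torch.no_grad()
    def step(self, sample: torch.Tensor) -> torch.Tensor:
        # sample: (batch, input_size) -> add a time axis of length 1
        out, self.h = self.gru(sample.unsqueeze(0), self.h)
        # out: (1, batch, hidden_size); drop the time axis
        return out[0]

    def reset(self) -> None:
        # Forget the accumulated state, e.g. before an unrelated recording
        self.h = None


torch.manual_seed(0)
gru = nn.GRU(input_size=3, hidden_size=5)
gru.eval()
stream = StreamingGRU(gru)

for t in range(4):
    sample = torch.randn(1, 3)  # stand-in for one sensor reading (batch of 1)
    y = stream.step(sample)     # available right after this sample arrives
    print(t, tuple(y.shape))    # (1, 5) at every step
```

The important detail is that `self.h` is fed back in on every call. Passing `None` each time instead would restart from a zero state, and the model would then see every sample in isolation.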

fcvm · February 13, 2022, 10:34pm

Thanks for your answer, but I do not quite understand your explanation.
Could you please explain what you mean by bulk multi-step evaluation?

googlebot · February 14, 2022, 1:11am

If you have N unprocessed sequence elements, you can feed them to an RNN all at once and get N intermediate states; that’s the usual way in training mode (though for some tasks you discard the non-final states).
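In PyTorch terms, bulk mode is a single `nn.GRU` call over the whole sequence. A minimal sketch with arbitrary, hypothetical sizes: `output` holds one hidden state (of the last layer) per timestep, while `h_n` holds only the final one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration
input_size, hidden_size, n_steps, batch = 3, 5, 7, 1

gru = nn.GRU(input_size, hidden_size)  # default layout: (seq_len, batch, features)

x = torch.randn(n_steps, batch, input_size)  # N unprocessed sequence elements
h0 = torch.zeros(1, batch, hidden_size)      # (num_layers, batch, hidden_size)

# One call processes all N timesteps
output, h_n = gru(x, h0)

print(output.shape)  # torch.Size([7, 1, 5]): one state per timestep
print(h_n.shape)     # torch.Size([1, 1, 5]): final state only

# For a single-layer GRU, the last intermediate state is the final state
print(torch.allclose(output[-1], h_n[0]))  # True
```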

But under the hood, the elements are still processed one by one (one per timestep), so you can obtain the same states in “real-time” mode.

I.e., single-step mode looks like

h1 = RNN(x0, h0)
h2 = RNN(x1, h1)

and multi-step mode computes the same states:

[h1, h2, …] = RNN([x0, x1, …], h0)
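The equivalence can be checked numerically. The sketch below uses arbitrary sizes and random inputs chosen for illustration; it runs the same `nn.GRU` both ways and compares the states. Tiny floating-point differences between the two code paths are possible, hence the tolerance passed to `torch.allclose`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=3, hidden_size=5)
xs = torch.randn(6, 1, 3)  # [x0, x1, ..., x5], batch of 1
h0 = torch.zeros(1, 1, 5)

with torch.no_grad():
    # Multi-step: [h1, h2, ...] = RNN([x0, x1, ...], h0)
    bulk_states, bulk_h = gru(xs, h0)

    # Single-step: h1 = RNN(x0, h0), h2 = RNN(x1, h1), ...
    h = h0
    step_states = []
    for x_t in xs:  # x_t: (batch, input_size)
        out, h = gru(x_t.unsqueeze(0), h)  # length-1 sequence
        step_states.append(out[0])
    step_states = torch.stack(step_states)

print(torch.allclose(bulk_states, step_states, atol=1e-5))  # True
print(torch.allclose(bulk_h, h, atol=1e-5))                 # True
```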