In the PyTorch Seq2Seq tutorial, the embedding and hidden context vectors were forced into a (1, 1, -1) shape.
Is there a reason for the (1, 1, -1) shape? Why do we need a 1x1xN 3D tensor just to hold a single vector in the innermost dimension?
Why can't we just use a 1-dimensional (-1) vector?
Because it then gets passed to a GRU which needs input of shape (seq_len, batch_size, features).
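To make the shape requirement concrete, here is a minimal sketch (with a hypothetical hidden size) of why the tutorial reshapes with `.view(1, 1, -1)` before the GRU call:

```python
import torch
import torch.nn as nn

hidden_size = 256  # hypothetical size for illustration
gru = nn.GRU(hidden_size, hidden_size)

embedded = torch.randn(hidden_size)      # a single embedding vector, shape (256,)
gru_input = embedded.view(1, 1, -1)      # -> (seq_len=1, batch_size=1, features)
hidden = torch.zeros(1, 1, hidden_size)  # (num_layers, batch_size, hidden_size)

output, hidden = gru(gru_input, hidden)
print(output.shape)  # torch.Size([1, 1, 256])
```

Passing the raw 1-D vector would fail, since `nn.GRU` requires the 3-D (seq_len, batch_size, features) layout.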
Thanks for the info!
I understand that it’s doing SGD where batch_size=1.
But any idea why is the seq_len=1 in the case of the tutorial?
Presumably because the elements of the sequence are fed to the GRU one at a time, which is necessary given that at each timestep you want to feed the previous timestep's output back into the GRU.
Thanks again for the clarification.
Are there instances where a GRU takes more than one input? If not, is seq_len always 1 for GRUs?
In this case we need seq_len=1 because we need to use the output of the previous timestep as the input to the next. However, when all the input timesteps are available in advance, you can feed the entire sequence in one go.
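For comparison, a sketch (hypothetical sizes) of the whole-sequence case, as in an encoder where all timesteps are known up front:

```python
import torch
import torch.nn as nn

hidden_size, seq_len = 256, 10  # hypothetical sizes
gru = nn.GRU(hidden_size, hidden_size)

sequence = torch.randn(seq_len, 1, hidden_size)  # (seq_len, batch=1, features)
hidden = torch.zeros(1, 1, hidden_size)

output, hidden = gru(sequence, hidden)  # one call processes all 10 timesteps
print(output.shape)  # torch.Size([10, 1, 256])
```

Here `output` holds the hidden state at every timestep, while `hidden` is only the final one.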