In the PyTorch Seq2Seq tutorial, the embedding and hidden context vectors were forced into a (1, 1, -1) shape.
Is there a reason for the (1, 1, -1) shape? Why do we need a 1x1xN 3D tensor just to hold a single vector in the innermost dimension?
Why can't we just use a 1-dimensional (-1) vector?
Because it then gets passed to a GRU which needs input of shape (seq_len, batch_size, features).
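To make the shape requirement concrete, here is a minimal sketch (with a hypothetical hidden size) of why the tutorial reshapes with `.view(1, 1, -1)` before the GRU call:

```python
import torch
import torch.nn as nn

hidden_size = 256  # hypothetical size for illustration
gru = nn.GRU(hidden_size, hidden_size)

embedded = torch.randn(hidden_size)      # a single embedding vector, shape (256,)
gru_input = embedded.view(1, 1, -1)      # -> (seq_len=1, batch_size=1, features)
hidden = torch.zeros(1, 1, hidden_size)  # (num_layers, batch_size, hidden_size)

output, hidden = gru(gru_input, hidden)
print(output.shape)  # torch.Size([1, 1, 256])
```

Passing the raw 1-D vector would fail, since `nn.GRU` requires the 3-D (seq_len, batch_size, features) layout.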
Thanks for the info!
I understand that it’s doing SGD where batch_size=1.
But any idea why is the seq_len=1 in the case of the tutorial?
Presumably because the elements of the sequence are fed to the GRU one at a time, which is necessary given that at each timestep you want to feed the previous timestep's output back into the GRU.
Thanks again for the clarification.
Are there instances where a GRU takes more than one input? If not, is seq_len always 1 for GRUs?
In this case we need seq_len=1 because we need to use the output of the previous timestep as the input to the next. However, when all the input timesteps are available in advance, you can feed the entire sequence in one go.
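For comparison, a sketch (hypothetical sizes) of the whole-sequence case, as in an encoder where all timesteps are known up front:

```python
import torch
import torch.nn as nn

hidden_size, seq_len = 256, 10  # hypothetical sizes
gru = nn.GRU(hidden_size, hidden_size)

sequence = torch.randn(seq_len, 1, hidden_size)  # (seq_len, batch=1, features)
hidden = torch.zeros(1, 1, hidden_size)

output, hidden = gru(sequence, hidden)  # one call processes all 10 timesteps
print(output.shape)  # torch.Size([10, 1, 256])
```

Here `output` holds the hidden state at every timestep, while `hidden` is only the final one.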