Why is nn.GRU designed to only accept inputs of shape (1, x, input_size)? What does the second dimension mean?
If an input variable of dimension (1, input_size) together with a hidden state of dimension (1, hidden_size) are fed into nn.GRU, the following error will occur:
RuntimeError: matrices expected, got 1D, 2D tensors at /py/conda-bld/pytorch_1493680494901/work/torch/lib/TH/generic/THTensorMath.c:1232
Only after adding another dimension in the middle does the code run.
If it took a tensor of shape (batch_size, seq_len, input_size), that would be fine. But looking at the source, it seems to expect tensors of shape (seq_len, batch_size, input_size), which would be a big problem. Can anyone confirm whether I am right?
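To make the shapes concrete, here is a minimal sketch of calling nn.GRU with the layout it expects by default. The sizes (input_size=10, hidden_size=20, a sequence of 5 steps with batch 3) are arbitrary example values, not anything from the original post:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)

# Default layout: input is (seq_len, batch_size, input_size)
x = torch.randn(5, 3, 10)
# Initial hidden state is (num_layers, batch_size, hidden_size)
h0 = torch.zeros(1, 3, 20)

out, hn = gru(x, h0)
print(out.shape)  # torch.Size([5, 3, 20]) -- one output per timestep
print(hn.shape)   # torch.Size([1, 3, 20]) -- final hidden state
```

So the "dimension in the middle" that had to be added is the batch dimension; even a single sequence must be passed as a batch of size 1 here.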
The default layout is (seq_len, batch_size, input_size), but you can specify batch_first=True and use (batch_size, seq_len, input_size), so it is not a big problem unless you forget the parameter.
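A small sketch of the batch_first=True variant (again with arbitrary example sizes). One detail worth knowing: batch_first only changes the layout of the input and output tensors, not of the hidden state:

```python
import torch
import torch.nn as nn

# With batch_first=True the input is (batch_size, seq_len, input_size)
gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(3, 5, 10)  # batch of 3 sequences, 5 timesteps each

out, hn = gru(x)           # h0 defaults to zeros if omitted
print(out.shape)  # torch.Size([3, 5, 20]) -- output is batch-first too
print(hn.shape)   # torch.Size([1, 3, 20]) -- hidden state is NOT batch-first
```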
The reason for the default is that the RNN iterates over the sequence dimension, so it is efficient to access an individual timestep's batched data, which is contiguous in memory if you pass in contiguous tensors of shape (seq_len, batch_size, input_size).
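You can check this contiguity argument directly: slicing one timestep out of a seq-first tensor yields a contiguous (batch_size, input_size) matrix, whereas the same per-timestep slice of a batch-first tensor is strided. A small sketch:

```python
import torch

seq_len, batch, feat = 5, 3, 10

# Seq-first layout: timestep t is the contiguous block x[t]
x = torch.randn(seq_len, batch, feat)
print(x[0].is_contiguous())        # True

# Batch-first layout: timestep t is y[:, t], a strided view
y = torch.randn(batch, seq_len, feat)
print(y[:, 0].is_contiguous())     # False
```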