Why does nn.GRU accept inputs of shape (1, x, input_size)

Why is nn.GRU designed to only accept inputs of shape (1, x, input_size)? What does the second dimension mean?

If an input of shape (1, input_size) and a hidden state of shape (1, hidden_size) are fed into nn.GRU, the following error occurs:

RuntimeError: matrices expected, got 1D, 2D tensors at /py/conda-bld/pytorch_1493680494901/work/torch/lib/TH/generic/THTensorMath.c:1232

Only after adding another dimension in the middle does the code run.
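A minimal sketch of that fix, assuming a single timestep and a batch of one (shapes and the `unsqueeze` call are my illustration, not from the original post):

```python
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20
gru = nn.GRU(input_size, hidden_size)

# 2D input of shape (1, input_size) -- the shape that triggered the RuntimeError
x = torch.randn(1, input_size)
# The hidden state must be 3D: (num_layers, batch_size, hidden_size)
h = torch.randn(1, 1, hidden_size)

# Insert the missing middle dimension so x becomes (1, 1, input_size):
# one timestep, a batch of one, input_size features.
out, hn = gru(x.unsqueeze(1), h)
print(out.shape)  # torch.Size([1, 1, 20])
```

(Recent PyTorch versions also accept unbatched 2D inputs, but the 3D form above is what the modules historically required.)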

I’m very new to PyTorch, but from the documentation I see that nn.GRU is initialized with (input_size, hidden_size, num_layers).

And regarding the inputs, the dimensions seem to correspond to (batch_size, seq_len, input_size).

Adapting a bit from the examples:

batch_size = 5
input_size = 10
num_layers = 2
hidden_size = 20
seq_len = 3
rnn = nn.GRU(input_size, hidden_size, num_layers)
inp = Variable(torch.randn(batch_size, seq_len, input_size))
h0 = Variable(torch.randn(num_layers, seq_len, hidden_size))
output, hn = rnn(inp, h0)

So the second dimension is the sequence length. This is the behaviour you will see in the recurrent modules / layers of almost every DL library.


If it takes a tensor of shape (batch_size, seq_len, input_size), then it is fine. But when I open its code, it seems to accept the tensor of shape (seq_len, batch_size, input_size) which is a big problem. Can anyone confirm if I am right?

The default is (seq_len, batch_size, input_size), but you can specify batch_first=True and use (batch_size, seq_len, input_size), so it is not a big problem unless you forget the parameter.
The reason for the default is that the RNN iterates over the sequence dimension, so it is efficient to access each timestep's batched data as a contiguous slice, which it is if you pass in contiguous tensors of shape (seq_len, batch_size, input_size).
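To make the two layouts concrete, here is a small sketch of both (variable names are my own; note that h0 always has shape (num_layers, batch_size, hidden_size), regardless of batch_first):

```python
import torch
import torch.nn as nn

input_size, hidden_size, num_layers = 10, 20, 2
seq_len, batch_size = 3, 5

# Default layout: input is (seq_len, batch_size, input_size)
rnn = nn.GRU(input_size, hidden_size, num_layers)
x = torch.randn(seq_len, batch_size, input_size)
h0 = torch.randn(num_layers, batch_size, hidden_size)  # unaffected by batch_first
out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([3, 5, 20]) -- (seq_len, batch_size, hidden_size)

# batch_first=True: input is (batch_size, seq_len, input_size)
rnn_bf = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
out_bf, hn_bf = rnn_bf(x.transpose(0, 1), h0)  # same data, batch dimension first
print(out_bf.shape)  # torch.Size([5, 3, 20]) -- (batch_size, seq_len, hidden_size)
```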

Best regards