GRU output shape seems to differ from documentation

While reading the documentation for nn.GRU I saw the output described as: “output of shape (seq_len, batch, num_directions * hidden_size)”

When I tested this, the results had a different shape depending on whether batch_first=True.

The test was as follows:

import torch
import torch.nn as nn

gru_bf = nn.GRU(input_size=512,
                hidden_size=1024,
                num_layers=1,
                batch_first=True,
                bidirectional=False)

gru = nn.GRU(input_size=512,
             hidden_size=1024,
             num_layers=1,
             batch_first=False,
             bidirectional=False)

batch = 1
seq = 100
input_size = 512

in_bf = torch.randn(batch, seq, input_size)   # (batch, seq_len, input_size)
input = torch.randn(seq, batch, input_size)   # (seq_len, batch, input_size)

out_bf, _ = gru_bf(in_bf)
out, _ = gru(input)

print(out_bf.shape)
print(out.shape)

The output is:

torch.Size([1, 100, 1024])
torch.Size([100, 1, 1024])

It seems that if batch_first=True the output has shape [batch, seq_len, num_directions * hidden_size],
and otherwise it has the shape given in the documentation, [seq_len, batch, num_directions * hidden_size].

Is this indeed what is happening or am I misunderstanding something?
Thanks for the help!

Yes, this is correct. batch_first only decides whether the batch dimension comes first in the input and output tensors. If you set it to True, the batch size is the first dimension; if False (the default, and the layout the documentation describes), the batch size is the second dimension.
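
For anyone who finds this later, here is a minimal sketch (assuming the same layer sizes as the test above) showing that the two layouts hold the same kind of data with the first two axes swapped, and that the returned hidden state h_n is unaffected by batch_first:

import torch
import torch.nn as nn

gru_bf = nn.GRU(input_size=512, hidden_size=1024, num_layers=1,
                batch_first=True, bidirectional=False)

x_bf = torch.randn(1, 100, 512)        # (batch, seq_len, input_size)
out_bf, h_n = gru_bf(x_bf)

print(out_bf.shape)                    # torch.Size([1, 100, 1024])  -> (batch, seq_len, num_directions * hidden_size)
print(out_bf.transpose(0, 1).shape)    # torch.Size([100, 1, 1024])  -> the batch_first=False layout
print(h_n.shape)                       # torch.Size([1, 1, 1024])    -> (num_layers * num_directions, batch, hidden_size),
                                       #    the same regardless of batch_first

out_bf.transpose(0, 1) just swaps the batch and sequence axes (it returns a view, no copy), so you can move between the two layouts freely.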

Alright, thank you very much.