While reading the documentation for nn.GRU, I saw that the output has shape “(seq_len, batch, num_directions * hidden_size)”.
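To make the quoted formula concrete, here is a minimal sketch using small, arbitrary sizes (not from the original test) and the default batch_first=False. It uses a bidirectional GRU so that the num_directions factor is visible in the last dimension:

```python
import torch
import torch.nn as nn

# Small, arbitrary sizes chosen for illustration only
seq_len, batch, input_size, hidden_size = 7, 3, 4, 5

# Default batch_first=False, so the input is (seq_len, batch, input_size)
gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

num_directions = 2  # because bidirectional=True
print(output.shape)  # (seq_len, batch, num_directions * hidden_size) = (7, 3, 10)
print(h_n.shape)     # (num_layers * num_directions, batch, hidden_size) = (2, 3, 5)
```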
When testing this, however, the output shape seemed to depend on whether batch_first=True was set.
The test was as follows:
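```python
import torch
import torch.nn as nn

# GRU that expects (batch, seq_len, input_size) inputs
gru_bf = nn.GRU(input_size=512,
                hidden_size=1024,
                num_layers=1,
                batch_first=True,
                bidirectional=False)
# GRU that expects (seq_len, batch, input_size) inputs (the default)
gru = nn.GRU(input_size=512,
             hidden_size=1024,
             num_layers=1,
             batch_first=False,
             bidirectional=False)

batch = 1
seq = 100
input_size = 512

in_bf = torch.randn(batch, seq, input_size)  # batch-first input
in_sf = torch.randn(seq, batch, input_size)  # sequence-first input (avoids shadowing the built-in `input`)

out_bf, _ = gru_bf(in_bf)
out, _ = gru(in_sf)
print(out_bf.shape)
print(out.shape)
```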
The output is:

```
torch.Size([1, 100, 1024])
torch.Size([100, 1, 1024])
```
It seems that with batch_first=True the output has shape [batch, seq_len, n_dir * hidden_size], whereas otherwise it has the shape given in the documentation, [seq_len, batch, n_dir * hidden_size].
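One way to check this hypothesis is a minimal sketch like the one below (sizes are arbitrary, not from the test above). It copies the weights of a sequence-first GRU into a batch-first one, feeds both the same data in their respective layouts, and compares the outputs. If batch_first only changes the layout, the two outputs should match after swapping the first two dimensions. The hidden state h_n is printed too, since batch_first is documented not to apply to hidden states.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_size, hidden_size = 2, 6, 4, 8

gru = nn.GRU(input_size, hidden_size, batch_first=False)
gru_bf = nn.GRU(input_size, hidden_size, batch_first=True)
# Give both modules identical weights so their outputs are comparable
gru_bf.load_state_dict(gru.state_dict())

x_bf = torch.randn(batch, seq_len, input_size)  # (batch, seq_len, input_size)
x = x_bf.transpose(0, 1)                         # same data as (seq_len, batch, input_size)

out_bf, h_bf = gru_bf(x_bf)
out, h = gru(x)

print(out_bf.shape)  # torch.Size([2, 6, 8])
print(out.shape)     # torch.Size([6, 2, 8])
# Same values; only the first two dimensions are swapped
print(torch.allclose(out_bf, out.transpose(0, 1)))  # True
# h_n is (num_layers * num_directions, batch, hidden_size) in both cases
print(h_bf.shape, h.shape)
```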
Is this indeed what is happening, or am I misunderstanding something?
Thanks for the help!