While reading the documentation for nn.GRU, I saw the following: “output of shape (seq_len, batch, num_directions * hidden_size)”
When I tested this, however, the output shape differed depending on whether batch_first=True was set.
The test was as follows:
import torch
import torch.nn as nn

# Two otherwise identical GRUs that differ only in batch_first
gru_bf = nn.GRU(input_size=512, hidden_size=1024, num_layers=1, batch_first=True, bidirectional=False)
gru = nn.GRU(input_size=512, hidden_size=1024, num_layers=1, batch_first=False, bidirectional=False)

batch = 1
seq = 100
input_size = 512

in_bf = torch.randn(batch, seq, input_size)  # batch-first layout
input = torch.randn(seq, batch, input_size)  # sequence-first layout

out_bf, _ = gru_bf(in_bf)
out, _ = gru(input)

print(out_bf.shape)
print(out.shape)
The output is:
torch.Size([1, 100, 1024])
torch.Size([100, 1, 1024])
It seems that if batch_first=True, the output has shape [batch, seq_len, hidden_size * n_dir];
otherwise it has the shape given in the documentation, [seq_len, batch, hidden_size * n_dir].
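A further check that might help confirm this: the sketch below gives both GRUs the same weights and feeds them the same data in the two layouts, then compares the outputs after swapping the first two dimensions. The names in_sf, h_bf, h and same_values are made up for this example, and the weight copy via load_state_dict is my own addition, not part of the original test. It also prints the shape of the returned hidden state, which as far as I can tell is not affected by batch_first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, seq, input_size, hidden_size = 1, 100, 512, 1024

gru_bf = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=1, batch_first=True)
gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=1, batch_first=False)

# Assumption for this sketch: copy the weights so both modules compute the same function
gru.load_state_dict(gru_bf.state_dict())

in_bf = torch.randn(batch, seq, input_size)  # (batch, seq, input_size)
in_sf = in_bf.transpose(0, 1)                # same data as (seq, batch, input_size)

with torch.no_grad():
    out_bf, h_bf = gru_bf(in_bf)
    out, h = gru(in_sf)

print(out_bf.shape)  # torch.Size([1, 100, 1024])
print(out.shape)     # torch.Size([100, 1, 1024])

# The hidden state keeps (num_layers * num_directions, batch, hidden_size) in both cases
print(h_bf.shape, h.shape)

# If only the layout differs, transposing the batch-first output should match the other one
same_values = torch.allclose(out_bf.transpose(0, 1), out, atol=1e-5)
print(same_values)
```

If the comparison holds, batch_first only swaps the first two dimensions of the input and output tensors and does not change the values themselves.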
Is this indeed what is happening or am I misunderstanding something?
Thanks for the help!