As shown in the documentation: http://pytorch.org/docs/0.2.0/nn.html#torch.nn.GRU
The GRU outputs:
- output (seq_len, batch, hidden_size * num_directions): tensor containing the output features h_t from the last layer of the RNN, for each t.
- h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
For a GRU with num_layers = 2 and bidirectional = True, I assume that the backward hidden state at the last position, output[-1, :, hidden_size:], is equal to h_n[3, :, :], but in my code that is not the case.
I have tried h_n[0, :, :], h_n[1, :, :], and h_n[2, :, :] as well, but none of them is equal to output[-1, :, hidden_size:].
Why? (I need the forward hidden state at the last position to initialize the decoder hidden state in my Seq2Seq model.)
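For reference, here is a minimal sketch of the comparison I am describing (the shapes and seed are arbitrary, chosen just to reproduce the check):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size, n_layers = 5, 2, 3, 4, 2

# Two-layer bidirectional GRU, as in the question
gru = nn.GRU(input_size, hidden_size, num_layers=n_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

# output: (seq_len, batch, hidden_size * num_directions)
# h_n:    (num_layers * num_directions, batch, hidden_size)
print(output.shape)  # torch.Size([5, 2, 8])
print(h_n.shape)     # torch.Size([4, 2, 4])

# Compare the backward half of the last output position against every slice of h_n
for i in range(h_n.size(0)):
    print(i, torch.allclose(output[-1, :, hidden_size:], h_n[i]))
```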
Thanks in advance.