As shown in the documentation: http://pytorch.org/docs/0.2.0/nn.html#torch.nn.GRU
The GRU returns:
output (seq_len, batch, hidden_size * num_directions): tensor containing the output features h_t from the last layer of the RNN, for each t.
h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t=seq_len.
For a GRU with
num_layers = 2 and
bidirectional = True,
I assumed that the backward hidden state at the last position,
output[-1, :, hidden_size:], would be equal to
h_n[3, :, :], but in my code that is not true.
I have tried
h_n[2, :, :] as well, but neither is equal to output[-1, :, hidden_size:].
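For reference, here is a minimal script that reproduces what I am seeing on a recent PyTorch (the sizes are arbitrary; the shapes in the comments follow the nn.GRU docs quoted above):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 3, 4, 6

# 2-layer bidirectional GRU, as described above
gru = nn.GRU(input_size, hidden_size, num_layers=2, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

print(output.shape)  # torch.Size([5, 3, 12]) -> (seq_len, batch, hidden_size * num_directions)
print(h_n.shape)     # torch.Size([4, 3, 6])  -> (num_layers * num_directions, batch, hidden_size)

# The comparison I expected to hold, but which fails for me:
print(torch.allclose(output[-1, :, hidden_size:], h_n[3]))
```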
Why? (I need the forward hidden state at the last position to initialize the decoder hidden state in my Seq2Seq model.)
Thanks in advance.