As shown in the documentation: http://pytorch.org/docs/0.2.0/nn.html#torch.nn.GRU
The GRU outputs:
- output (seq_len, batch, hidden_size * num_directions): tensor containing the output features h_t from the last layer of the RNN, for each t.
- h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
For a GRU with num_layers = 2 and bidirectional = True, I assume that the backward hidden state at the last position, output[-1, :, hidden_size:], is equal to h_n[3, :, :], but in my code that is not the case.
I have tried h_n[0, :, :], h_n[1, :, :], and h_n[2, :, :] as well, but none of them is equal to output[-1, :, hidden_size:].
Why? (I need the forward hidden state at the last position to initialize the decoder hidden state in my Seq2Seq model.)
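For reference, here is a minimal sketch of the comparison I am describing (the shapes and seed are arbitrary, chosen just to reproduce the check):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size, n_layers = 5, 2, 3, 4, 2

# Two-layer bidirectional GRU, as in the question
gru = nn.GRU(input_size, hidden_size, num_layers=n_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

# output: (seq_len, batch, hidden_size * num_directions)
# h_n:    (num_layers * num_directions, batch, hidden_size)
print(output.shape)  # torch.Size([5, 2, 8])
print(h_n.shape)     # torch.Size([4, 2, 4])

# Compare the backward half of the last output position against every slice of h_n
for i in range(h_n.size(0)):
    print(i, torch.allclose(output[-1, :, hidden_size:], h_n[i]))
```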
Thanks in advance.