I’m having trouble understanding the relationship between the output of nn.GRU and h_n (the last hidden state). The documentation says:
> output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t.
>
> h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
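To make those shapes concrete, here is what they work out to for the configuration I’m using below (the specific numbers are my choice, not from the docs):

```python
import torch
from torch import nn

# num_layers=2, bidirectional=True -> num_directions=2
gru = nn.GRU(input_size=16, hidden_size=16, num_layers=2, bidirectional=True)
output, h_n = gru(torch.randn(5, 10, 16))  # (seq_len=5, batch=10, input_size=16)

print(output.shape)  # torch.Size([5, 10, 32]) = (seq_len, batch, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([4, 10, 16]) = (num_layers * num_directions, batch, hidden_size)
```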
The documentation additionally specifies how to separate those packed dimensions properly, so I followed that hint and made this script:
```python
import torch
from torch import nn

with torch.no_grad():
    rnn = nn.GRU(16, 16, num_layers=2, bidirectional=True)
    input = torch.randn(5, 10, 16)           # (seq_len, batch, input_size)
    memory, state = rnn(input)

    # Separate the directions/layers as the documentation suggests
    memory_view = memory.view(5, 10, 2, 16)  # (seq_len, batch, num_directions, hidden_size)
    state_view = state.view(2, 2, 10, 16)    # (num_layers, num_directions, batch, hidden_size)

    # True
    print(torch.allclose(memory_view[-1, 0, 0, :], state_view[-1, 0, 0, :]))
    # False
    print(torch.allclose(memory_view[-1, 0, 1, :], state_view[-1, 1, 0, :]))
```
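To spell out what the two comparisons are checking (my own reading of the indices, following the view shapes above; this continues from the script):

```python
# memory_view[t, b, d, :] -> output at time step t, batch item b, direction d
# state_view[l, d, b, :]  -> final hidden state of layer l, direction d, batch item b

forward_out = memory_view[-1, 0, 0, :]    # forward output at the last time step
forward_state = state_view[-1, 0, 0, :]   # forward final state of the top layer (these match)

backward_out = memory_view[-1, 0, 1, :]   # backward output at the last time step
backward_state = state_view[-1, 1, 0, :]  # backward final state of the top layer (these don't)
```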
How come the backward part of the last memory vector doesn’t match the backward part of the last hidden state?
It’s not clear to me where I’m going wrong, and I hope you can kindly help me clear up this confusion.