I’m having trouble understanding the relationship between the output of nn.GRU and h_n (the last hidden state). The documentation says:
> output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t.
>
> h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
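To make those shapes concrete, here is what they work out to for the configuration I’m using below (the specific numbers are my choice, not from the docs):

```python
import torch
from torch import nn

# num_layers=2, bidirectional=True -> num_directions=2
gru = nn.GRU(input_size=16, hidden_size=16, num_layers=2, bidirectional=True)
output, h_n = gru(torch.randn(5, 10, 16))  # (seq_len=5, batch=10, input_size=16)

print(output.shape)  # torch.Size([5, 10, 32]) = (seq_len, batch, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([4, 10, 16]) = (num_layers * num_directions, batch, hidden_size)
```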
The documentation additionally specifies how to separate those packed dimensions properly, so I followed that hint and made this script:
```python
import torch
from torch import nn

with torch.no_grad():
    rnn = nn.GRU(16, 16, num_layers=2, bidirectional=True)
    input = torch.randn(5, 10, 16)           # (seq_len, batch, input_size)
    memory, state = rnn(input)

    # Separate the directions/layers as the documentation suggests
    memory_view = memory.view(5, 10, 2, 16)  # (seq_len, batch, num_directions, hidden_size)
    state_view = state.view(2, 2, 10, 16)    # (num_layers, num_directions, batch, hidden_size)

    # True
    print(torch.allclose(memory_view[-1, 0, 0, :], state_view[-1, 0, 0, :]))
    # False
    print(torch.allclose(memory_view[-1, 0, 1, :], state_view[-1, 1, 0, :]))
```
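To spell out what the two comparisons are checking (my own reading of the indices, following the view shapes above; this continues from the script):

```python
# memory_view[t, b, d, :] -> output at time step t, batch item b, direction d
# state_view[l, d, b, :]  -> final hidden state of layer l, direction d, batch item b

forward_out = memory_view[-1, 0, 0, :]    # forward output at the last time step
forward_state = state_view[-1, 0, 0, :]   # forward final state of the top layer (these match)

backward_out = memory_view[-1, 0, 1, :]   # backward output at the last time step
backward_state = state_view[-1, 1, 0, :]  # backward final state of the top layer (these don't)
```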
How come the backward part of the last memory vector doesn’t match the backward part of the last hidden state?
It’s not clear to me where I’m going wrong, and I hope you can kindly help me clear up this confusion.