nn.GRU last output and h_n (state) don't match

dovah · March 31, 2020, 7:25pm

Hi,
I’m having trouble understanding the relationship between the output of nn.GRU and h_n (the last state).

Per documentation:

output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t.

h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len

The documentation also specifies how to separate the various dimensions properly, so I followed that hint and wrote this script:

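import torch
import torch.nn as nn

with torch.no_grad():
    rnn = nn.GRU(16, 16, num_layers=2, bidirectional=True)
    inputs = torch.randn(5, 10, 16)  # (seq_len, batch, input_size)
    memory, state = rnn(inputs)

    # (seq_len, batch, num_directions, hidden_size)
    memory_view = memory.view(5, 10, 2, 16)

    # (num_layers, num_directions, batch, hidden_size)
    state_view = state.view(2, 2, 10, 16)

    # Forward direction, last layer, first batch element: prints True
    print(torch.allclose(memory_view[-1, 0, 0, :], state_view[-1, 0, 0, :]))
    # Backward direction, last layer, first batch element: prints False
    print(torch.allclose(memory_view[-1, 0, 1, :], state_view[-1, 1, 0, :]))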

How come the backward part of the last memory vector doesn’t match the backward part of state?
It’s not clear to me where I’m going wrong, and I’d appreciate any help clearing up this confusion.
Thanks
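
For reference, here is a minimal sketch of a comparison that should hold for both directions. It relies on the fact that the backward direction reads the sequence in reverse, so its final hidden state is the one it emits at t = 0 rather than at t = seq_len - 1. The names forward_matches and backward_matches are only illustrative, and the sizes mirror the script above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, hidden_size = 5, 10, 16
num_layers, num_directions = 2, 2

with torch.no_grad():
    rnn = nn.GRU(16, hidden_size, num_layers=num_layers, bidirectional=True)
    inputs = torch.randn(seq_len, batch, 16)
    output, h_n = rnn(inputs)

    # Separate the direction axis, as described in the documentation.
    output_view = output.view(seq_len, batch, num_directions, hidden_size)
    h_n_view = h_n.view(num_layers, num_directions, batch, hidden_size)

    # Forward direction: the final state is the output at the last time step.
    forward_matches = torch.allclose(
        output_view[-1, :, 0, :], h_n_view[-1, 0, :, :]
    )

    # Backward direction: the sequence is read in reverse, so the final
    # state is the output at the first time step (t = 0).
    backward_matches = torch.allclose(
        output_view[0, :, 1, :], h_n_view[-1, 1, :, :]
    )

print(forward_matches, backward_matches)
```

If this is right, the False in my script comes from comparing the backward state against memory_view[-1, ...] instead of memory_view[0, ...].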