RNN: output vs hidden state don't match up (my misunderstanding?)

I still adhere to the old documentation that specifies how to split the direction and num_layers dimensions:

h_n = h_n.view(num_layers, num_directions, batch, hidden_size)

but I’m not sure if this is still true, since the old docs also give the shape of h_n as (num_layers * num_directions, batch, hidden_size).

The latest docs give a shape of (num_directions x num_layers, batch, hidden_size) – note the switch in order of num_directions and num_layers. Unfortunately, the docs no longer specify the view() call to correctly separate them. I’ve even made a post about it, but nobody replied.
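As a sanity check on which ordering actually holds, here is a quick sketch (my own code, not from the docs) that applies the old recipe – layers first, then directions – and compares the result against the ends of `output`. All the variable names and sizes are mine:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_layers, num_directions = 3, 2
batch, hidden_size, input_size, seq_len = 4, 5, 6, 7

rnn = nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, h_n = rnn(x)  # h_n: (num_layers * num_directions, batch, hidden_size)

# Old-docs recipe: layers first, then directions.
h_n_split = h_n.view(num_layers, num_directions, batch, hidden_size)

# If that layout is right, the forward state of the last layer equals the
# forward half of output's *last* time step, and the backward state equals
# the backward half of output's *first* time step.
fwd_last = h_n_split[-1, 0]  # (batch, hidden_size)
bwd_last = h_n_split[-1, 1]
assert torch.allclose(fwd_last, output[-1, :, :hidden_size])
assert torch.allclose(bwd_last, output[0, :, hidden_size:])
```

If the asserts pass, the old view() recipe (num_layers as the outer dimension) still matches the tensor layout, regardless of how the shape is written in the docs.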

In any case, the latest docs state that

output will contain a concatenation of the forward and reverse hidden states at each time step

So output[0] will contain the forward and reverse hidden states for all sequences in your batch at position 0 – that is, the first hidden state of the forward direction and the last hidden state of the backward direction.

In other words, you cannot use the index like you described to split into forward and backward direction.
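A minimal sketch (my code, not the original poster's) of that point, using a single-layer bidirectional RNN: the two final states live at opposite ends of `output`, so no single time index holds both, whereas h_n stacks exactly those two states:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, input_size, seq_len, batch = 5, 6, 7, 4

rnn = nn.RNN(input_size, hidden_size, num_layers=1, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, h_n = rnn(x)

# Forward direction finishes at the last time step; backward at the first.
fwd_final = output[-1, :, :hidden_size]
bwd_final = output[0, :, hidden_size:]

# h_n is exactly these two final states, stacked.
assert torch.allclose(h_n[0], fwd_final)
assert torch.allclose(h_n[1], bwd_final)

# By contrast, output[0] pairs the *first* forward state with the *final*
# backward state, so one time index of output mixes the two directions.
assert not torch.allclose(h_n[0], output[0, :, :hidden_size])
```

So if you want the final states of both directions, either take them from h_n, or slice `output` at both ends as above – but not at a single index.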