Understanding the output of LSTM

While using an LSTM with bidirectional=True and 2 layers:

hidden_vector, last_hidden = lstm(features)

last_hidden is a 2-tuple, with each element of shape (num_layers * num_directions, batch_size, hidden_size). From what I understand, the first element of the tuple is the output in the forward direction and the second element is the output in the backward direction.

Is the last element along the leading dimension of each tuple element the topmost hidden layer? That is, is the following the topmost hidden layer:

top_most = last_hidden[0][-1]
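
For concreteness, here is a small sketch (sizes made up, just for illustration) that prints the shapes involved:

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size, num_layers = 4, 7, 10, 16, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, bidirectional=True)

features = torch.randn(seq_len, batch_size, input_size)  # (seq_len, batch, input_size)
hidden_vector, last_hidden = lstm(features)

print(hidden_vector.shape)   # torch.Size([7, 4, 32]) -> (seq_len, batch, 2 * hidden_size)
print(last_hidden[0].shape)  # torch.Size([4, 4, 16]) -> (num_layers * 2, batch, hidden_size)
print(last_hidden[1].shape)  # torch.Size([4, 4, 16]) -> same shape for the second element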

The common nomenclature is rather:

output, (hidden, cell) = lstm(input, (hidden, cell))

If you are using nn.LSTM, I assume you are stacking more than one layer of LSTM. In that case:

This is not exact.

Let's say you have 2 layers (L1 and L2). L1 has 3 inputs: (input, (h1, c1)); input is the input to the whole stacked architecture (the same as in the Python line above). L1 has 2 outputs: (h1_, c1_), the updated hidden and cell states for layer 1. Then L2 has 3 inputs: (h1_, (h2, c2)) and 2 outputs: (h2_, c2_). The final output of the whole stacked architecture is h2_.

If we rewrite the Python line above with these names, it becomes:

h2_, ([h1_, h2_], [c1_, c2_]) = lstm(input, ([h1, h2], [c1, c2]))
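
As a runnable check of this mapping (illustrative sizes; unidirectional for simplicity), hidden[-1] is h2_, the top layer's final hidden state, and it matches the last time step of output:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2)

input = torch.randn(7, 4, 10)         # (seq_len, batch, input_size)
output, (hidden, cell) = lstm(input)  # hidden: (num_layers, batch, hidden_size)

# output[-1] is h2_ at the last time step; hidden[-1] is the top layer's state
print(torch.allclose(output[-1], hidden[-1]))  # True

So in the unidirectional case, last_hidden[0][-1] is indeed the top layer's final hidden state.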

In a bidirectional LSTM, is h1 also an array/tuple with 2 elements?

No, you just have to pass bidirectional=True when initializing the module; the input/output structures are then the same.
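
To see what changes in the shapes, a small sketch (illustrative sizes; the view into layers and directions follows PyTorch's documented layout):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2, bidirectional=True)

input = torch.randn(7, 4, 10)
output, (hidden, cell) = lstm(input)

print(hidden.shape)  # torch.Size([4, 4, 16]) -> (num_layers * 2, batch, hidden_size)

# Separate layers and directions: index as hidden[layer, direction]
hidden = hidden.view(2, 2, 4, 16)  # (num_layers, num_directions, batch, hidden_size)
forward_top, backward_top = hidden[-1, 0], hidden[-1, 1]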

Is the output here a concatenation of the hidden vectors?

output, (hidden, cell) = lstm(input, (hidden, cell))
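
One way to check (a sketch building on the layout from the previous snippet): at the last time step, the first half of output should match the top layer's forward hidden state, and at the first time step the second half should match the backward one.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2, bidirectional=True)

input = torch.randn(7, 4, 10)
output, (hidden, cell) = lstm(input)  # output: (seq_len, batch, 2 * hidden_size)

h = hidden.view(2, 2, 4, 16)          # (num_layers, num_directions, batch, hidden_size)
print(torch.allclose(output[-1, :, :16], h[-1, 0]))  # forward half at the last step -> True
print(torch.allclose(output[0, :, 16:], h[-1, 1]))   # backward half at the first step -> True

So yes: along the feature dimension, output concatenates the top layer's forward and backward hidden vectors at each time step.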