Understanding the output of LSTM

While using an LSTM with bidirectional=True and 2 layers:

hidden_vector, last_hidden = lstm(features)

last_hidden is a 2-tuple, with each element of shape (num_layers * num_directions, batch_size, hidden_size). From what I understand, the first element of the tuple is the output in the forward direction and the second element is the output in the backward direction.

Is the last element along the leading dimension of each tuple element the topmost hidden layer? That is, is the following the topmost hidden layer:

top_most = last_hidden[0][-1]
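
For concreteness, here is a small sketch (sizes made up, just for illustration) that prints the shapes involved:

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size, num_layers = 4, 7, 10, 16, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, bidirectional=True)

features = torch.randn(seq_len, batch_size, input_size)  # (seq_len, batch, input_size)
hidden_vector, last_hidden = lstm(features)

print(hidden_vector.shape)   # torch.Size([7, 4, 32]) -> (seq_len, batch, 2 * hidden_size)
print(last_hidden[0].shape)  # torch.Size([4, 4, 16]) -> (num_layers * 2, batch, hidden_size)
print(last_hidden[1].shape)  # torch.Size([4, 4, 16]) -> same shape for the second element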

The common nomenclature is rather:

output, (hidden, cell) = lstm(input, (hidden, cell))

If you are using nn.LSTM, I assume you are stacking more than one layer of LSTM. In that case:

This is not exact.

Let's say you have 2 layers (L1 and L2). L1 has 3 inputs: (input, (h1, c1)); input is the input to the whole stacked architecture (the same as in the Python line above). L1 has 2 outputs: (h1_, c1_), the updated hidden and cell states for layer 1. Then L2 has 3 inputs: (h1_, (h2, c2)) and 2 outputs: (h2_, c2_). The final output of the whole stacked architecture is h2_.

If we rewrite the Python line above with these names, it becomes:

h2_, ([h1_, h2_], [c1_, c2_]) = lstm(input, ([h1, h2], [c1, c2]))
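
As a runnable check of this mapping (illustrative sizes; unidirectional for simplicity), hidden[-1] is h2_, the top layer's final hidden state, and it matches the last time step of output:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2)

input = torch.randn(7, 4, 10)         # (seq_len, batch, input_size)
output, (hidden, cell) = lstm(input)  # hidden: (num_layers, batch, hidden_size)

# output[-1] is h2_ at the last time step; hidden[-1] is the top layer's state
print(torch.allclose(output[-1], hidden[-1]))  # True

So in the unidirectional case, last_hidden[0][-1] is indeed the top layer's final hidden state.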

In a bidirectional LSTM, is h1 also an array/tuple with 2 elements?

No, you just have to pass bidirectional=True when initializing the module; the input/output structures are then the same.
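
To see what changes in the shapes, a small sketch (illustrative sizes; the view into layers and directions follows PyTorch's documented layout):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2, bidirectional=True)

input = torch.randn(7, 4, 10)
output, (hidden, cell) = lstm(input)

print(hidden.shape)  # torch.Size([4, 4, 16]) -> (num_layers * 2, batch, hidden_size)

# Separate layers and directions: index as hidden[layer, direction]
hidden = hidden.view(2, 2, 4, 16)  # (num_layers, num_directions, batch, hidden_size)
forward_top, backward_top = hidden[-1, 0], hidden[-1, 1]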

Is the output here a concatenation of the hidden vectors?

output, (hidden, cell) = lstm(input, (hidden, cell))
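
One way to check (a sketch building on the layout from the previous snippet): at the last time step, the first half of output should match the top layer's forward hidden state, and at the first time step the second half should match the backward one.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2, bidirectional=True)

input = torch.randn(7, 4, 10)
output, (hidden, cell) = lstm(input)  # output: (seq_len, batch, 2 * hidden_size)

h = hidden.view(2, 2, 4, 16)          # (num_layers, num_directions, batch, hidden_size)
print(torch.allclose(output[-1, :, :16], h[-1, 0]))  # forward half at the last step -> True
print(torch.allclose(output[0, :, 16:], h[-1, 1]))   # backward half at the first step -> True

So yes: along the feature dimension, output concatenates the top layer's forward and backward hidden vectors at each time step.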