Understanding output of lstm

The common nomenclature is rather:


output, (hidden, cell) = lstm(input, (hidden, cell))

If you are using nn.LSTM, I assume you are stacking more than one LSTM layer. In that case:

This is not exact.

Let's say you have 2 layers (L1 and L2), and consider a single time step. L1 takes 3 inputs: (input, (h1, c1)), where input is the input to the whole stacked architecture (the same as in the Python line above). L1 produces 2 outputs: (h1_, c1_), the updated hidden and cell states for layer 1. Then L2 takes 3 inputs: (h1_, (h2, c2)), and produces 2 outputs: (h2_, c2_). The final output of the whole stacked architecture is h2_.
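To make the data flow concrete, here is a minimal sketch of that two-layer step built by hand from two nn.LSTMCell modules. The sizes and the names layer1, layer2 and stack_output are only illustrative assumptions. This is not how nn.LSTM is implemented internally, it just mirrors the description above for one time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not from the original post).
input_size, hidden_size, batch = 4, 3, 2

# Two separate cells play the roles of L1 and L2.
layer1 = nn.LSTMCell(input_size, hidden_size)
layer2 = nn.LSTMCell(hidden_size, hidden_size)  # L2 consumes L1's hidden state

x = torch.randn(batch, input_size)  # input to the whole stack, one time step

# Initial hidden and cell states for each layer.
h1 = torch.zeros(batch, hidden_size)
c1 = torch.zeros(batch, hidden_size)
h2 = torch.zeros(batch, hidden_size)
c2 = torch.zeros(batch, hidden_size)

# L1: inputs (input, (h1, c1)) -> outputs (h1_, c1_)
h1_, c1_ = layer1(x, (h1, c1))

# L2: inputs (h1_, (h2, c2)) -> outputs (h2_, c2_)
h2_, c2_ = layer2(h1_, (h2, c2))

# The output of the stacked architecture is the top layer's hidden state.
stack_output = h2_
print(stack_output.shape)
```

Note that the cell states c1_ and c2_ never travel between layers; only the hidden state h1_ is passed up to L2.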

If we rewrite the Python line above with these names, it would be:

h2_, ( [h1_,h2_], [c1_,c2_] ) = lstm( input, ( [h1, h2], [c1, c2] ) )
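The same correspondence can be checked against nn.LSTM itself. The sketch below assumes a unidirectional 2-layer LSTM with the default (seq_len, batch, feature) layout and made-up sizes. When the input is a whole sequence, output holds the top layer's hidden state (h2_) for every time step, while the returned hidden and cell tensors hold the last time step's states for each layer, stacked along the first dimension.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not from the original post).
seq_len, batch, input_size, hidden_size = 5, 2, 4, 3

lstm = nn.LSTM(input_size, hidden_size, num_layers=2)

inp = torch.randn(seq_len, batch, input_size)

# Initial states: one slice per layer, i.e. [h1, h2] and [c1, c2].
h0 = torch.zeros(2, batch, hidden_size)
c0 = torch.zeros(2, batch, hidden_size)

output, (hn, cn) = lstm(inp, (h0, c0))

print(output.shape)  # (seq_len, batch, hidden_size): h2_ at every time step
print(hn.shape)      # (num_layers, batch, hidden_size): [h1_, h2_] at the last step
print(cn.shape)      # (num_layers, batch, hidden_size): [c1_, c2_] at the last step

# The last time step of output is the top layer's final hidden state.
last_step_matches = torch.allclose(output[-1], hn[-1])
print(last_step_matches)
```

So hn[0] and hn[1] correspond to h1_ and h2_ after the final time step, and output[-1] is the same tensor content as hn[-1].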