Following the explanation of num_layers in this link: https://discuss.pytorch.org/t/what-is-num-layers-in-rnn-module/9843
If the hidden-state output of the first LSTM layer is the input to the second layer (num_layers=2 for torch.nn.LSTM), why do we need to initialize the hidden state with its first dimension (which, I suppose, represents the number of hidden states) equal to num_layers? For example, in the code below:
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)        # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
output, (hn, cn) = rnn(input, (h0, c0))
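To make the question concrete, here is the snippet as a self-contained sketch that prints the resulting shapes (my understanding, assuming the default batch_first=False layout):

```python
import torch
import torch.nn as nn

# Same setup as above: a 2-layer stacked LSTM.
rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))

# output carries only the top layer's hidden states for every time step,
# while hn/cn hold the final hidden/cell state of each of the 2 layers,
# which is presumably why h0/c0 need that leading num_layers dimension.
print(output.shape)  # torch.Size([5, 3, 20])
print(hn.shape)      # torch.Size([2, 3, 20])
print(cn.shape)      # torch.Size([2, 3, 20])
```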