Following the explanation of num_layers in this link: https://discuss.pytorch.org/t/what-is-num-layers-in-rnn-module/9843
If the hidden-state output of the first LSTM layer is the input to the second layer (num_layers=2 for torch.nn.LSTM), why do we need to initialize the hidden state with its first dimension (which, I suppose, represents the number of hidden states) equal to num_layers? For example, in the code below:
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)        # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
output, (hn, cn) = rnn(input, (h0, c0))
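To make the question concrete, here is the snippet as a self-contained sketch that prints the resulting shapes (my understanding, assuming the default batch_first=False layout):

```python
import torch
import torch.nn as nn

# Same setup as above: a 2-layer stacked LSTM.
rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))

# output carries only the top layer's hidden states for every time step,
# while hn/cn hold the final hidden/cell state of each of the 2 layers,
# which is presumably why h0/c0 need that leading num_layers dimension.
print(output.shape)  # torch.Size([5, 3, 20])
print(hn.shape)      # torch.Size([2, 3, 20])
print(cn.shape)      # torch.Size([2, 3, 20])
```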