Initialization of the hidden states of torch.nn.LSTM

Following the explanation of num_layers in this link:

If the hidden-state output of the first LSTM layer is the input to the second LSTM layer (num_layers=2 for torch.nn.LSTM), why do we need to initialize the hidden state with a first dimension equal to num_layers (representing the number of hidden states, I suppose)? For example, in the code below:

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)        # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
output, (hn, cn) = rnn(input, (h0, c0))

Each LSTM layer needs an input plus its own hidden and cell states. The first layer's output sequence (its hidden states over time) serves as the input to the second layer, but the second layer still needs its own initial hidden and cell states. That is why h0 and c0 each have a first dimension of num_layers = 2: one initial (h, c) pair per layer.
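One way to convince yourself of this is to rebuild the two-layer LSTM out of two single-layer LSTMs that share its weights, feeding each one the corresponding slice of (h0, c0). This is a sketch (run on CPU with the default dropout=0; the variable names are mine, and the per-layer parameter names weight_ih_l0, weight_ih_l1, etc. are PyTorch's):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two-layer stacked LSTM, as in the question.
stacked = nn.LSTM(10, 20, num_layers=2)
inp = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)   # one initial hidden state per layer
c0 = torch.randn(2, 3, 20)   # one initial cell state per layer
out_stacked, (hn, cn) = stacked(inp, (h0, c0))

# Rebuild it from two single-layer LSTMs with the same weights.
layer1 = nn.LSTM(10, 20, num_layers=1)
layer2 = nn.LSTM(20, 20, num_layers=1)
with torch.no_grad():
    for src, dst in [(stacked.weight_ih_l0, layer1.weight_ih_l0),
                     (stacked.weight_hh_l0, layer1.weight_hh_l0),
                     (stacked.bias_ih_l0,   layer1.bias_ih_l0),
                     (stacked.bias_hh_l0,   layer1.bias_hh_l0),
                     (stacked.weight_ih_l1, layer2.weight_ih_l0),
                     (stacked.weight_hh_l1, layer2.weight_hh_l0),
                     (stacked.bias_ih_l1,   layer2.bias_ih_l0),
                     (stacked.bias_hh_l1,   layer2.bias_hh_l0)]:
        dst.copy_(src)

# Layer 1 consumes the input and slice 0 of (h0, c0);
# layer 2 consumes layer 1's output sequence and slice 1 of (h0, c0).
out1, (h1, c1) = layer1(inp, (h0[0:1], c0[0:1]))
out2, (h2, c2) = layer2(out1, (h0[1:2], c0[1:2]))

print(torch.allclose(out_stacked, out2, atol=1e-5))  # True
```

The stacked module's output matches the second single-layer LSTM's output, and hn stacks the final hidden state of each layer (hn[0] from layer 1, hn[1] from layer 2), which is why the initial states must be supplied per layer.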