I have a question about the behavior of hidden and cell states in a multilayer LSTM module.

A torch.nn.LSTM layer with *num_layers* > 1 starts each internal layer from the default *h_0* and *c_0* (zero tensors), so we can simply replace:

lstm_m = torch.nn.LSTM(…, num_layers=2)

x, (h, c) = lstm_m(x)

with

lstm = torch.nn.LSTM(…, num_layers=1)

lstm2 = torch.nn.LSTM(…, num_layers=1)

x, (h, c) = lstm(x)

x, (h, c) = lstm2(x)

But in the second case, we can also pass the last states *(h, c)* of *lstm* as the initial states of *lstm2* in order to keep the state flowing:

x, (h, c) = lstm2(x, (h, c))
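To make the comparison concrete, here is a minimal runnable sketch (the sizes are arbitrary assumptions) that copies the weights of a two-layer LSTM into two single-layer LSTMs, then checks that the manual version with zero initial states for each layer reproduces the stacked module, while seeding the second layer with the first layer's final states does not:

```python
import torch

torch.manual_seed(0)
input_size, hidden_size, seq_len, batch = 4, 8, 5, 3

# One stacked two-layer LSTM ...
lstm_m = torch.nn.LSTM(input_size, hidden_size, num_layers=2)

# ... and two single-layer LSTMs sharing its weights.
# Layer 1 of the stacked module consumes layer 0's output,
# so lstm2's input size equals hidden_size.
lstm = torch.nn.LSTM(input_size, hidden_size, num_layers=1)
lstm2 = torch.nn.LSTM(hidden_size, hidden_size, num_layers=1)
with torch.no_grad():
    lstm.weight_ih_l0.copy_(lstm_m.weight_ih_l0)
    lstm.weight_hh_l0.copy_(lstm_m.weight_hh_l0)
    lstm.bias_ih_l0.copy_(lstm_m.bias_ih_l0)
    lstm.bias_hh_l0.copy_(lstm_m.bias_hh_l0)
    lstm2.weight_ih_l0.copy_(lstm_m.weight_ih_l1)
    lstm2.weight_hh_l0.copy_(lstm_m.weight_hh_l1)
    lstm2.bias_ih_l0.copy_(lstm_m.bias_ih_l1)
    lstm2.bias_hh_l0.copy_(lstm_m.bias_hh_l1)

x = torch.randn(seq_len, batch, input_size)

out_stacked, _ = lstm_m(x)    # both internal layers start from zeros
mid, (h, c) = lstm(x)         # layer 0, zero initial states
out_manual, _ = lstm2(mid)    # layer 1, zero initial states again

print(torch.allclose(out_stacked, out_manual, atol=1e-6))  # True

# The alternative: seed lstm2 with layer 0's final states instead of zeros.
# This changes the result, so it is a different model, not the stacked one.
out_seeded, _ = lstm2(mid, (h, c))
print(torch.allclose(out_stacked, out_seeded, atol=1e-6))
```

The first check confirms the equivalence claimed above (note this holds with the default dropout=0; with dropout > 0 the stacked module also applies dropout between layers).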

I tried to find information about this design choice for multilayer LSTM (each internal layer starting from its own default zero states), but I couldn't. Is there some theory behind it, or practical reasons?