Multilayer LSTM per-layer hidden states behavior

I have a question about the behavior of the hidden and cell states in a multilayer LSTM module.

A torch.nn.LSTM module with num_layers > 1 starts each internal layer from the default h_0 and c_0 (zero tensors, unless initial states are passed in), so we should be able to replace:

lstm_m = torch.nn.LSTM(…, num_layers=2)
x, (h, c) = lstm_m(x)  # forward returns (output, (h_n, c_n))

with

lstm = torch.nn.LSTM(…, num_layers=1)
lstm2 = torch.nn.LSTM(…, num_layers=1)  # its input_size must equal lstm's hidden_size
x, (h, c) = lstm(x)
x, (h2, c2) = lstm2(x)
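To check my understanding, here is a minimal sketch (with hypothetical sizes, and assuming dropout=0) that copies the per-layer weights of a 2-layer LSTM into two single-layer LSTMs and verifies that, with the default zero initial states, the outputs match:

```python
import torch

torch.manual_seed(0)
input_size, hidden_size, seq_len, batch = 4, 3, 5, 2  # hypothetical sizes

# A 2-layer LSTM vs. two stacked 1-layer LSTMs.
lstm_m = torch.nn.LSTM(input_size, hidden_size, num_layers=2)
lstm = torch.nn.LSTM(input_size, hidden_size, num_layers=1)
lstm2 = torch.nn.LSTM(hidden_size, hidden_size, num_layers=1)  # input = layer-1 output

# Copy the 2-layer model's per-layer weights into the single-layer modules.
with torch.no_grad():
    lstm.weight_ih_l0.copy_(lstm_m.weight_ih_l0)
    lstm.weight_hh_l0.copy_(lstm_m.weight_hh_l0)
    lstm.bias_ih_l0.copy_(lstm_m.bias_ih_l0)
    lstm.bias_hh_l0.copy_(lstm_m.bias_hh_l0)
    lstm2.weight_ih_l0.copy_(lstm_m.weight_ih_l1)
    lstm2.weight_hh_l0.copy_(lstm_m.weight_hh_l1)
    lstm2.bias_ih_l0.copy_(lstm_m.bias_ih_l1)
    lstm2.bias_hh_l0.copy_(lstm_m.bias_hh_l1)

x = torch.randn(seq_len, batch, input_size)
y_multi, _ = lstm_m(x)      # both internal layers start from zero states
y1, (h, c) = lstm(x)        # layer 1, zero initial state
y_stacked, _ = lstm2(y1)    # layer 2, also zero initial state

print(torch.allclose(y_multi, y_stacked, atol=1e-6))
```

If this prints True, the two formulations are indeed equivalent under the default (zero) initial states.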

But in the second case, we can additionally pass the final states (h, c) of the first LSTM as the initial states of lstm2, in order to keep the state flowing:

x, (h2, c2) = lstm2(x, (h, c))  # forward takes the states as one (h_0, c_0) tuple
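A small shape check (hypothetical sizes again) showing why this call is even possible: since both modules are single-layer with the same hidden_size, h and c from the first LSTM have shape (num_layers=1, batch, hidden_size), exactly what the second LSTM expects as h_0 and c_0:

```python
import torch

hidden_size, seq_len, batch = 3, 5, 2  # hypothetical sizes
lstm = torch.nn.LSTM(hidden_size, hidden_size, num_layers=1)
lstm2 = torch.nn.LSTM(hidden_size, hidden_size, num_layers=1)

x = torch.randn(seq_len, batch, hidden_size)
y1, (h, c) = lstm(x)
print(h.shape)  # (1, batch, hidden_size): matches lstm2's expected h_0

# The states are passed as one (h_0, c_0) tuple, not as keyword arguments.
y2, (h2, c2) = lstm2(y1, (h, c))
print(y2.shape)  # (seq_len, batch, hidden_size)
```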

I tried to find information about why this behavior was chosen for the multilayer LSTM, i.e. starting each internal layer from its own default (zero) states, but I couldn't. Is there some theory behind this choice, or practical reasons?