Dear experienced friends, I am trying to split one LSTM model into three sub-models, in the hope that it could somehow improve model performance. Suppose the shape of our original data is `[100, 7, 6]`.

- First, I split the data along the feature dimension into two sub-datasets of shape `[100, 7, 3]` each.
- Then I feed them into two mini-LSTM models and concatenate the outputs.
- Finally, I feed the concatenated outputs as the input to the final LSTM model.

The code is shown below. After computing the loss, I simply backpropagate it to tune the parameters of all three models. May I ask two questions:

- I initialize `h0_3, c0_3` as zeros for the final LSTM model. Should I instead build them from the final states of the previous mini-models (`hn_1, cn_1` and `hn_2, cn_2`)?
- If I just backpropagate the loss and tune all the parameters in the three models, is this equivalent to tuning the parameters of three connected LSTM layers in a sequential model? Like

```
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

# n1, n2, n3 are placeholder hidden sizes
model = Sequential()
model.add(LSTM(n1, return_sequences=True))
model.add(LSTM(n2, return_sequences=True))
model.add(LSTM(n3, return_sequences=False))
```
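For comparison, here is a rough PyTorch equivalent of that Keras stack (a sketch; the hidden sizes `n1, n2, n3` are placeholders I chose to mirror my setup, and I keep the full output sequence at each layer the way `return_sequences=True` does):

```python
import torch
import torch.nn as nn

# placeholder hidden sizes for the three stacked layers
n1, n2, n3 = 10, 10, 20
stacked_1 = nn.LSTM(6, n1, batch_first=True)
stacked_2 = nn.LSTM(n1, n2, batch_first=True)
stacked_3 = nn.LSTM(n2, n3, batch_first=True)

x = torch.randn(100, 7, 6)
out, _ = stacked_1(x)            # (100, 7, n1), full sequence kept
out, _ = stacked_2(out)          # (100, 7, n2)
out, (hn, cn) = stacked_3(out)   # hn[-1] plays the role of return_sequences=False
```

Here the data flows through the layers in sequence without any feature split, which is the structure I want to compare against.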

Any suggestions would be appreciated. Thank you so much in advance!

```
import torch
import torch.nn as nn

# (input_size, hidden_size, num_layers)
lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)

# (batch_size, seq_length, fea_number)
x = torch.randn(100, 7, 6)
# split along the feature dimension into two (100, 7, 3) tensors
x1, x2 = torch.split(x, 3, dim=-1)

# init h0, c0 with shape (num_layers * num_directions, batch, hidden_size)
# model 1
h0_1 = torch.randn(2, 100, 10)
c0_1 = torch.randn(2, 100, 10)
output_1, (hn_1, cn_1) = lstm_1(x1, (h0_1, c0_1))
# model 2
h0_2 = torch.randn(2, 100, 10)
c0_2 = torch.randn(2, 100, 10)
output_2, (hn_2, cn_2) = lstm_2(x2, (h0_2, c0_2))

# concat the two outputs --> (100, 7, 20)
x3 = torch.cat((output_1, output_2), dim=-1)
h0_3 = torch.randn(2, 100, 20)
c0_3 = torch.randn(2, 100, 20)
output_3, (hn_3, cn_3) = lstm_3(x3, (h0_3, c0_3))
```
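To make my first question concrete, here is the state-passing variant I have in mind, together with the joint training step (a sketch; the Adam optimizer, learning rate, MSE loss, and dummy target `y` are just placeholders, not my real setup):

```python
import torch
import torch.nn as nn

lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)

# one optimizer over all three models, so a single backward() tunes everything
params = list(lstm_1.parameters()) + list(lstm_2.parameters()) + list(lstm_3.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(100, 7, 6)
y = torch.randn(100, 7, 20)  # dummy target, just for the sketch
x1, x2 = torch.split(x, 3, dim=-1)

output_1, (hn_1, cn_1) = lstm_1(x1)  # states default to zeros when omitted
output_2, (hn_2, cn_2) = lstm_2(x2)

x3 = torch.cat((output_1, output_2), dim=-1)
# carry the mini-model states into the final LSTM:
# two (2, 100, 10) states concatenated along the hidden dim give (2, 100, 20)
h0_3 = torch.cat((hn_1, hn_2), dim=-1)
c0_3 = torch.cat((cn_1, cn_2), dim=-1)
output_3, _ = lstm_3(x3, (h0_3, c0_3))

loss = nn.functional.mse_loss(output_3, y)
optimizer.zero_grad()
loss.backward()  # gradients flow back through all three LSTMs
optimizer.step()
```

The shapes line up because each mini-model's state is `(2, 100, 10)` and `lstm_3` expects `(2, 100, 20)`, but I am unsure whether this state-passing is actually beneficial.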