Splitting one LSTM model into two LSTMs: should it perform better?

Dear experienced friends, I am trying to split one LSTM model into three sub-models (two parallel mini-LSTMs whose outputs feed a final LSTM), hoping this might improve performance. Suppose the shape of our original data is [100, 7, 6].

  • First, I split the data along the feature dimension into two sub-datasets, each of shape [100, 7, 3].
  • Then I feed them into two mini-LSTM models and concatenate their outputs.
  • Finally, I feed the concatenated outputs as the input to the final LSTM model.

The code is shown below. After I compute the loss, I simply backpropagate it to tune the parameters of all three models (a sketch of this training step is included after the code). May I ask two questions:

  • I initialize h0_3 and c0_3 as zero for the final LSTM model. Should I instead use the final states from the mini-models (hn_1, cn_1, hn_2, cn_2)? (See the sketch right after this list for what I mean.)
  • If I just backpropagate the loss and tune all the parameters of the three models, is this the same as tuning the parameters of three stacked LSTM layers in a sequential model? Like:
model = Sequential()
model.add(LSTM(n1, return_sequences=True))
model.add(LSTM(n2, return_sequences=True))
model.add(LSTM(n3, return_sequences=False))
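
For question 1, what I mean is something like the sketch below, using the variable names from the code at the end of this post. This is only my guess at how the states could be wired, not something I know to be correct:

# my assumption: reuse the mini-models' final states as the final model's
# initial states; hn_1/hn_2 (and cn_1/cn_2) are each (num_layers, batch, 10),
# so concatenating on the last dim gives (2, 100, 20), which matches lstm_3
h0_3 = torch.cat((hn_1, hn_2), dim=-1)
c0_3 = torch.cat((cn_1, cn_2), dim=-1)
output_3, (hn_3, cn_3) = lstm_3(x3, (h0_3, c0_3))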

Any suggestions would be appreciated. Thank you so much in advance!

import torch
import torch.nn as nn

# nn.LSTM(input_size, hidden_size, num_layers)
lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)

# (batch_size, seq_length, num_features)
x = torch.randn(100, 7, 6)

# split along the feature dim into two tensors --> (100, 7, 3) each
x1, x2 = torch.split(x, 3, dim=-1)

# init h0,c0 (num_layers*num_directions, batch, hidden_size)
# model 1
h0_1 = torch.randn(2, 100, 10)
c0_1 = torch.randn(2, 100, 10)
output_1, (hn_1, cn_1) = lstm_1(x1, (h0_1, c0_1))

# model 2
h0_2 = torch.randn(2, 100, 10)
c0_2 = torch.randn(2, 100, 10)
output_2, (hn_2, cn_2) = lstm_2(x2, (h0_2, c0_2))

# concatenate the two outputs along the feature dim --> (100, 7, 20)
x3 = torch.cat((output_1, output_2), dim=-1)

# zero-initialize the final model's states, as described in question 1
# (this is also PyTorch's default when no states are passed)
h0_3 = torch.zeros(2, 100, 20)
c0_3 = torch.zeros(2, 100, 20)

output_3, (hn_3, cn_3) = lstm_3(x3, (h0_3, c0_3))
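
For reference, below is the training step I have in mind. The Adam optimizer, the MSE loss, and the dummy target y are placeholders of my own, just for illustration:

import torch.optim as optim

# one optimizer over the parameters of all three sub-models
params = list(lstm_1.parameters()) + list(lstm_2.parameters()) + list(lstm_3.parameters())
optimizer = optim.Adam(params, lr=1e-3)

y = torch.randn(100, 7, 20)                 # dummy target, same shape as output_3
loss = nn.functional.mse_loss(output_3, y)

optimizer.zero_grad()
loss.backward()   # gradients flow back through all three LSTMs
optimizer.step()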