Dear experienced friends, I am trying to split one LSTM model into three sub-models, hoping this will somehow improve model performance. Suppose the shape of our original data is [100, 7, 6].
- First, I split the data into two sub-datasets of shape [100, 7, 3] each.
- Then I feed them into two mini-LSTM models and concatenate their outputs.
- Finally, I feed the concatenated output into the final LSTM model.
The code is shown below. After I compute the loss, I simply backpropagate it to tune the parameters of all three models. May I ask two questions:
- I initialize h0_3, c0_3 as zeros for the final LSTM model. Should I instead use the final states from the previous mini-models (hn_1, cn_1 and hn_2, cn_2)?
- If I just backpropagate the loss and tune all the parameters of the three models, is this the same as tuning the parameters of three stacked LSTM layers in a sequential model? Like:
```python
model = Sequential()
model.add(LSTM(n1, return_sequences=True))
model.add(LSTM(n2, return_sequences=True))
model.add(LSTM(n3, return_sequences=False))
```
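For comparison, here is a rough PyTorch analogue of that stacked Keras model, assuming hypothetical sizes n1 = n2 = 10 and n3 = 20 (chosen only to mirror my split models below; the Keras snippet leaves them symbolic):

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """Three stacked LSTM layers, each consuming the full 6-feature input."""
    def __init__(self):
        super().__init__()
        self.lstm1 = nn.LSTM(6, 10, batch_first=True)   # LSTM(n1, return_sequences=True)
        self.lstm2 = nn.LSTM(10, 10, batch_first=True)  # LSTM(n2, return_sequences=True)
        self.lstm3 = nn.LSTM(10, 20, batch_first=True)  # LSTM(n3, return_sequences=False)

    def forward(self, x):
        out, _ = self.lstm1(x)
        out, _ = self.lstm2(out)
        out, _ = self.lstm3(out)
        # return_sequences=False in Keras keeps only the last time step
        return out[:, -1, :]

x = torch.randn(100, 7, 6)
y = StackedLSTM()(x)  # shape (100, 20)
```

Note that this stacked model sees all 6 features in its first layer, whereas my split version forces each mini-LSTM to see only 3 of them.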
Any suggestions would be appreciated. Thank you so much in advance!
```python
import torch
import torch.nn as nn

# (input_size, hidden_size, num_layers)
lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)

# (batch_size, seq_length, fea_number)
x = torch.randn(100, 7, 6)

# split into 2 --> (100, 7, 3) each
x1, x2 = torch.split(x, 3, dim=-1)

# init h0, c0: (num_layers * num_directions, batch, hidden_size)
# model 1
h0_1 = torch.randn(2, 100, 10)
c0_1 = torch.randn(2, 100, 10)
output_1, (hn_1, cn_1) = lstm_1(x1, (h0_1, c0_1))

# model 2
h0_2 = torch.randn(2, 100, 10)
c0_2 = torch.randn(2, 100, 10)
output_2, (hn_2, cn_2) = lstm_2(x2, (h0_2, c0_2))

# concat the two outputs --> (100, 7, 20)
x3 = torch.cat((output_1, output_2), dim=-1)

# model 3
h0_3 = torch.randn(2, 100, 20)
c0_3 = torch.randn(2, 100, 20)
output_3, (hn_3, cn_3) = lstm_3(x3, (h0_3, c0_3))
```
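Regarding my first question, one option I considered (a sketch, not necessarily the right choice) is to build lstm_3's initial state from the mini-models' final states: concatenating hn_1 and hn_2 along the hidden dimension turns two (2, 100, 10) tensors into one (2, 100, 20) tensor, which matches lstm_3's expected state shape:

```python
import torch
import torch.nn as nn

lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)

x = torch.randn(100, 7, 6)
x1, x2 = torch.split(x, 3, dim=-1)

# omitting (h0, c0) uses zero initial states by default
output_1, (hn_1, cn_1) = lstm_1(x1)
output_2, (hn_2, cn_2) = lstm_2(x2)

x3 = torch.cat((output_1, output_2), dim=-1)

# build lstm_3's initial state from the mini-models' final states:
# two (2, 100, 10) tensors --> one (2, 100, 20) tensor per state
h0_3 = torch.cat((hn_1, hn_2), dim=-1)
c0_3 = torch.cat((cn_1, cn_2), dim=-1)
output_3, _ = lstm_3(x3, (h0_3, c0_3))
```

Since h0_3 and c0_3 are computed from the mini-models' outputs, gradients would also flow back through them during backpropagation.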
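For my second question, this is how I imagine the joint training would look: one optimizer over the parameters of all three modules, so a single backward pass reaches every model. The regression head and targets here are hypothetical placeholders I added just to make the loss concrete:

```python
import torch
import torch.nn as nn

lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)
head = nn.Linear(20, 1)  # hypothetical output head, not in my original code

# one optimizer over all parameters, so every module is tuned together
params = (list(lstm_1.parameters()) + list(lstm_2.parameters())
          + list(lstm_3.parameters()) + list(head.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(100, 7, 6)
y = torch.randn(100, 1)  # hypothetical targets

x1, x2 = torch.split(x, 3, dim=-1)
output_1, _ = lstm_1(x1)
output_2, _ = lstm_2(x2)
x3 = torch.cat((output_1, output_2), dim=-1)
output_3, _ = lstm_3(x3)
pred = head(output_3[:, -1, :])  # predict from the last time step

loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()   # one backward pass populates gradients in all three LSTMs
optimizer.step()
```

My understanding is that the gradients do reach all three models this way, but the architecture still differs from the stacked Sequential version, because each mini-LSTM only ever sees half of the input features.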