Dear experienced friends, I am trying to split one LSTM model into three sub-models, in the hope that this improves the model's performance. Suppose the shape of our original data is [100, 7, 6].
- First, I split the data into two sub-datasets of shape [100, 7, 3] each.
- Then I feed them into two mini-LSTM models and concatenate their outputs.
- Finally, I feed the concatenated outputs as the input to the final LSTM model.
The code is shown below. After I compute the loss, I simply backpropagate it to tune the parameters of all three models. May I ask two questions:
- I initialize h0_3, c0_3 as zeros for the final LSTM model. Should I instead use the final states from the previous mini-models (hn_1, cn_1, hn_2, cn_2)?
- If I just backpropagate the loss and tune all the parameters in the three models at once, is this the same as tuning the parameters of three stacked LSTM layers in a sequential model? Like:
model = Sequential()
model.add(LSTM(n1, return_sequences=True))
model.add(LSTM(n2, return_sequences=True))
model.add(LSTM(n3, return_sequences=False))
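For comparison, here is my understanding of that stacked Keras setup written directly in PyTorch (the layer sizes are placeholders chosen to mirror my code below, and taking the last time step stands in for return_sequences=False):

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """Rough PyTorch sketch of three stacked Keras LSTM layers."""
    def __init__(self, n1=10, n2=20, n3=20):
        super().__init__()
        self.l1 = nn.LSTM(6, n1, batch_first=True)   # return_sequences=True
        self.l2 = nn.LSTM(n1, n2, batch_first=True)  # return_sequences=True
        self.l3 = nn.LSTM(n2, n3, batch_first=True)  # return_sequences=False

    def forward(self, x):
        y, _ = self.l1(x)
        y, _ = self.l2(y)
        y, _ = self.l3(y)
        return y[:, -1]  # keep only the last time step

model = StackedLSTM()
out = model(torch.randn(100, 7, 6))
print(out.shape)  # torch.Size([100, 20])
```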
Any suggestions would be appreciated. Thank you so much in advance!
import torch
import torch.nn as nn

# (input_size, hidden_size, num_layers)
lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)
# (batch_size, seq_length, num_features)
x = torch.randn(100, 7, 6)
# split into two sub-tensors --> (100, 7, 3) each
x1, x2 = torch.split(x, 3, dim=-1)
# init h0, c0: (num_layers * num_directions, batch, hidden_size)
# model 1
h0_1 = torch.randn(2, 100, 10)
c0_1 = torch.randn(2, 100, 10)
output_1, (hn_1, cn_1) = lstm_1(x1, (h0_1, c0_1))
# model 2
h0_2 = torch.randn(2, 100, 10)
c0_2 = torch.randn(2, 100, 10)
output_2, (hn_2, cn_2) = lstm_2(x2, (h0_2, c0_2))
# concatenate the two outputs along the feature dimension --> (100, 7, 20)
x3 = torch.cat((output_1, output_2), dim=-1)
# model 3
h0_3 = torch.randn(2, 100, 20)
c0_3 = torch.randn(2, 100, 20)
output_3, (hn_3, cn_3) = lstm_3(x3, (h0_3, c0_3))
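To make my first question concrete, here is the variant I have in mind: initializing the final LSTM's states by concatenating the mini-models' final states along the hidden dimension, so the shapes line up with lstm_3. This is only a sketch of the idea, and I am not sure it is the right wiring:

```python
import torch
import torch.nn as nn

lstm_1 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_2 = nn.LSTM(3, 10, 2, batch_first=True)
lstm_3 = nn.LSTM(20, 20, 2, batch_first=True)

x = torch.randn(100, 7, 6)
x1, x2 = torch.split(x, 3, dim=-1)

# Omitting the initial states defaults them to zeros.
output_1, (hn_1, cn_1) = lstm_1(x1)
output_2, (hn_2, cn_2) = lstm_2(x2)

# Concatenate the final states along the hidden dimension:
# (2, 100, 10) + (2, 100, 10) -> (2, 100, 20), matching lstm_3.
h0_3 = torch.cat((hn_1, hn_2), dim=-1)
c0_3 = torch.cat((cn_1, cn_2), dim=-1)

x3 = torch.cat((output_1, output_2), dim=-1)
output_3, _ = lstm_3(x3, (h0_3, c0_3))

# A single backward pass reaches the parameters of all three modules.
loss = output_3.mean()
loss.backward()
print(all(p.grad is not None for p in lstm_1.parameters()))  # True
```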