Appending a recurrent layer to PyTorch LSTM model with different hidden size

I'm developing a BiLSTM model for sequence analysis in PyTorch, using torch.nn.LSTM. With that module you can stack several layers simply by passing num_layers (e.g., num_layers=2). However, all of the layers then share the same hidden_size. That is mostly fine for me: I want every layer to have the same hidden_size except the last one, which should have a different size. A basic example follows:

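import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
inp = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)    # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)    # (num_layers, batch, hidden_size)
output, (hn, cn) = rnn(inp, (h0, c0))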

The output shape is (5, 3, 20).

One solution (which I would rather avoid) is to add a second model that takes the first model's output as its input and produces the dimension I need, e.g.:

rnn_two = nn.LSTM(input_size=20, hidden_size=2)
output2, _ = rnn_two(output)

Same as this solution. However, I do not want to do this because I parallelize the model with DataParallel, so everything needs to live in a single module. I was hoping to find something similar to Keras, e.g.:

rnn.add(LSTM, hidden_size=2)

I have checked the LSTM source code but couldn’t find what I need.

Any suggestions?

You can collect these two LSTMs into a single class and use DataParallel on it.
Does that help?

import torch.nn as nn

class TwoLSTMs(nn.Module):
    def __init__(self):
        super().__init__()  # must run before submodules are assigned
        self.rnn_one = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
        self.rnn_two = nn.LSTM(input_size=20, hidden_size=2)

    def forward(self, inp, h0, c0):
        # 2-layer LSTM first, then a narrower LSTM on its output sequence
        output, (hn, cn) = self.rnn_one(inp, (h0, c0))
        output2, _ = self.rnn_two(output)
        return output2

model = TwoLSTMs()
model = nn.DataParallel(model)

P.S: I did not check for syntax errors
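
One detail that may be worth checking with this approach: nn.DataParallel scatters its inputs along dim=0 by default, while nn.LSTM with the default batch_first=False keeps the batch on dimension 1, both for the input and for h0/c0. The sketch below assumes that default layout and passes dim=1 to the wrapper so the split happens along the batch axis. StackedLSTM and its constructor arguments are illustrative names, not something from the thread, and the .contiguous() calls are a defensive assumption for the sliced states. On a machine with fewer than two GPUs the wrapper is skipped and the module runs directly.

```python
import torch
import torch.nn as nn


class StackedLSTM(nn.Module):
    """Hypothetical wrapper: a 2-layer LSTM followed by a narrower 1-layer LSTM."""

    def __init__(self, input_size=10, hidden_size=20, out_size=2, num_layers=2):
        super().__init__()
        self.rnn_one = nn.LSTM(input_size, hidden_size, num_layers=num_layers)
        self.rnn_two = nn.LSTM(hidden_size, out_size)

    def forward(self, inp, h0, c0):
        # States sliced along dim 1 may be non-contiguous views, so copy them.
        h0, c0 = h0.contiguous(), c0.contiguous()
        output, _ = self.rnn_one(inp, (h0, c0))
        output2, _ = self.rnn_two(output)
        return output2


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = StackedLSTM().to(device)
if torch.cuda.device_count() > 1:
    # dim=1 is the batch axis for (seq_len, batch, features) inputs
    # and for (num_layers, batch, hidden_size) states.
    model = nn.DataParallel(model, dim=1)

inp = torch.randn(5, 3, 10, device=device)
h0 = torch.randn(2, 3, 20, device=device)
c0 = torch.randn(2, 3, 20, device=device)
out = model(inp, h0, c0)
print(tuple(out.shape))  # (5, 3, 2)
```

If you prefer batch_first=True, keep in mind that h0 and c0 still use the (num_layers, batch, hidden_size) layout, so the input and the states would no longer share a batch axis for scattering.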

Yes, this solves my problem. However, are you sure this design has no side effects on parallelism or, theoretically, on the backward pass?

IMHO, an LSTM is a sequential architecture by default. Whatever parallelism you get from nn.DataParallel comes from spreading larger batches across multiple GPUs to speed things up; within each replica the time steps are still processed sequentially.
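
On the backward question, a quick empirical check is to chain the two LSTMs, call backward on a scalar, and confirm that every parameter of both modules received a gradient. This is only a CPU sketch with a made-up loss (the sum of the outputs), not a proof about any particular training setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn_one = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
rnn_two = nn.LSTM(input_size=20, hidden_size=2)

inp = torch.randn(5, 3, 10)        # (seq_len, batch, input_size)
output, _ = rnn_one(inp)           # initial states default to zeros
output2, _ = rnn_two(output)       # (seq_len, batch, 2)

loss = output2.sum()               # placeholder scalar loss for the check
loss.backward()

# Collect the names of parameters that autograd did not reach.
missing = [
    name
    for module in (rnn_one, rnn_two)
    for name, param in module.named_parameters()
    if param.grad is None
]
print(missing)  # []
```

An empty list means autograd traced the graph through the second LSTM back into both layers of the first one, which is what you would expect from two modules composed in a single forward pass.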