Appending a recurrent layer to PyTorch LSTM model with different hidden size

I’m developing a Bi-LSTM model for sequence analysis in PyTorch, using torch.nn.LSTM. With that module you can stack several layers simply by passing the num_layers parameter (e.g., num_layers=2). However, all of those layers share the same hidden_size, which is only partially fine for me: I want every layer to have the same hidden_size except the last one, which should have a different size. A basic example follows:

import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
inp = torch.randn(5, 3, 10)    # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)     # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(inp, (h0, c0))

The output shape is (5, 3, 20).

One solution (though unfavorable for me) is to implement an extra model that takes the output of the first model and produces the dimension I need, e.g.:

rnn_two = nn.LSTM(input_size=20, hidden_size=2)
output2, _ = rnn_two(output)

Same as this solution. However, I do not want to do this because I parallelize the model using DataParallel, so I need everything to be in a single module. I was hoping to find something similar to Keras, e.g.:

rnn.add(LSTM, hidden_size=2)

I have checked the LSTM source code but couldn’t find what I need.

Any suggestions?

You can collect these two LSTMs into a single class and use DataParallel on that class.
Does that help?

import torch.nn as nn

class TwoLSTMs(nn.Module):
    def __init__(self):
        super().__init__()
        # two stacked layers with hidden_size=20, then one layer with hidden_size=2
        self.rnn_one = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
        self.rnn_two = nn.LSTM(input_size=20, hidden_size=2)

    def forward(self, inp, h0, c0):
        output, (hn, cn) = self.rnn_one(inp, (h0, c0))
        output2, _ = self.rnn_two(output)   # second LSTM starts from zero initial states
        return output2


model = TwoLSTMs()
model = nn.DataParallel(model)

P.S.: I did not check for syntax errors.
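
For completeness, here is a quick shape check of the combined module on CPU (my own addition, reusing the tensor sizes from the question) to confirm that the last layer produces hidden_size=2:

import torch

model = TwoLSTMs()
inp = torch.randn(5, 3, 10)    # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)     # (num_layers, batch, hidden_size) for rnn_one
c0 = torch.randn(2, 3, 20)
out = model(inp, h0, c0)
print(out.shape)               # torch.Size([5, 3, 2])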

Yes, this solves my problem. However, are you sure this design has no side effects on parallelism or in theory (e.g., on the backward pass)?

IMHO, an LSTM is a sequential architecture by design. Whatever parallelism you achieve with nn.DataParallel speeds things up by spreading the work over multiple GPUs, i.e. it helps with larger batch sizes, but the recurrence itself remains sequential.
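
If it helps, here is a minimal multi-GPU sketch (my own assumption, not from the posts above): since nn.LSTM uses a (seq, batch, feature) layout by default and DataParallel scatters inputs along dim 0, passing dim=1 is one way to make it split the batch rather than the sequence. It assumes the TwoLSTMs module above and at least one CUDA device.

import torch
import torch.nn as nn

# Sketch: scatter/gather along the batch dimension (dim 1 for the default
# (seq, batch, feature) LSTM layout).  Assumes TwoLSTMs from above and a GPU.
model = TwoLSTMs().cuda()
model = nn.DataParallel(model, dim=1)

inp = torch.randn(5, 64, 10).cuda()    # larger batch: (seq_len, batch, input_size)
h0 = torch.randn(2, 64, 20).cuda()     # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 64, 20).cuda()
out = model(inp, h0, c0)               # each GPU processes a slice of the batch
print(out.shape)                       # torch.Size([5, 64, 2])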