Using a different hidden size for each layer of a multi-layer LSTM

Rahul_Moozhikkal · June 13, 2019, 7:50am

Hi,

I was looking for a way to use a different hidden size for each layer of a 2-layer LSTM built with the standard nn.LSTM.

If we look at the input arguments of nn.LSTM, they are (input_size, hidden_size, num_layers).
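One way to see what this means in practice is to inspect the weight shapes of a 2-layer nn.LSTM. The sizes 10 and 20 below are arbitrary values picked for illustration; the point is that the single hidden_size is reused by every layer, and only layer 0's input weights depend on input_size.

```python
import torch.nn as nn

# A 2-layer LSTM: hidden_size is a single int shared by every layer
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# Layer 0 maps input_size -> hidden_size; layer 1 maps hidden_size -> hidden_size.
# Each weight matrix stacks the 4 gates (input, forget, cell, output), hence 4 * 20 rows.
for name, param in lstm.named_parameters():
    if name.startswith("weight"):
        print(name, tuple(param.shape))
# weight_ih_l0 (80, 10)
# weight_hh_l0 (80, 20)
# weight_ih_l1 (80, 20)
# weight_hh_l1 (80, 20)
```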

I could not find any documentation or anything online explaining how to give layer 1 and layer 2 different hidden sizes in PyTorch. When I tried passing more than one value for the input and hidden sizes of a multi-layer LSTM, it didn't seem to work.
This seems to be pretty straightforward in Keras using the units argument.

E.g., the code below explicitly specifies that the hidden state in layer 2 has size 50:

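from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(
    units=100,
    input_shape=(sequence_length, number_features),
    return_sequences=True))  # pass the whole sequence on to the next LSTM
model.add(LSTM(
    units=50))               # second layer has hidden size 50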
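For comparison, a rough PyTorch counterpart of that Keras snippet is simply two separate single-layer nn.LSTM modules, where the second one's input_size equals the first one's hidden_size. The concrete values for sequence_length, number_features and batch_size are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Assumed example sizes standing in for sequence_length / number_features
sequence_length, number_features, batch_size = 30, 8, 4

# Two single-layer LSTMs; hidden_size plays the role of Keras' units
lstm1 = nn.LSTM(input_size=number_features, hidden_size=100, batch_first=True)
lstm2 = nn.LSTM(input_size=100, hidden_size=50, batch_first=True)

x = torch.randn(batch_size, sequence_length, number_features)
out1, _ = lstm1(x)       # (4, 30, 100): full sequence, like return_sequences=True
out2, _ = lstm2(out1)    # (4, 30, 50)
last_step = out2[:, -1]  # (4, 50): like Keras' default return_sequences=False
print(out1.shape, out2.shape, last_step.shape)
```

Note that nn.LSTM always returns the output for every time step, so the Keras return_sequences=False behaviour corresponds to slicing out the last step yourself.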

2. Is there a way to stack LSTMs with something similar to nn.Sequential()? My understanding is that nn.Sequential does not work for LSTMs.

Thank You
Rahul M

yongen9696 · August 20, 2021, 5:09am

A similar question has been answered over here.

Example answer from the link:

from collections import OrderedDict
import torch.nn as nn

nn.Sequential(OrderedDict([
    ('LSTM1', nn.LSTM(input_size, hidden_size, 1)),
    ('LSTM2', nn.LSTM(hidden_size, hidden_size, 1))
]))

Also, as the PyTorch RNN source code mentions, if num_layers is 2, two LSTM layers are stacked, with the second layer taking the outputs of the first one as its input.
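That stacking claim can be checked directly: copy the per-layer weights of a 2-layer nn.LSTM into two single-layer LSTMs and compare outputs. This is a small verification sketch; the sizes are arbitrary, and the comparison relies on dropout being 0 (the default).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
stacked = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
first = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
second = nn.LSTM(input_size=16, hidden_size=16, batch_first=True)

# Copy layer-k parameters of the stacked LSTM into the k-th single-layer LSTM
with torch.no_grad():
    for k, single in enumerate((first, second)):
        for prefix in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
            getattr(single, f"{prefix}_l0").copy_(getattr(stacked, f"{prefix}_l{k}"))

x = torch.randn(4, 10, 8)
out_stacked, _ = stacked(x)
out_manual, _ = second(first(x)[0])  # feed layer 1's output sequence into layer 2
print(torch.allclose(out_stacked, out_manual, atol=1e-5))
```

Because num_layers=2 ties both layers to the same hidden_size, this also shows why different per-layer sizes require separate nn.LSTM modules.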

fjaraavila · October 20, 2021, 8:32am

This does not seem right, because the output of each LSTM layer is a tuple, (output, (h_n, c_n)), not a Tensor.
In fact, if you build this and then pass a tensor through it, it will most likely give you the following error:

AttributeError: 'tuple' object has no attribute 'size'
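A common workaround, sketched here as an assumption rather than a built-in PyTorch feature, is to wrap each nn.LSTM in a small module that discards the (h_n, c_n) tuple and returns only the output tensor. The wrapper name LSTMOutputOnly is hypothetical. With it, nn.Sequential works and each layer can have its own hidden size.

```python
import torch
import torch.nn as nn

class LSTMOutputOnly(nn.Module):
    """Hypothetical wrapper: runs an nn.LSTM and returns only its output
    sequence, so the next module in nn.Sequential receives a plain tensor."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.lstm = nn.LSTM(*args, **kwargs)

    def forward(self, x):
        output, _ = self.lstm(x)  # nn.LSTM returns (output, (h_n, c_n))
        return output

# Two layers with different hidden sizes, stacked with nn.Sequential
model = nn.Sequential(
    LSTMOutputOnly(input_size=8, hidden_size=100, batch_first=True),
    LSTMOutputOnly(input_size=100, hidden_size=50, batch_first=True),
)

x = torch.randn(4, 30, 8)
y = model(x)
print(y.shape)  # torch.Size([4, 30, 50])
```

The trade-off is that the final hidden and cell states are thrown away; if you need them, a custom forward (as in the reply below) is the better fit.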

fjaraavila · November 15, 2021, 12:41pm

After giving it some thought, I think I managed to do it. I'll leave it here in case someone finds it useful.

import torch
import torch.nn as nn

class LSTM_multi_output(nn.Module):
    def __init__(self, step_size, input_dimensions, first_hidden_size, second_hidden_size, third_hidden_size):
        super(LSTM_multi_output, self).__init__()
        self.first_hidden_size = first_hidden_size
        self.second_hidden_size = second_hidden_size
        self.third_hidden_size = third_hidden_size
        self.step_size = step_size
        self.input_dimensions = input_dimensions
        # One single-layer LSTM per level; each input_size matches the previous hidden size
        self.first_layer = nn.LSTM(input_size=self.input_dimensions, hidden_size=self.first_hidden_size,
                                   num_layers=1, batch_first=True)
        self.second_layer = nn.LSTM(input_size=self.first_hidden_size, hidden_size=self.second_hidden_size,
                                    num_layers=1, batch_first=True)
        self.third_layer = nn.LSTM(input_size=self.second_hidden_size, hidden_size=self.third_hidden_size,
                                   num_layers=1, batch_first=True)
        # The flattened outputs of all time steps feed a linear layer with 4 outputs
        self.fc_layer = nn.Linear(self.step_size * self.third_hidden_size, 4)

    def forward(self, x):
        # x: (batch, seq_len, input_dimensions); seq_len must equal step_size for fc_layer
        batch_size, seq_len, _ = x.size()
        # Zero initial states of shape (num_layers=1, batch, hidden), on the same device as x
        h_1 = torch.zeros(1, batch_size, self.first_hidden_size, device=x.device)
        c_1 = torch.zeros(1, batch_size, self.first_hidden_size, device=x.device)
        hidden_1 = (h_1, c_1)
        # Only the output sequence is passed on; the (h_n, c_n) tuple stays here
        lstm_out, hidden_1 = self.first_layer(x, hidden_1)
        h_2 = torch.zeros(1, batch_size, self.second_hidden_size, device=x.device)
        c_2 = torch.zeros(1, batch_size, self.second_hidden_size, device=x.device)
        hidden_2 = (h_2, c_2)
        lstm_out, hidden_2 = self.second_layer(lstm_out, hidden_2)
        h_3 = torch.zeros(1, batch_size, self.third_hidden_size, device=x.device)
        c_3 = torch.zeros(1, batch_size, self.third_hidden_size, device=x.device)
        hidden_3 = (h_3, c_3)
        lstm_out, hidden_3 = self.third_layer(lstm_out, hidden_3)
        # Flatten (batch, seq_len, third_hidden_size) to (batch, seq_len * third_hidden_size)
        x = lstm_out.contiguous().view(batch_size, -1)
        return self.fc_layer(x)
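The same idea generalizes to any number of layers by taking a list of hidden sizes and holding the LSTMs in an nn.ModuleList. This is a sketch, not part of the original reply: the class name StackedLSTM and the num_outputs parameter are hypothetical, and the explicit zero states are dropped because nn.LSTM starts from zero states when none are passed.

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """Hypothetical generalization of LSTM_multi_output: one single-layer
    nn.LSTM per entry in hidden_sizes, followed by a linear head."""

    def __init__(self, step_size, input_dimensions, hidden_sizes, num_outputs=4):
        super().__init__()
        # Layer i reads the previous layer's hidden size (or the input features for i == 0)
        in_sizes = [input_dimensions] + list(hidden_sizes[:-1])
        self.layers = nn.ModuleList([
            nn.LSTM(input_size=i, hidden_size=h, batch_first=True)
            for i, h in zip(in_sizes, hidden_sizes)
        ])
        self.fc_layer = nn.Linear(step_size * hidden_sizes[-1], num_outputs)

    def forward(self, x):
        # No initial state passed: each nn.LSTM starts from zeros on x's device
        for layer in self.layers:
            x, _ = layer(x)
        # Flatten all time steps before the linear head, as in the original model
        return self.fc_layer(x.reshape(x.size(0), -1))

model = StackedLSTM(step_size=30, input_dimensions=8, hidden_sizes=[64, 32, 16])
out = model(torch.randn(4, 30, 8))
print(out.shape)  # torch.Size([4, 4])
```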