LSTM network inside a Sequential container

I’ll start off by saying that I know very little about deep learning but still wanted to try and apply it to some of the work I’ve been doing.

Working through one of the tutorials, I built a NN made up of the following components:

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.Linear(H, D_out),
)

y_pred = model(train_x)

I wanted to use an LSTM network, so I tried to do the following:

model = torch.nn.Sequential(
    torch.nn.LSTM(D_in, H),
    torch.nn.Linear(H, D_out),
)

y_pred = model(train_x)

This gave me the following error:

RuntimeError: input must have 3 dimensions, got 2

I found that the input expected by an LSTM network differs from what a Linear layer expects:

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence

I was able to work around it by splitting my Sequential nn container into two layers, as well as reshaping my input/output to/from the LSTM layer like so:

layerA = torch.nn.LSTM(D_in, H)
layerB = torch.nn.Linear(H, D_out)

train_x = train_x.unsqueeze(0)
y_pred, (hn, cn) = layerA(train_x)
y_pred = y_pred.squeeze(0)
y_pred = layerB(y_pred)

However, simply getting it to work concerns me because I feel like I’m using the nn incorrectly. My questions are as follows:

  1. How can I use an LSTM network as part of a Sequential container?
  2. Why is the data to an LSTM network different from that to a Linear one? What is the significance of the outermost dimension?
  3. What is the correct way to use DataLoader in conjunction with an LSTM network? I’m using the default DataLoader, which doesn’t seem to play well with nn.LSTM:
    test_x_data = torch.FloatTensor(x)
    test_dataset = data_utils.TensorDataset(test_x_data, test_y_data)
    test_loader = data_utils.DataLoader(test_dataset, batch_size=1, shuffle=True)

This is a relatively open-ended question, so I appreciate your time in advance!

Hi @Olshansky!

I’ve been tackling a problem similar to the one in this post. I also tried the nn.Sequential container first, but it fails because nn.LSTM returns a tuple (the output tensor plus the hidden and cell states), which the next layer cannot consume.
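To make the tuple issue concrete, here is a minimal sketch. The sizes (5 input features, 16 hidden units, 3 time steps, batch of 8) are made up for illustration and are not from the original post.

```python
import torch
from torch import nn

# Illustrative sizes only (assumptions, not from the thread).
lstm = nn.LSTM(input_size=5, hidden_size=16)
result = lstm(torch.randn(3, 8, 5))  # input is (seq_len, batch, input_size)

# nn.LSTM returns (output, (h_n, c_n)) rather than a single tensor.
is_tuple = isinstance(result, tuple)
output, (h_n, c_n) = result

# nn.Sequential hands each module's return value straight to the next one,
# so nn.Linear receives the whole tuple and raises an error.
seq = nn.Sequential(nn.LSTM(5, 16), nn.Linear(16, 1))
try:
    seq(torch.randn(3, 8, 5))
    sequential_failed = False
except Exception as exc:  # typically a TypeError about the tuple argument
    sequential_failed = True
    error_name = type(exc).__name__
```

So any fix has to unpack that tuple somewhere between the LSTM and the Linear layer, either in a custom forward or in a small wrapper module.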

Let me share a mockup solution that uses torch.nn.ModuleDict and a custom forward function. First, though, I’ll lay out the data so that the transformations make sense to you:

# A batch is in shape [batches, sequences, sequence length, features]  
x=torch.Size([128, 30, 12, 45])
y=torch.Size([128, 30, 1, 1])

The simplest way I came up with to implement an LSTM coupled with a Linear layer is this:

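class MockupModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.model = nn.ModuleDict({
            'lstm': nn.LSTM(
                input_size=x_features,    # 45, see the data definition
                hidden_size=hidden_size,  # Can vary (4 in the output below)
            ),
            # Linear arguments inferred from the printed shapes below
            'linear': nn.Linear(
                in_features=hidden_size,
                out_features=1,
            ),
        })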

Then I proceeded to override the forward function:

    def forward(self, x):
        x_batches, x_seqs, x_seq_len, x_features = x.shape

        # From [batches, seqs, seq len, features]
        # to [seq len, batch data, features].
        # view/reshape alone cannot move the seq len axis to the front,
        # so merge the first two axes and then transpose.
        x = x.reshape(-1, x_seq_len, x_features).transpose(0, 1).contiguous()
        # Data is fed to the LSTM
        out, _ = self.model['lstm'](x)
        print(f'lstm output={out.size()}')

        # From [seq len, batch, num_directions * hidden_size]
        # to [batches, seqs, seq_len, prediction]
        out = out.transpose(0, 1).reshape(x_batches, x_seqs, x_seq_len, -1)
        print(f'transformed output={out.size()}')

        # Data is fed to the Linear layer
        out = self.model['linear'](out)
        print(f'linear output={out.size()}')

        # The prediction utilizing the whole sequence is the last one
        y_pred = out[:, :, -1].unsqueeze(-1)
        print(f'y_pred={y_pred.size()}')

        return y_pred

Now if I initialize the model and feed a batch of data to it, the printouts of the forward function produce the following output:

>>> model = MockupModel()
>>> model(x)
lstm output=torch.Size([12, 3840, 4])
transformed output=torch.Size([128, 30, 12, 4])
linear output=torch.Size([128, 30, 12, 1])
y_pred=torch.Size([128, 30, 1, 1])

This way my implementation produces predictions that match the targets, which can then be fed to the loss function. I hope this helps you out, or anyone else with a similar problem!
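A caveat on reshapes like the one in forward: view and reshape only reinterpret the existing memory order and never move axes. Going from [batches, seqs, seq len, features] to [seq len, batch, features] therefore needs a transpose as well, otherwise time steps from different sequences get mixed even though the shapes look right. The sketch below uses toy sizes of my own choosing to compare the two.

```python
import torch

# Toy sizes, chosen only for illustration.
batches, seqs, seq_len, features = 2, 3, 4, 5
x = torch.randn(batches, seqs, seq_len, features)

# Merge the first two axes, then swap to (seq_len, batch, features).
good = x.reshape(-1, seq_len, features).transpose(0, 1)

# A bare view gives the same shape but does not keep sequences intact.
naive = x.view(seq_len, -1, features)

same_shape = good.shape == naive.shape

# Column 0 should hold every time step of the first sequence, x[0, 0].
good_ok = torch.equal(good[:, 0], x[0, 0])
naive_ok = torch.equal(naive[:, 0], x[0, 0])
```

Here good_ok comes out True and naive_ok False, so matching shapes alone are not enough to confirm the layout. Passing batch_first=True to nn.LSTM is another way to avoid the transpose entirely.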



This is exactly what I was looking for. Thank you so much for the detailed answer!

    l.LSTM(num_hidden, input_shape=features_shape),
    l.Dense(label_shape[0], activation='sigmoid')

It would be nice to have an arg that makes LSTM() only return a tensor. Alternatively, internal methods that need the non-tensor output could use a different method to fetch that info.

# LSTM() returns tuple of (tensor, (recurrent state))
class extract_tensor(nn.Module):
    def forward(self, x):
        # With batch_first=True the output has shape (batch, seq_len, hidden)
        tensor, _ = x
        # Keep only the last time step: shape (batch, hidden)
        return tensor[:, -1, :]

model = nn.Sequential(
    nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
    extract_tensor(),
    nn.Linear(hiddenSize, classe_n))
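On the DataLoader question from the original post, which nobody picked up: a rough sketch, assuming each sample is stored as a (seq_len, features) tensor. The default DataLoader then yields batches shaped (batch, seq_len, features), which is what nn.LSTM expects with batch_first=True. All sizes, the random data, and the ExtractLastStep name are placeholders of mine, not from the thread.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class ExtractLastStep(nn.Module):
    """Unpack the LSTM tuple and keep only the last time step."""

    def forward(self, lstm_result):
        output, _ = lstm_result   # output: (batch, seq_len, hidden)
        return output[:, -1, :]   # (batch, hidden)


# Placeholder sizes and random data (assumptions for illustration).
n_samples, seq_len, n_features, hidden, n_out = 64, 12, 45, 4, 1
xs = torch.randn(n_samples, seq_len, n_features)
ys = torch.randn(n_samples, n_out)
loader = DataLoader(TensorDataset(xs, ys), batch_size=16, shuffle=True)

model = nn.Sequential(
    nn.LSTM(n_features, hidden, batch_first=True),
    ExtractLastStep(),
    nn.Linear(hidden, n_out),
)

loss_fn = nn.MSELoss()
losses = []
for x_batch, y_batch in loader:
    y_pred = model(x_batch)       # (16, 1)
    losses.append(loss_fn(y_pred, y_batch).item())
```

With batch_first=True there is no need to unsqueeze or transpose each batch by hand. Variable-length sequences would need padding or packing on top of this, which the sketch does not cover.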