LSTM Output 2D not 3D As Expected


I’m building an LSTM model with PyTorch 1.2.0 and having a problem with the shape of the LSTM output when feeding it into the final linear layer in the forward method. The error below actually occurs downstream, when calculating the loss. I’m losing the batch dimension of the model’s predicted output, so the loss compares a 2D prediction (seq, features) to a 3D target tensor from the data loader (batch, seq, features). FYI, I set batch_first=True when initializing the LSTM.

RuntimeError: Expected object of scalar type Float but got scalar type Int for argument #2 ‘target’

However, I think the root cause is that I’m not getting a 3D return from the LSTM in the forward method. I’m seeing a 2D shape returned (seq, features); the batch dimension is missing. I’m sure it’s something I’m doing wrong upstream, but I’ve been debugging this for hours and just can’t find it.

Greatly appreciate the help!

Here’s the code:

import torch
import torch.nn as nn
import torch.nn.functional as F

#Define LSTM Class
class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, batch_size, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.num_layers = num_layers

        self.lstm = nn.LSTM(self.input_dim, self.hidden_dim, self.num_layers, batch_first=True)
        self.linear = nn.Linear(self.hidden_dim, output_dim)

    def init_hidden(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.hidden_dim),
                torch.zeros(self.num_layers, self.batch_size, self.hidden_dim))

    def forward(self, input):
        lstm_out, self.hidden = self.lstm(input)
        # tried several things to make this work (e.g. view, reshape, contiguous) before finding out the LSTM output was 2D. No luck.
        y_pred = F.relu(self.linear(lstm_out[-1]))
        return y_pred

#Instantiate LSTM model
input_size = 513
hidden_size = 100
num_layers = 2
output_size = 513

lstm_model = LSTM(input_size, hidden_size, output_size, train_batch_size, num_layers)

#Output of print(lstm_model):
# LSTM(
#   (lstm): LSTM(513, 100, num_layers=2, batch_first=True)
#   (linear): Linear(in_features=100, out_features=513, bias=True)
# )

#Train model
learning_rate = 0.001
num_epochs = 5
loss_fn = torch.nn.MSELoss(reduction='sum')  # equivalent to the deprecated size_average=False
optimiser = torch.optim.Adam(lstm_model.parameters(), lr=learning_rate)

for i in range(num_epochs):

    lstm_model.hidden = lstm_model.init_hidden()
    for X_train_dl, y_train_dl in train_data_loader:  
        y_pred = lstm_model(X_train_dl.cuda())
        loss = loss_fn(y_pred, y_train_dl.cuda())

Jupyter Notebook 6.0.1
Python 3.6.9
PyTorch 1.2.0

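You are incorrectly trying to take the output of the last time step by indexing the tensor with lstm_out[-1]. Since you set batch_first=True, lstm_out has the shape (batch, seq_len, num_directions * hidden_size), so lstm_out[-1] indexes the batch dimension: it returns the last sample in the batch, with shape (seq_len, num_directions * hidden_size). That is the 2D tensor you are seeing. You need to index the second (sequence) dimension instead of the first one.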
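A quick way to see this is to print the shapes directly. The sketch below uses the sizes from the post (input 513, hidden 100, 2 layers) with a hypothetical batch of 4 sequences of length 7 filled with random data; it is an illustration, not the original pipeline. It also shows that, for a unidirectional LSTM, the last time step of lstm_out equals the top layer's final hidden state h_n[-1], and that h_n keeps the (num_layers, batch, hidden) layout even with batch_first=True.

```python
import torch
import torch.nn as nn

# Hypothetical batch/sequence sizes; input_size and hidden_size match the post
batch_size, seq_len, input_size, hidden_size = 4, 7, 513, 100

lstm = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

lstm_out, (h_n, c_n) = lstm(x)
print(lstm_out.shape)            # torch.Size([4, 7, 100]): (batch, seq, hidden)
print(lstm_out[-1].shape)        # torch.Size([7, 100]): last *batch element*, all time steps
print(lstm_out[:, -1, :].shape)  # torch.Size([4, 100]): last time step of every batch element
print(h_n.shape)                 # torch.Size([2, 4, 100]): batch_first does not apply to h_n
```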

So the correct indexing is lstm_out[:, -1, :], which has shape (batch, num_directions * hidden_size). After passing this through linear and relu you still have a 2D tensor, (batch, output_dim), so you need to add the sequence dimension back by calling y_pred.unsqueeze(1), which inserts a new dimension at the second position. Your y_pred will then have shape (batch, 1, output_dim).
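Putting the fix together, here is a minimal sketch of the corrected forward pass plus one training step. Assumptions: the target has a single time step, shape (batch, 1, output_dim), and random tensors stand in for train_data_loader. The sketch also casts the target with .float(), because the RuntimeError in the question ("Expected object of scalar type Float but got scalar type Int for argument #2 'target'") indicates the target tensor is integer-typed while the prediction is float. The zero_grad/backward/step calls are the standard optimizer steps that the posted loop stops short of.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        lstm_out, _ = self.lstm(x)               # (batch, seq, hidden)
        last_step = lstm_out[:, -1, :]           # (batch, hidden): last time step
        y_pred = F.relu(self.linear(last_step))  # (batch, output_dim)
        return y_pred.unsqueeze(1)               # (batch, 1, output_dim)

# Hypothetical sizes and synthetic data in place of the real data loader
batch_size, seq_len = 4, 7
model = LSTM(513, 100, 513, num_layers=2)
x = torch.randn(batch_size, seq_len, 513)
# Integer-typed target, mimicking the dtype in the error message
y = torch.randint(0, 5, (batch_size, 1, 513), dtype=torch.int32)

loss_fn = nn.MSELoss(reduction='sum')
optimiser = torch.optim.Adam(model.parameters(), lr=0.001)

y_pred = model(x)
loss = loss_fn(y_pred, y.float())  # cast target to float to match the prediction
optimiser.zero_grad()
loss.backward()
optimiser.step()
```

If the inputs are moved to the GPU with .cuda(), as in the question's loop, the model has to be moved as well (lstm_model.cuda()); the sketch stays on the CPU for simplicity.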

As a rule of thumb: when you index a dimension with a single integer, as in lstm_out[:, -1, :] (only the last step of the second dimension), that dimension is removed and the result has one fewer dimension. When you index it with a range of indices, as in lstm_out[:, :-1, :] (every step of the second dimension except the last), the dimension is kept and the number of dimensions stays the same.
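The rule can be checked on any 3D tensor. In this sketch x is a random stand-in for lstm_out with a hypothetical shape of (batch=4, seq=7, features=100). Note that a length-1 slice such as -1: keeps the dimension, which gives the same result as integer indexing followed by unsqueeze(1).

```python
import torch

x = torch.randn(4, 7, 100)   # (batch, seq, features)

a = x[:, -1, :]    # integer index: seq dimension removed
b = x[:, -1:, :]   # length-1 slice: seq dimension kept
c = x[:, :-1, :]   # every step except the last: seq dimension kept

print(a.shape, b.shape, c.shape)
# torch.Size([4, 100]) torch.Size([4, 1, 100]) torch.Size([4, 6, 100])
print(torch.equal(b, a.unsqueeze(1)))  # True
```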

Thanks Sarim! That helped get me past this issue and got it working.

I am using the following code for a 2D CNN + LSTM model. Please guide me on how to use the permute function for the LSTM input. Is the output of the 2D CNN a 2D array? How do I give a 3D input to this LSTM? Apart from the batch size, what matters is the sequence over which the LSTM operation is applied. The last two dimensions of the 2D CNN output are the size of the spectrogram, so maybe the input to the LSTM is [batch_size, no. of filters, m x n], where m x n is the size of the spectrogram.
I don't know how to reshape this 4D array like that.
Please guide.
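On the first question: the output of a 2D convolution (and of MaxPool2d) is 4D, (batch, channels/filters, height, width), not 2D. The sketch below runs a random spectrogram batch with hypothetical sizes (2 examples, 1 channel, 40 frequency bins, 100 time frames) through a copy of the model's first block, Conv2d(1, 64, kernel_size=3) followed by MaxPool2d(kernel_size=2, stride=2).

```python
import torch
import torch.nn as nn

# Hypothetical spectrogram batch: (batch, channels, freq_bins, time_frames)
spec = torch.randn(2, 1, 40, 100)

layer1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, stride=1),   # 40x100 -> 38x98
    nn.MaxPool2d(kernel_size=2, stride=2))       # 38x98 -> 19x49

out = layer1(spec)
print(out.shape)  # torch.Size([2, 64, 19, 49]): (batch, filters, height, width)
```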

import torch
import torch.nn as nn

class CNN_spec(torch.nn.Module):
    def __init__(self, num_classes=7):
        super(CNN_spec, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=4, stride=4))
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=4, stride=4))
        self.layer4 = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=4, stride=4))
        self.layer5 = nn.LSTM(128, 1000)
        # mean and std of the 1000-dim LSTM output are concatenated: 2 * 1000 = 2000 features
        self.emotion_layer = nn.Linear(2000, num_classes)
        #self.layer5 = nn.Linear()
        # self.fc1 = nn.Linear(1728,1000)
        # self.fc2 = nn.Linear(1000,512)
        # self.fc3 = nn.Linear(512,64)
        #self.emotion_layer = nn.Linear(17664,num_classes)

    def forward(self, inputs):
        out = self.layer1(inputs)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = out.permute(0, 2, ...........)
        out, (final_hidden_state, final_cell_state) = self.layer5(out)

        #out = out[:, -1, :].reshape(out.shape[0], 1, out.shape[2])

        mean = torch.mean(out, 1)
        std = torch.std(out, 1)
        stat = torch.cat((mean, std), 1)
        pred_emo = self.emotion_layer(stat)
        return pred_emo
I have left the permute arguments blank in the code above.
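One common way to fill in the blank is to treat the spectrogram's time axis as the sequence and flatten everything else into the per-step feature vector. This sketch assumes the width (last) dimension of the CNN output is time, and the tensor sizes are hypothetical. Two things differ from the posted code on purpose: the LSTM uses batch_first=True, because torch.mean(out, 1) and torch.std(out, 1) are meant to pool over time, which only holds if dim 1 is the sequence dimension; and input_size is filters * height, which equals the posted 128 only when the CNN output height is 1.

```python
import torch
import torch.nn as nn

# Hypothetical CNN output: (batch, filters, height, width); width assumed to be time
batch, filters, height, width = 2, 128, 1, 12
cnn_out = torch.randn(batch, filters, height, width)

# Move time to dim 1, then merge filters and height into one feature vector per step
seq = cnn_out.permute(0, 3, 1, 2)                  # (batch, width, filters, height)
seq = seq.reshape(batch, width, filters * height)  # (batch, seq_len, features)

lstm = nn.LSTM(input_size=filters * height, hidden_size=1000, batch_first=True)
emotion_layer = nn.Linear(2000, 7)

out, (h_n, c_n) = lstm(seq)        # (batch, seq_len, 1000)
mean = torch.mean(out, 1)          # pool over time -> (batch, 1000)
std = torch.std(out, 1)            # (batch, 1000)
stat = torch.cat((mean, std), 1)   # (batch, 2000)
pred_emo = emotion_layer(stat)     # (batch, 7)
```

reshape is used rather than view because permute returns a non-contiguous tensor. If you keep the default batch_first=False instead, permute with (3, 0, 1, 2) so time comes first, reshape to (width, batch, filters * height), and pool over dim 0.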