In an LSTM, which layer should I use as the output?

DonghunP · May 14, 2021, 6:09am

Hi.

I recently referred to two good references, but I am confused because their output layers are different.

The first reference returns a prediction computed from the ‘output’ term (lstm_out): time-series-prediction-using-lstm

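    import torch
    import torch.nn as nn

    class LSTM(nn.Module):
        def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
            super().__init__()
            self.hidden_layer_size = hidden_layer_size
            self.lstm = nn.LSTM(input_size, hidden_layer_size)
            self.linear = nn.Linear(hidden_layer_size, output_size)
            self.hidden_cell = (torch.zeros(1, 1, self.hidden_layer_size), torch.zeros(1, 1, self.hidden_layer_size))

        def forward(self, input_seq):
            # lstm_out: [seq_len (12), batch_size (1), 100]
            lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
            # predictions: [seq_len (12), 1]
            predictions = self.linear(lstm_out.view(len(input_seq), -1))
            # predictions[-1]: (1,), the prediction for the last time step
            return predictions[-1]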

But the second reference returns a prediction computed from the ‘hidden_cell’ term (h_out): Time_Series_Prediction_with_LSTM

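    def forward(self, x):
        # x is (batch, seq_len, input_size) here, since x.size(0) is used as the batch size
        # (i.e. the LSTM is built with batch_first=True); Variable() is no longer needed
        h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)

        # Propagate input through LSTM
        # h_out: (num_layers, batch, hidden_size), the final hidden state of each layer
        ula, (h_out, _) = self.lstm(x, (h_0, c_0))
        # Flattens h_out to (num_layers * batch, hidden_size)
        h_out = h_out.view(-1, self.hidden_size)
        out = self.fc(h_out)
        return out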

What is the difference between these two?

Thanks in advance.

torch.nn.LSTM (shapes with the default batch_first=False)

Inputs:
input: (seq_len, batch, input_size)
h_0: (num_layers * num_directions, batch, hidden_size)
c_0: (num_layers * num_directions, batch, hidden_size)

Outputs:
output: (seq_len, batch, num_directions * hidden_size)
h_n: (num_layers * num_directions, batch, hidden_size)
c_n: (num_layers * num_directions, batch, hidden_size)
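To check these shapes empirically, here is a small sketch that builds an nn.LSTM and prints the shapes it returns with the default batch_first=False. The sizes (including a 2-layer bidirectional LSTM) are arbitrary assumptions for illustration, not taken from either reference.

```python
import torch
import torch.nn as nn

# Arbitrary example sizes (assumptions for illustration only)
seq_len, batch, input_size, hidden_size, num_layers = 12, 5, 3, 8, 2

lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, bidirectional=True)
num_directions = 2  # because bidirectional=True

x = torch.randn(seq_len, batch, input_size)  # default layout: (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)  # h_0 and c_0 default to zeros when omitted

print(tuple(output.shape))  # (12, 5, 16) = (seq_len, batch, num_directions * hidden_size)
print(tuple(h_n.shape))     # (4, 5, 8)   = (num_layers * num_directions, batch, hidden_size)
print(tuple(c_n.shape))     # (4, 5, 8)   same as h_n
```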

vdw · May 15, 2021, 7:24am

As you can see from the documentation, lstm_out and ula in the two forward methods contain the hidden states of the last layer for all time steps (i.e., for all items in your sequence). Note that “last” here refers to the last layer (with respect to num_layers), not to the last time step.

In contrast, h_out (or self.hidden_cell[0]) contains the hidden states at the last time step. If num_layers > 1, it includes that final hidden state for every layer, not just the last one.
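A quick way to see how the two relate: for a unidirectional LSTM, the last time step of output and the last layer of h_n are the same vectors. This suggests that, ignoring how each model initializes its hidden state, the two references feed essentially the same features to their linear layer when num_layers == 1. The sizes below are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Example sizes (assumptions for illustration only); unidirectional, 2 layers
seq_len, batch, input_size, hidden_size, num_layers = 12, 5, 3, 8, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers)

x = torch.randn(seq_len, batch, input_size)
output, (h_n, _) = lstm(x)

# output: LAST layer's hidden state at EVERY time step -> (seq_len, batch, hidden_size)
# h_n:    EVERY layer's hidden state at the LAST time step -> (num_layers, batch, hidden_size)
last_step_of_output = output[-1]  # last time step, last layer
last_layer_of_h_n = h_n[-1]       # last layer, last time step

print(torch.allclose(last_step_of_output, last_layer_of_h_n))  # True
```

h_n[0] (the first layer's final state) has no counterpart in output, which only exposes the top layer.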

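Neither solution is fundamentally right or wrong, although I would argue that the second one, using h_out, is more common for basic time series prediction. Strictly speaking, I don’t like either implementation, since both use view() in a way that can quickly cause issues: lstm_out.view(len(input_seq), -1) only works because the batch size is 1, and h_out.view(-1, self.hidden_size) folds the layers into the batch dimension as soon as num_layers > 1.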
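To make the view() concern concrete, here is a sketch with a batch size larger than 1 and two layers (sizes are assumptions for illustration). It reproduces both reshaping patterns and compares them with plain indexing.

```python
import torch
import torch.nn as nn

# Example sizes (assumptions for illustration only)
seq_len, batch, input_size, hidden_size, num_layers = 12, 5, 3, 8, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers)
fc = nn.Linear(hidden_size, 1)

x = torch.randn(seq_len, batch, input_size)
output, (h_n, _) = lstm(x)

# Second reference's pattern: layers get folded into the batch dimension
flat = h_n.view(-1, hidden_size)
print(tuple(fc(flat).shape))  # (10, 1): num_layers * batch rows instead of batch

# First reference's pattern: batch and hidden dimensions get merged
merged = output.view(seq_len, -1)
print(tuple(merged.shape))  # (12, 40): no longer matches fc's input size of 8

# Indexing instead of view() keeps every dimension meaningful
print(tuple(fc(h_n[-1]).shape))     # (5, 1): one prediction per sequence
print(tuple(fc(output[-1]).shape))  # (5, 1): same via the last time step
```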

Here is what I would do:

    def forward(self, x):
        # x: (batch, seq_len, input_size), assuming the LSTM was built with batch_first=True
        batch_size = x.size(0)
        num_directions = 2 if self.lstm.bidirectional else 1
        # h_0/c_0 are (num_layers * num_directions, batch, hidden_size) even with batch_first=True
        h_0 = torch.zeros(self.num_layers * num_directions, batch_size, self.hidden_size, device=x.device)
        c_0 = torch.zeros(self.num_layers * num_directions, batch_size, self.hidden_size, device=x.device)

        # Propagate input through LSTM
        ula, (h_out, _) = self.lstm(x, (h_0, c_0))
        # Split num_layers and num_directions (useful if your LSTM is bidirectional)
        # This view is taken directly from the docs
        h_out = h_out.view(self.num_layers, num_directions, batch_size, self.hidden_size)
        # Get the last layer with respect to num_layers
        h_out = h_out[-1]
        # Handle the num_directions dimension (I assume here that bidirectional=False)
        h_out = h_out.squeeze(0)
        # Now the shape of h_out is (batch, hidden_size)
        out = self.fc(h_out)
        return out
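For reference, here is a self-contained sketch of a complete module built around this kind of forward pass. The class name LSTMRegressor, its constructor arguments, and the example sizes are hypothetical. Unlike the snippet above, it also handles bidirectional=True by concatenating both directions of the last layer instead of assuming a single direction; that is one reasonable choice, not the only one.

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """Hypothetical model; uses batch_first=True like the second reference."""

    def __init__(self, input_size, hidden_size, num_layers=1, bidirectional=False, output_size=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_directions = 2 if bidirectional else 1
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=bidirectional)
        # The head sees the last layer's final state from every direction
        self.fc = nn.Linear(self.num_directions * hidden_size, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        # h_0/c_0 omitted: nn.LSTM defaults them to zeros
        _, (h_n, _) = self.lstm(x)
        # h_n is (num_layers * num_directions, batch, hidden_size) even with batch_first=True
        h_n = h_n.view(self.num_layers, self.num_directions, batch_size, self.hidden_size)
        last_layer = h_n[-1]  # (num_directions, batch, hidden_size)
        # Put batch first, then concatenate directions: (batch, num_directions * hidden_size)
        features = last_layer.transpose(0, 1).reshape(batch_size, -1)
        return self.fc(features)

model = LSTMRegressor(input_size=3, hidden_size=8, num_layers=2, bidirectional=True)
x = torch.randn(5, 12, 3)  # (batch, seq_len, input_size)
print(tuple(model(x).shape))  # (5, 1)
```

With bidirectional=False the same code reduces to the unidirectional case, since the direction dimension then has size 1.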

DonghunP · May 17, 2021, 9:22am

Thank you, Chris. I totally understand.
You have helped me many times.

Have a good day