In an LSTM, which layer should I use as the output?

Hi.

I recently came across two good references, but I'm confused because the output layer in each one is different.

The first reference returns the ‘output’ term: time-series-prediction-using-lstm

        self.hidden_cell = (torch.zeros(1,1,self.hidden_layer_size), torch.zeros(1,1,self.hidden_layer_size))
    def forward(self, input_seq):
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq) ,1, -1), self.hidden_cell)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]
        # Sizes
        # lstm_out        : [seq_len(12), batch_size(1), 100]
        # predictions     : [seq_len(12), 1]
        # predictions[-1] : (1,)

The second reference, however, returns the ‘hidden_cell’ term: Time_Series_Prediction_with_LSTM

    def forward(self, x):
        h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))
        c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))

        # Propagate input through LSTM
        ula, (h_out, _) = self.lstm(x, (h_0, c_0))
        h_out = h_out.view(-1, self.hidden_size) 
        out = self.fc(h_out) 
        return out

What is the difference between these two?

Thanks in advance.


torch.nn.LSTM

Inputs:
input (seq_len, batch, input_size)
h_0 (num_layers * num_directions, batch, hidden_size)
c_0 (num_layers * num_directions, batch, hidden_size)

Outputs:
output (seq_len, batch, num_directions * hidden_size)
h_n (num_layers * num_directions, batch, hidden_size)
c_n (num_layers * num_directions, batch, hidden_size)
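As a quick sanity check of these shapes, here is a minimal, self-contained sketch with arbitrary toy sizes (not taken from either reference):

    import torch
    import torch.nn as nn

    # Arbitrary toy sizes, just to check the documented shapes
    seq_len, batch, input_size, hidden_size, num_layers = 12, 1, 1, 100, 1

    lstm = nn.LSTM(input_size, hidden_size, num_layers)  # batch_first=False (default)
    x = torch.randn(seq_len, batch, input_size)

    output, (h_n, c_n) = lstm(x)  # initial states default to zeros
    print(output.shape)  # torch.Size([12, 1, 100]) -> (seq_len, batch, num_directions * hidden_size)
    print(h_n.shape)     # torch.Size([1, 1, 100])  -> (num_layers * num_directions, batch, hidden_size)
    print(c_n.shape)     # torch.Size([1, 1, 100])  -> (num_layers * num_directions, batch, hidden_size)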

As you can see from the documentation, lstm_out and ula in the two forward methods contain the last hidden states for all time steps (i.e., for every item in your sequence). Note that “last” here refers to the hidden state with respect to the number of layers, not with respect to the number of time steps.

In contrast, h_out (or self.hidden_cell[0]) refers to the last hidden states with respect to the number of time steps. It includes the last hidden states for all layers in case num_layers > 1.
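To make this concrete: for a unidirectional LSTM, the last time step of output is exactly the last layer's entry in h_n. A minimal sketch with made-up sizes:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=1, hidden_size=100, num_layers=2)
    x = torch.randn(12, 1, 1)  # (seq_len, batch, input_size)

    output, (h_n, c_n) = lstm(x)
    # output[-1]: last time step of the top layer's hidden states
    # h_n[-1]   : top layer's hidden state at the last time step
    print(torch.allclose(output[-1], h_n[-1]))  # True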

Neither solution is fundamentally right or wrong, but I would argue that the second one, using h_out, is more common for basic time series prediction. Strictly speaking, I don't like either implementation, since both use view() in a way that can quickly cause issues.
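For example, if num_layers > 1, flattening h_out with view(-1, hidden_size) silently stacks the layer dimension onto the batch dimension instead of selecting the last layer. A small illustration with made-up sizes:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=1, hidden_size=100, num_layers=2, batch_first=True)
    x = torch.randn(3, 5, 1)   # (batch=3, seq_len=5, input_size=1)

    _, (h_n, _) = lstm(x)      # h_n: (num_layers=2, batch=3, hidden_size=100)
    flat = h_n.view(-1, 100)   # layers and batch get mixed together
    print(flat.shape)          # torch.Size([6, 100]) -- probably not what you want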

Here is what I would do:

    def forward(self, x):
        # Variable is deprecated; plain tensors work with autograd since PyTorch 0.4
        h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)

        # Propagate input through LSTM
        ula, (h_out, _) = self.lstm(x, (h_0, c_0))
        # Split num_layers and num_directions (useful if your LSTM is bidirectional);
        # this view is directly taken from the docs
        h_out = h_out.view(self.num_layers, self.num_directions, x.size(0), self.hidden_size)
        # Get the last layer with respect to num_layers
        h_out = h_out[-1]
        # Handle the num_directions dimension (assuming bidirectional=False)
        h_out = h_out.squeeze(0)
        # Now the shape of h_out is (batch, hidden_size)
        out = self.fc(h_out)
        return out
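Wrapped into a self-contained module, a quick shape check could look like this (the class name and sizes are hypothetical; batch_first=True as in the second reference):

    import torch
    import torch.nn as nn

    class LSTMRegressor(nn.Module):
        def __init__(self, input_size=1, hidden_size=100, num_layers=2, output_size=1):
            super().__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.num_directions = 1  # bidirectional=False
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, x):
            batch_size = x.size(0)
            h_0 = torch.zeros(self.num_layers * self.num_directions, batch_size, self.hidden_size)
            c_0 = torch.zeros(self.num_layers * self.num_directions, batch_size, self.hidden_size)
            ula, (h_out, _) = self.lstm(x, (h_0, c_0))
            h_out = h_out.view(self.num_layers, self.num_directions, batch_size, self.hidden_size)
            h_out = h_out[-1].squeeze(0)  # last layer, drop the directions dim
            return self.fc(h_out)         # (batch, output_size)

    model = LSTMRegressor()
    x = torch.randn(8, 12, 1)             # (batch=8, seq_len=12, input_size=1)
    print(model(x).shape)                 # torch.Size([8, 1])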

Thank you, Chris. I totally understand now.
I have gotten help from you many times.

Have a good day 🙂
