Problem with multiple LSTM layers: output error

I am a beginner with PyTorch and am writing code for time series forecasting with an LSTM. The network consists of two stacked LSTM layers. The first layer has 10 LSTM units, and its hidden states are passed to a second LSTM layer with a single LSTM unit. The following is the architecture diagram of the neural network.

The following is my code:

import torch
import torch.nn as nn
from torch.autograd import Variable

class Sequence(nn.Module):
    def __init__(self, nb_features=1, hidden_size_1=100, hidden_size_2=100, nb_layers_1=5, nb_layers_2=1, dropout=0.4):
        super(Sequence, self).__init__()
        self.nb_features = nb_features
        self.hidden_size_1 = hidden_size_1
        self.hidden_size_2 = hidden_size_2
        self.nb_layers_1 = nb_layers_1
        self.nb_layers_2 = nb_layers_2
        # Two stacked LSTMs: lstm_1 feeds its output sequence into lstm_2.
        self.lstm_1 = nn.LSTM(self.nb_features, self.hidden_size_1, self.nb_layers_1, dropout=dropout)
        # With nb_layers_2 = 1, dropout has no effect here and PyTorch emits a warning.
        self.lstm_2 = nn.LSTM(self.hidden_size_1, self.hidden_size_2, self.nb_layers_2, dropout=dropout)
        self.lin = nn.Linear(self.hidden_size_2, 1)

    def forward(self, input):
        # input has shape (seq_len, batch, nb_features); all initial states are zeros.
        h0 = Variable(torch.zeros(self.nb_layers_1, input.size()[1], self.hidden_size_1))
        h1 = Variable(torch.zeros(self.nb_layers_2, input.size()[1], self.hidden_size_2))
        c0 = Variable(torch.zeros(self.nb_layers_1, input.size()[1], self.hidden_size_1))
        c1 = Variable(torch.zeros(self.nb_layers_2, input.size()[1], self.hidden_size_2))

        output_0, hn_0 = self.lstm_1(input, (h0, c0))
        output, hn = self.lstm_2(output_0, (h1, c1))
        # Predict from the last time step of the second LSTM.
        out = torch.tanh(self.lin(output[-1]))
        return out

The code runs. However, the output is a straight line, even after tuning the hyperparameters (learning rate, dropout, activation function) and increasing the number of epochs (e.g. 3000). The result is shown below.

Could you please give me some suggestions for solving this problem? Many thanks.


I think the problem is that the forward function re-initializes h0, h1, c0, and c1 on every call, so you always use zero vectors as the hidden state. Try removing that initialization from the forward function.

Also, I don’t think that initialization is necessary. As mentioned in the documentation:

If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
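To illustrate the documentation quote, here is a minimal sketch that compares an nn.LSTM call with explicit zero states against a call with no state at all. The sizes are hypothetical and chosen only for illustration, and dropout is left at its default of 0 so both calls are deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
seq_len, batch_size, nb_features, hidden_size, nb_layers = 7, 4, 1, 16, 2

lstm = nn.LSTM(nb_features, hidden_size, nb_layers)
x = torch.randn(seq_len, batch_size, nb_features)  # (seq_len, batch, features)

# Explicit zero initial states, as in the forward function from the question.
h0 = torch.zeros(nb_layers, batch_size, hidden_size)
c0 = torch.zeros(nb_layers, batch_size, hidden_size)
out_explicit, _ = lstm(x, (h0, c0))

# No initial state passed: nn.LSTM falls back to zeros.
out_default, _ = lstm(x)

# Both calls should produce the same output sequence.
same = torch.allclose(out_explicit, out_default)
print(same)
```

If the two outputs match, the four torch.zeros lines in forward can simply be dropped without changing what the model computes.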


Many thanks for your reply.

I tried your suggestion and changed the code from:
h0 = Variable(torch.zeros(self.nb_layers_1, input.size()[1], self.hidden_size_1))

to:

h0 = torch.ones(self.nb_layers_1, input.size()[1], self.hidden_size_1)

for all of h0, h1, c0 and c1. However, the result is still a straight line.

Some additional information: if I use just one LSTM layer with 5 LSTM units and the same code (that is, initializing both the hidden state and the cell state as zeros), it predicts successfully. Once I add the second, single-unit LSTM layer, it no longer does.

Maybe for the second LSTM you need to pass in the hidden state of your first LSTM instead of initializing a new hidden state.
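A rough sketch of what that could look like, under some assumptions: the two LSTMs use the same hidden size (as in the question's defaults, though a smaller hypothetical value is used here), and because the first LSTM has 5 layers while the second has 1, only the top layer's final state is handed over. Whether this fixes the flat output is not confirmed in this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes; the hidden sizes must match for the state to be reusable.
seq_len, batch_size, nb_features, hidden_size = 7, 4, 1, 16

lstm_1 = nn.LSTM(nb_features, hidden_size, num_layers=5)
lstm_2 = nn.LSTM(hidden_size, hidden_size, num_layers=1)
lin = nn.Linear(hidden_size, 1)

x = torch.randn(seq_len, batch_size, nb_features)  # (seq_len, batch, features)

# h_n and c_n each have shape (5, batch, hidden): one final state per layer.
output_0, (h_n, c_n) = lstm_1(x)

# lstm_2 has a single layer, so keep only the top layer's final state,
# which gives the (1, batch, hidden) shape that lstm_2 expects.
h_top = h_n[-1:].contiguous()
c_top = c_n[-1:].contiguous()

output, _ = lstm_2(output_0, (h_top, c_top))
out = lin(output[-1])  # prediction from the last time step: (batch, 1)
```

If hidden_size_1 and hidden_size_2 differ, the state cannot be passed directly and would need some projection in between, which is beyond this sketch.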

Hi, I have a problem similar to yours. Have you found a way to solve it?