When the input becomes 0 (that's what padding with value 0 does), the sum of the previous hidden state and the current input is smaller than with 'normal' inputs. Therefore, the first sigmoid's output will be lower, and the cell state will forget more. However, that might not even be the biggest issue. Looking at the input gate, we can see that the zero-centred tanh multiplied with the sigmoid's output adds less to the cell state than normal. The same goes for the output gate, since its sigmoid outputs lower values than 'normal'. Therefore, the hidden state also gets closer and closer to zero, and the cell state with it. Furthermore, all these operations involve weights, which can reinforce this convergence.
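To see this effect concretely, here is a minimal sketch (the toy sizes and randomly initialized weights are my assumption, not any trained model): an LSTMCell is fed a run of zero "padding" inputs, and the hidden and cell state norms are printed at each step.

import torch
import torch.nn as nn

# Feed an LSTMCell a run of zero "padding" inputs and watch the hidden and
# cell state norms; with the input contribution gone, the gates only see the
# recurrent term, and the states typically shrink toward a small fixed point.
torch.manual_seed(0)
cell = nn.LSTMCell(input_size=4, hidden_size=8)

h = torch.randn(1, 8)    # non-zero hidden state before the padding starts
c = torch.randn(1, 8)    # non-zero cell state before the padding starts
pad = torch.zeros(1, 4)  # a zero-padded input step

for step in range(10):
    h, c = cell(pad, (h, c))
    print(step, h.norm().item(), c.norm().item())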
Since you define your LSTM with the default parameter batch_first=False, the output has the shape (seq_len, batch, hidden_size). That means that out[:, -1, :] gives you the hidden states of all the time steps for the last item in your batch, i.e., a shape of (seq_len, hidden_size).
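As a quick illustration (the sizes here are made up for the example, only input_size and hidden_size match your model):

import torch
import torch.nn as nn

# batch_first=False (the default); illustrative sizes: seq_len=5, batch=3
rnn = nn.LSTM(input_size=14, hidden_size=40)
x = torch.randn(5, 3, 14)   # (seq_len, batch, input_size)
out, _ = rnn(x)

print(out.shape)            # torch.Size([5, 3, 40]) -> (seq_len, batch, hidden_size)
print(out[:, -1, :].shape)  # torch.Size([5, 40])    -> all time steps of the LAST batch item
print(out[-1].shape)        # torch.Size([3, 40])    -> last time step of ALL batch items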
What you want is the last hidden state ('last' w.r.t. the number of time steps) for all items in your batch. You simply change that line to out[-1]. Just to be sure, and since I don't know what your data looks like, can you change your forward() method as follows and post the output of the print statements?
def forward(self, x):
    print(x.shape)
    out, _ = self.rnn(x)
    print(out.shape)
    out = out[:, -1, :]
    print(out.shape)
    out = self.out(out)
    return out
Here is my modified forward() method:

def forward(self, x):
    print('x shape', x.shape)
    out, _ = self.rnn(x)
    print('before reshape', out.shape)
    out = out[:, -1, :]
    print('after reshape', out.shape)
    out = self.out(out)
    print('output shape', out.shape)
    return out
and the result is:
x shape torch.Size([2227, 1, 14])
before reshape torch.Size([2227, 1, 40])
after reshape torch.Size([2227, 40])
output shape torch.Size([2227, 1])
@hzzzm thanks! I assume that 2227 is the sequence length and you only have 1 sequence in your batch, i.e., batch_size = 1. After out = out[:, -1, :], the shape is (2227, 40), which would suddenly mean a batch size of 2227.
out = out[-1] will yield a shape of (1, 40), representing (batch_size, hidden_size), which I strongly assume is what you want.
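A quick sanity check with the shapes from your printout (a dummy tensor, just to show the indexing):

import torch

out = torch.randn(2227, 1, 40)  # (seq_len=2227, batch=1, hidden_size=40)
print(out[:, -1, :].shape)      # torch.Size([2227, 40]) -- next layer sees batch_size 2227
print(out[-1].shape)            # torch.Size([1, 40])    -- (batch_size, hidden_size)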
I have removed the line out = out[:, -1, :] from my forward() to keep the batch size, but the result is still the same.
My data: a single sequence of length 2227, where each time step has 14 properties.
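In case it helps, a hedged sketch of how such a sequence would get its batch dimension before going into the LSTM (placeholder data, assuming batch_first=False):

import torch

data = torch.randn(2227, 14)  # placeholder for the real data: (seq_len, num_features)
x = data.unsqueeze(1)         # (seq_len=2227, batch=1, input_size=14)
print(x.shape)                # torch.Size([2227, 1, 14]) -- matches the printed x shape above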