Getting NaN as prediction from LSTM

Hello! I built a simple bidirectional LSTM:

import torch
import torch.nn as nn
import torch.nn.functional as F
from fastai.core import V  # fastai 0.7 helper that wraps a tensor in a Variable

class LC_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()
        self.nl = nl
        # n_hidden, n_classes, and bs are globals defined elsewhere
        self.rnn = nn.LSTM(1, n_hidden, nl, bidirectional=True)  # dropout=0.3 disabled for now
        self.l_out = nn.Linear(n_hidden * 2, n_classes)  # *2 for the two directions
        self.init_hidden(bs)

    def forward(self, input):
        # reshape to (seq_len, batch, input_size), as nn.LSTM expects
        outp, h = self.rnn(input.view(len(input), bs, -1), self.h)
        return F.log_softmax(self.l_out(outp), dim=2)

    def init_hidden(self, bs):
        # (h_0, c_0), each of shape (num_layers * num_directions, batch, hidden_size)
        self.h = (V(torch.zeros(self.nl * 2, bs, n_hidden)),
                  V(torch.zeros(self.nl * 2, bs, n_hidden)))

And I want to pass some time series data to it. I wanted to run it on one time series before training, just to make sure it works, but I am getting only NaN as output. The time series has 3426 values and bs=1. However, if I pass only a smaller part of the series, say the first 500 values, the code seems to work, i.e. the outputs of the LSTM are actual numbers. Does it have to do with the fact that 3426 values passed at once is too much? Shouldn't I get an out-of-memory error instead of just NaNs? Thank you!
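Edit: for completeness, this is roughly what my test looks like; n_hidden, n_classes, and nl below are placeholder values, not my real ones:

n_hidden, n_classes, bs = 64, 10, 1   # placeholder sizes; bs=1 as described

model = LC_LSTM(nl=2)                 # nl=2 is just an example value
series = V(torch.randn(3426))         # stand-in for the real time series

out_full = model(series)              # shape: (3426, 1, n_classes)
out_short = model(series[:500])       # only the first 500 values

print(out_full.data.min(), out_full.data.max())    # nan nan in my case
print(out_short.data.min(), out_short.data.max())  # finite numbers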

I don't have much experience with LSTMs, but the problem could be either that the learning rate is too high, or that you initialized your network with a constant (zeros), so your network won't backpropagate through it. Initialize it with random numbers.
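Something like this is what I mean, just as a sketch, reusing your init_hidden and the same shapes:

def init_hidden(self, bs):
    # same shapes as before, but small random values instead of zeros
    self.h = (V(torch.randn(self.nl * 2, bs, n_hidden) * 0.01),
              V(torch.randn(self.nl * 2, bs, n_hidden) * 0.01))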

Thank you for your reply. I am doing this test before starting the training, so the learning rate doesn't come into play at all here. I do indeed initialize the hidden state as zeros, but why would it work with 500 inputs and not with 3426? And again, I am not doing backpropagation here; I am just testing the untrained model.
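In the meantime, here is how I am trying to narrow down where the NaNs first appear, with a binary search over prefix lengths (a sketch; it assumes that once NaNs show up, they persist for all longer prefixes):

def first_nan_length(model, series, lo=1, hi=None):
    # Returns the shortest prefix length whose output contains a NaN,
    # or None if even the full series gives finite outputs.
    hi = hi if hi is not None else len(series)
    def has_nan(n):
        out = model(series[:n]).data
        return (out != out).any()   # NaN is the only value not equal to itself
    if not has_nan(hi):
        return None
    while lo < hi:                  # assumes has_nan is monotone in n
        mid = (lo + hi) // 2
        if has_nan(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(first_nan_length(model, series))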

Interesting. As I mentioned, I am not exactly sure, but I will try my best if you could send code that's enough to reproduce the result.