LSTM Won't Learn

I am trying to create an LSTM model to predict some time-series data.
My data is configured as follows:

Input instances:
[0.99, 0.98, 1.01, 1.03, 1.001, 0.98, 1.001]

Target Value:
[0.995]

So each instance is a vector of seven values used to predict the target.
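
In case it helps, here is a minimal sketch of how I am thinking about the tensor shapes — assuming each seven-value instance is a single time step with seven features, which is my interpretation, not something I have verified (the variable names are illustrative):

    import torch

    # Assumption: each seven-value instance is one time step with seven
    # features, so nn.LSTM's default layout (seq_len, batch, input_size)
    # becomes (1, 1, 7) for a single instance.
    instance = torch.tensor([0.99, 0.98, 1.01, 1.03, 1.001, 0.98, 1.001],
                            dtype=torch.double)
    target = torch.tensor([0.995], dtype=torch.double)

    lstm_input = instance.view(1, 1, 7)
    print(lstm_input.shape)  # torch.Size([1, 1, 7])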

I am trying to keep my model simple to start, so here are the components:

    import torch
    import torch.nn as nn

    class MyModel(nn.Module):
        def __init__(self):
            super(MyModel, self).__init__()
            self.lstm1 = nn.LSTM(input_size=7, hidden_size=51, num_layers=1)
            self.linear = nn.Linear(in_features=51, out_features=1)

        def forward(self, input, future=0):
            outputs = []
            # randomly initialised hidden and cell states
            h_t = torch.rand(input.size(1)*1, 1, 51, dtype=torch.double)
            c_t = torch.rand(input.size(1)*1, 1, 51, dtype=torch.double)

            # step through the input one chunk at a time
            for i, input_t in enumerate(input.chunk(input.size(1), dim=1)):
                h_t, c_t = self.lstm1(input_t, (h_t, c_t))
                output = self.linear(h_t)
                output = output.add(1e-8)  # add a small epsilon to the output
                outputs += [output]

            # feed predictions back in to forecast `future` extra steps
            for i in range(future):
                h_t, c_t = self.lstm1(output, (h_t, c_t))
                output = self.linear(h_t)
                outputs += [output]

            outputs = torch.stack(outputs, 1).squeeze(2)
            return outputs
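
For comparison, here is my reading of the stock nn.LSTM interface from the docs — a minimal shape-check sketch (the sizes are illustrative), since I am not sure the hidden-state shapes above are right:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=7, hidden_size=51, num_layers=1).double()

    x  = torch.rand(1, 4, 7, dtype=torch.double)    # (seq_len, batch, input_size)
    h0 = torch.zeros(1, 4, 51, dtype=torch.double)  # (num_layers, batch, hidden_size)
    c0 = torch.zeros(1, 4, 51, dtype=torch.double)

    # nn.LSTM returns the per-step outputs AND the final (h_n, c_n) tuple
    out, (h_n, c_n) = lstm(x, (h0, c0))
    print(out.shape)  # torch.Size([1, 4, 51])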

My model is called from the following training loop:

    ...
    traindataloader = DataLoader(train_data,
                                 batch_size=500,
                                 shuffle=True,
                                 num_workers=4)

    model = MyModel().double()
    criterion = nn.MSELoss(reduction='sum') # I have also tried 'none' and 'elementwise_mean'
    optimizer = optim.Adam(model.parameters(), lr=1e-06, weight_decay=0.1)

    for i in range(10):
        optimizer.zero_grad()
        print(f"{Fore.BLUE}STEP: ", i, f"{Style.RESET_ALL}")

        inputs = None
        targets = None
        for idx, data in enumerate(traindataloader):
            def closure():
                y_pred = model(inputs)
                loss = criterion(y_pred, targets)
                print('Loss:', loss.item())
                loss.backward()
                return loss
            inputs, targets = data
            optimizer.step(closure)

Using this code, I can get through one mini-batch of inputs before the loss goes to inf and then to nan.
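
For comparison, the conventional loop pattern I have seen in examples (zero the gradients every mini-batch, and call step() without a closure, since Adam does not require one) would look roughly like this — a sketch, not code I have verified against my problem:

    for epoch in range(10):
        for idx, data in enumerate(traindataloader):
            inputs, targets = data
            optimizer.zero_grad()      # reset gradients every mini-batch
            y_pred = model(inputs)
            loss = criterion(y_pred, targets)
            loss.backward()
            optimizer.step()           # Adam does not need a closure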

My questions are:

  1. Is my data structured correctly for an LSTM?
  2. Is my model correct / appropriate?
  3. Is the DataLoader used correctly? (It seems to be)

From my research, it appears I have an exploding-gradient problem; I have tried several remedies, but I cannot find the right combination.
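
For reference, the standard clipping idiom is the kind of thing I have been experimenting with (a sketch; max_norm=1.0 is an arbitrary choice):

    loss.backward()
    # clip the global gradient norm before stepping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()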

I am running this on PyTorch 0.4.1 / Python 3.6.

Any help would be greatly appreciated.

So, as is often the case, the simple answer is "garbage in, garbage out".

I found I had a small population of -inf values in my target variable.
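
For anyone who hits the same thing, a quick finiteness check on the raw targets catches it (a sketch; `targets` here is a placeholder for your raw NumPy target array):

    import numpy as np

    # `targets` is a placeholder for the raw target array
    bad = ~np.isfinite(targets)
    print("non-finite targets:", bad.sum())

    # keep only rows with finite targets (filter the inputs the same way)
    targets = targets[~bad]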

Thanks to anyone who took the time to review this.

Still open to any input regarding my three questions.