LSTM predicts the same constant value

Planet_Boost · December 21, 2022, 3:03pm

I want to predict one variable using 7 features with time steps of 4:

# Shape X_train: torch.Size([24433, 4, 7])
# Shape Y_train: torch.Size([24433, 4, 1])

# Shape X_test: torch.Size([6109, 4, 7])
# Shape Y_test: torch.Size([6109, 4, 1])

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset

train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test) 

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

My (initial) LSTM model:

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x

model = LSTMModel(input_size=7, hidden_size=256, output_size=1)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

Apply model:

# Loop over the training set
for X, Y in train_loader:

    optimizer.zero_grad()
    
    Y_pred = model(X)

    loss = loss_fn(Y_pred, Y)
    
    loss.backward()
    
    optimizer.step()

model.eval()

# Loop over the test set
for X, Y in test_loader:

    Y_pred = model(X)
    
    loss = loss_fn(Y_pred, Y)

An example of Y (true data):

tensor([[[59.],
         [59.],
         [59.],
         [59.]],

        [[70.],
         [70.],
         [70.],
         [70.]],

        [[ 100.],
         [ 0.],
         [ 0.],
         [ 0.]],

# etc.

However, my Y_pred is somewhat like this:

tensor([[[15.8224],
         [15.8224],
         [15.8224],
         [15.8224]],

        [[16.1654],
         [16.1654],
         [16.1654],
         [16.1654]],

        [[16.2127],
         [16.2127],
         [16.2127],
         [16.2127]],

# etc.

I have tried numerous different things:

Changing the model architecture (different batch size, different number of layers)
Adding dropout and decay parameters
Training for multiple epochs and varying the number of epochs when looping over the training and test data
Different optimizers (Adam, SGD) with different learning rates
Log transforming my input data

In my (unanswered) previous post I give an example of what my input data looks like.

I am fairly new to PyTorch and LSTMs, so I might be doing something wrong, but whatever I change, I keep getting a (near) constant value from the predictions. What am I doing wrong, and what should I be doing instead?

vdw · December 21, 2022, 3:09pm

If this is your shape of the input, you probably should define your LSTM with batch_first=True.
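
As a minimal sketch of why this matters: by default, nn.LSTM interprets its input as (seq_len, batch, features), so a batch shaped (32, 4, 7) would be read as 32 time steps of 4 parallel sequences. The tensor shapes below mirror the question; the random data and hidden_size=8 are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A dummy batch shaped like the question's data: (batch=32, time steps=4, features=7).
x = torch.randn(32, 4, 7)

# Default layout: input is read as (seq_len, batch, features),
# so this treats x as 32 time steps of 4 parallel sequences.
lstm_default = nn.LSTM(input_size=7, hidden_size=8)
out_default, (h_default, _) = lstm_default(x)

# batch_first=True: input is read as (batch, seq_len, features).
lstm_bf = nn.LSTM(input_size=7, hidden_size=8, batch_first=True)
out_bf, (h_bf, _) = lstm_bf(x)

# The final hidden state always has shape (num_layers, batch, hidden_size),
# so it reveals which axis the LSTM treated as the batch.
print(tuple(out_default.shape), tuple(h_default.shape))
print(tuple(out_bf.shape), tuple(h_bf.shape))
```

The output tensors have the same shape either way, which is why the mistake raises no error; only the final hidden state shows which axis was treated as the batch.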

Planet_Boost · December 21, 2022, 4:47pm

@vdw
Thanks for your answer. I’ve changed

self.lstm = nn.LSTM(input_size, hidden_size)

to

self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

Or should I change more?
The predictions are now different from before, but the same ‘problem’ (repeated values) seems to persist:

# Y (true data):
tensor([[[59.],
         [59.],
         [59.],
         [59.]],

        [[70.],
         [70.],
         [70.],
         [70.]],

        [[ 0.],
         [ 0.],
         [ 0.],
         [ 0.]],

        [[ 0.],
         [ 0.],
         [ 0.],
         [ 0.]],

        [[ 3.],
         [ 3.],
         [ 3.],
         [ 3.]]]) 

# Y_pred (predicted data):
tensor([[[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]]], grad_fn=<AddBackward0>)


Planet_Boost · December 21, 2022, 7:51pm

Coming back to report that I somewhat solved this by normalizing my input data. I now obtain different predictions for every output. Whether they are any good is something I still have to figure out!
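
The post does not say which normalization was used. One common option is per-feature standardization (zero mean, unit variance) with statistics taken from the training set only. The sketch below assumes that choice; the random stand-in data and the feature scales are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Stand-in data with the layout from the question: (samples, time steps, features).
# The per-feature scales are invented to mimic features on very different ranges.
scales = torch.tensor([1.0, 10.0, 100.0, 1000.0, 0.1, 5.0, 50.0])
X_train = torch.randn(1000, 4, 7) * scales + scales
X_test = torch.randn(200, 4, 7) * scales + scales

# Per-feature statistics over samples and time steps, from the training set only.
mean = X_train.mean(dim=(0, 1), keepdim=True)  # shape (1, 1, 7)
std = X_train.std(dim=(0, 1), keepdim=True)    # shape (1, 1, 7)

eps = 1e-8  # guards against division by zero for constant features
X_train_norm = (X_train - mean) / (std + eps)
X_test_norm = (X_test - mean) / (std + eps)    # reuse the training statistics

print(X_train_norm.mean(dim=(0, 1)))  # close to 0 for every feature
print(X_train_norm.std(dim=(0, 1)))   # close to 1 for every feature
```

Reusing the training mean and standard deviation on the test set keeps test information out of the preprocessing step.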

Md_Zahidul_Islam · December 17, 2023, 10:47pm

Your true Y also repeats the same value within each sequence; only the value differs from one sequence to the next. So I am wondering why you need different predictions at all?
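
To check this observation on real data, one could measure how many target sequences are constant across their 4 time steps. A small sketch, using a few target values copied from the examples earlier in the thread (the variable names are illustrative):

```python
import torch

# A few target sequences taken from the examples in the thread, shape (batch, 4, 1).
Y = torch.tensor([
    [[59.0], [59.0], [59.0], [59.0]],
    [[70.0], [70.0], [70.0], [70.0]],
    [[100.0], [0.0], [0.0], [0.0]],
    [[3.0], [3.0], [3.0], [3.0]],
])

# A sequence counts as constant when every time step equals the first one.
is_constant = (Y == Y[:, :1, :]).all(dim=1).squeeze(-1)  # shape (batch,)

# Fraction of sequences whose target never changes over time.
constant_fraction = is_constant.float().mean().item()

print(is_constant.tolist())  # [True, True, False, True]
print(constant_fraction)     # 0.75
```

If nearly all sequences turn out to be constant, predicting a single value per sequence may be enough; that is a modeling choice the thread does not settle.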