LSTM predicts the same constant value

I want to predict one variable from 7 features, using sequences of 4 time steps:

# Shape X_train: torch.Size([24433, 4, 7])
# Shape Y_train: torch.Size([24433, 4, 1])

# Shape X_test: torch.Size([6109, 4, 7])
# Shape Y_test: torch.Size([6109, 4, 1])

import torch
from torch import nn
from torch.utils.data import TensorDataset

train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test) 

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

My (initial) LSTM model:

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x

model = LSTMModel(input_size=7, hidden_size=256, output_size=1)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

Training and evaluating the model:

# Loop over the training set
for X, Y in train_loader:

    optimizer.zero_grad()
    
    Y_pred = model(X)

    loss = loss_fn(Y_pred, Y)
    
    loss.backward()
    
    optimizer.step()

model.eval()

# Loop over the test set
for X, Y in test_loader:

    Y_pred = model(X)
    
    loss = loss_fn(Y_pred, Y)

An example of Y (true data):

tensor([[[59.],
         [59.],
         [59.],
         [59.]],

        [[70.],
         [70.],
         [70.],
         [70.]],

        [[ 100.],
         [ 0.],
         [ 0.],
         [ 0.]],

# etc.

However, my Y_pred looks something like this:

tensor([[[15.8224],
         [15.8224],
         [15.8224],
         [15.8224]],

        [[16.1654],
         [16.1654],
         [16.1654],
         [16.1654]],

        [[16.2127],
         [16.2127],
         [16.2127],
         [16.2127]],

# etc.

I have tried numerous different things:

  • Changing the model architecture (different batch size, different number of layers)
  • Adding dropout and decay parameters
  • Training for multiple epochs and varying the number of epochs
  • Different optimizers (Adam, SGD) with different learning rates
  • Log transforming my input data

In my (unanswered) previous question I give an example of what my input data looks like.

I am fairly new to PyTorch and LSTMs, so I may well be doing something wrong, but whatever I change, I keep getting a (near-)constant value from the predictions. What am I doing wrong, and what should I be doing instead?

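If this is the shape of your input, you should probably define your LSTM with batch_first=True. By default, nn.LSTM expects input of shape (seq_len, batch, input_size), so your (batch, 4, 7) tensors are read with the batch and time dimensions swapped.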
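To make the difference concrete, here is a minimal sketch. The random tensor and the hidden size of 16 are stand-ins I picked for illustration, not values from the question. The output tensors have the same shape in both cases, which is why the mistake raises no error; the shape of the final hidden state shows which dimension the LSTM treated as the batch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Random stand-in for one batch from the DataLoader: (batch=32, seq_len=4, features=7)
x = torch.randn(32, 4, 7)

# Default layout: input is interpreted as (seq_len, batch, features)
lstm_default = nn.LSTM(input_size=7, hidden_size=16)
# batch_first=True: input is interpreted as (batch, seq_len, features)
lstm_batch_first = nn.LSTM(input_size=7, hidden_size=16, batch_first=True)

out_default, (h_default, _) = lstm_default(x)
out_bf, (h_bf, _) = lstm_batch_first(x)

# Both outputs are (32, 4, 16), so nothing looks wrong at first glance.
print(out_default.shape, out_bf.shape)

# h_n is always (num_layers, batch, hidden_size):
print(h_default.shape)  # batch dimension is 4: treated as 4 sequences of length 32
print(h_bf.shape)       # batch dimension is 32: treated as 32 sequences of length 4
```

With the default layout, the recurrence runs across the 32 samples of the batch instead of across the 4 time steps of each sample.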

@vdw
Thanks for your answer. I’ve changed

self.lstm = nn.LSTM(input_size, hidden_size)

to

self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

Or should I do more?
The predictions are now different from before, but the same ‘problem’ (identical values, now repeated across samples) seems to persist:

# Y (true data):
tensor([[[59.],
         [59.],
         [59.],
         [59.]],

        [[70.],
         [70.],
         [70.],
         [70.]],

        [[ 0.],
         [ 0.],
         [ 0.],
         [ 0.]],

        [[ 0.],
         [ 0.],
         [ 0.],
         [ 0.]],

        [[ 3.],
         [ 3.],
         [ 3.],
         [ 3.]]]) 

# Y_pred (predicted data):
tensor([[[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]],

        [[20.2832],
         [17.4102],
         [16.9698],
         [16.9091]]], grad_fn=<AddBackward0>)

Coming back to report that I more or less solved this by normalizing my input data. I now obtain different predictions for every output. Whether they are any good is something I still have to figure out!
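For anyone landing here with the same issue, this is a minimal sketch of one common way to normalize the inputs: per-feature standardization using statistics from the training set only. It is not necessarily the exact scheme I used, and the random tensors and the helper name fit_standardizer are made up for the example. One plausible reason it helps is that large unscaled inputs can saturate the LSTM's sigmoid and tanh gates, so that different inputs produce almost the same output.

```python
import torch

torch.manual_seed(0)

# Random stand-ins for the real data (assumption): (samples, time steps, features),
# deliberately on a large scale to mimic unnormalized inputs.
X_train = torch.randn(1000, 4, 7) * 50.0 + 100.0
X_test = torch.randn(200, 4, 7) * 50.0 + 100.0


def fit_standardizer(X, eps=1e-8):
    """Return per-feature mean and std, computed over samples and time steps."""
    mean = X.mean(dim=(0, 1), keepdim=True)  # shape (1, 1, features)
    std = X.std(dim=(0, 1), keepdim=True)    # shape (1, 1, features)
    return mean, std + eps                   # eps guards against constant features


# Fit on the training set only, then apply the same statistics to the test set
# so that no information from the test data leaks into preprocessing.
mean, std = fit_standardizer(X_train)
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std

print(X_train_norm.mean(dim=(0, 1)))  # per-feature means, close to 0
print(X_train_norm.std(dim=(0, 1)))   # per-feature stds, close to 1
```

The targets can be scaled the same way if they span a wide range (mine go from 0 to 100); in that case the predictions have to be transformed back with the stored mean and std before comparing them to the true values.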