LSTM Regressor Model Not Learning

I'm using an LSTM model for a regression task on sequential data, but it tends to overfit without a learning-rate scheduler or regularization: as the training log below shows, the training loss keeps decreasing while the validation loss stops improving. I've tried different loss functions and optimizers, but no luck. Any tips on how to fix this?
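
To make it concrete what I mean by regularization: the kind of change I have in mind is dropout before the final layer plus weight decay on the optimizer. The standalone sketch below is not my actual training code; it uses placeholder values (p=0.3, weight_decay=1e-4) that I haven't tuned, and random data shaped like mine:

import torch
import torch.nn as nn

# Standalone sketch, not the training code below: dropout on the last LSTM output
# before the linear head, plus weight decay (L2 penalty) on the optimizer.
lstm = nn.LSTM(input_size=26, hidden_size=50, num_layers=2, batch_first=True)
drop = nn.Dropout(p=0.3)
head = nn.Linear(50, 1)

x = torch.randn(8, 100, 26)           # (batch, seq_len, features), same shape as my data
out, _ = lstm(x)                      # out: (8, 100, 50)
pred = head(drop(out[:, -1, :]))      # dropout only before the final projection
print(pred.shape)                     # torch.Size([8, 1])

params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-4)  # weight decay = L2 penalty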

Model:

import time

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau


class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, bidirectional=False):
        super(LSTMModel, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=bidirectional)
        if bidirectional:
            hidden_size *= 2  # the LSTM output is twice as wide when bidirectional
        # self.dropout = nn.Dropout(p=0.3)
        self.fc1 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        batch_size = x.shape[0]
        # Fresh zero-initialised hidden and cell states for every batch
        h = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device).requires_grad_()
        c = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device).requires_grad_()
        out, (h, c) = self.lstm(x, (h, c))
        # print(out[:, -1, :])
        out = self.fc1(out[:, -1, :])  # regress from the last time step only
        return out

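A quick shape sanity check with random data shaped like my inputs (the sizes match the log below) gives the expected (batch, 1) output:

check_model = LSTMModel(input_size=26, hidden_size=50, output_size=1, num_layers=2)
print(check_model(torch.randn(8, 100, 26)).shape)  # torch.Size([8, 1])
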
input_size = X_train.shape[2]
hidden_size = 50
output_size = 1
num_layers  = 2

model = LSTMModel(input_size, hidden_size, output_size, num_layers)
model = model.cuda()

# Loss and optimizer

criterion = nn.MSELoss().cuda()
# criterion = nn.L1Loss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.7, patience=5, verbose=True)

def train():
    global patience
    current_patience = 0
    best_val_loss = float('inf')
    for epoch in range(num_epochs):
        start_time = time.time()
        totloss = 0
        losses = []
        for i, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.cuda(), labels.cuda()
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs.squeeze(), labels)
            loss.backward()
            loss = loss.item()
            totloss += loss
            losses.append(loss)
            # Clip gradients to prevent exploding gradients
            # nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
            optimizer.step()

        # Validate the model on the validation set every other epoch
        if (epoch + 1) % 2 == 0:
            model.eval()
            total_val_loss = 0.0
            with torch.no_grad():
                for i, (val_inputs, val_labels) in enumerate(valid_loader):
                    val_inputs, val_labels = val_inputs.cuda(), val_labels.cuda()
                    val_outputs = model(val_inputs)
                    # normalised_val_labels = (val_labels - min_value) / (max_value - min_value)
                    val_loss = criterion(val_outputs.squeeze(), val_labels)
                    total_val_loss += val_loss.item()

                    # Update learning rate based on validation loss
                    scheduler.step(val_loss)
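
The tail of train() isn't shown above. Judging from the log below, it averages the losses, prints them every other epoch, checkpoints the best model, and applies patience-based early stopping; this rough reconstruction (the checkpoint path and message wording are approximations) slots in at the end of the validation block:

            # Rough reconstruction of the omitted epoch-end bookkeeping; the checkpoint
            # path and message wording are approximations, not the exact original code.
            avg_train_loss = totloss / len(train_loader)
            avg_val_loss = total_val_loss / len(valid_loader)
            print(f'Epoch [{epoch + 1}/{num_epochs}], '
                  f'Training Loss: {avg_train_loss:.4f}, Validation Loss: {avg_val_loss:.4f}')
            if avg_val_loss < best_val_loss:
                best_val_loss = avg_val_loss
                current_patience = 0
                torch.save(model.state_dict(), 'best_model.pth')  # placeholder path
                print('Best model saved.')
            else:
                current_patience += 1
                if current_patience >= patience:
                    print(f'Early stopping at epoch {epoch + 1} as the validation loss has not improved.')
                    return
            model.train()  # back to training mode for the next epoch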
            

train.log

Split files found. Loading data.
Data loaded successfully.
Shape of X_train: (630, 100, 26)
Shape of X_valid: (135, 100, 26)
Shape of X_test: (136, 100, 26)
Shape of y_train: (630,)
Shape of y_valid: (135,)
Shape of y_test: (136,)
Mean : 18.67, Standard Deviation : 7.22
Epoch [2/3000], Training Loss: 52.8145, Validation Loss: 62.8534
Best model saved.
Epoch [4/3000], Training Loss: 54.2834, Validation Loss: 57.7704
Best model saved.
Epoch [6/3000], Training Loss: 54.1239, Validation Loss: 59.5157
Epoch [8/3000], Training Loss: 52.4262, Validation Loss: 54.5567
Best model saved.
Epoch [10/3000], Training Loss: 53.2546, Validation Loss: 54.4743
Best model saved.
Epoch [12/3000], Training Loss: 51.1305, Validation Loss: 52.9499
Best model saved.
Epoch [14/3000], Training Loss: 51.1605, Validation Loss: 56.7411
Epoch [16/3000], Training Loss: 52.2155, Validation Loss: 55.3776
Epoch 00017: reducing learning rate of group 0 to 7.0000e-03.
Epoch [18/3000], Training Loss: 51.3516, Validation Loss: 57.6882
Epoch [20/3000], Training Loss: 50.6309, Validation Loss: 53.5289
Epoch [22/3000], Training Loss: 49.7696, Validation Loss: 72.6590
Epoch [24/3000], Training Loss: 49.9311, Validation Loss: 54.8420
Epoch [26/3000], Training Loss: 49.9528, Validation Loss: 55.5299
Epoch [28/3000], Training Loss: 49.4594, Validation Loss: 67.5311
Epoch [30/3000], Training Loss: 48.2770, Validation Loss: 54.2181
Epoch [32/3000], Training Loss: 50.1973, Validation Loss: 54.9921
Epoch [34/3000], Training Loss: 49.3237, Validation Loss: 54.6661
Epoch 00035: reducing learning rate of group 0 to 4.9000e-03.
Epoch [36/3000], Training Loss: 49.5121, Validation Loss: 54.4325
Epoch [38/3000], Training Loss: 48.8125, Validation Loss: 53.
...
Epoch [576/3000], Training Loss: 32.1501, Validation Loss: 63.1680
Epoch [578/3000], Training Loss: 31.3433, Validation Loss: 63.1682
Epoch [580/3000], Training Loss: 31.3847, Validation Loss: 63.1684
Epoch [582/3000], Training Loss: 32.1502, Validation Loss: 63.1682
Epoch [584/3000], Training Loss: 31.6402, Validation Loss: 63.1681
Epoch [586/3000], Training Loss: 31.6374, Validation Loss: 63.1682
Epoch [588/3000], Training Loss: 31.5430, Validation Loss: 63.1683
Epoch [590/3000], Training Loss: 31.7007, Validation Loss: 63.1686
Epoch [592/3000], Training Loss: 31.4951, Validation Loss: 63.1684
Epoch [594/3000], Training Loss: 31.7434, Validation Loss: 63.1682
Epoch [596/3000], Training Loss: 31.5290, Validation Loss: 63.1685
Epoch [598/3000], Training Loss: 32.3261, Validation Loss: 63.1685
Epoch [600/3000], Training Loss: 32.1046, Validation Loss: 63.1685
Epoch [602/3000], Training Loss: 31.5515, Validation Loss: 63.1685
Epoch [604/3000], Training Loss: 31.6082, Validation Loss: 63.1684
Epoch [606/3000], Training Loss: 31.9585, Validation Loss: 63.1682
Epoch [608/3000], Training Loss: 31.4290, Validation Loss: 63.1681
Epoch [610/3000], Training Loss: 31.8012, Validation Loss: 63.1681
Epoch [612/3000], Training Loss: 31.6929, Validation Loss: 63.1681
Early stopping at epoch 612 as the validation loss has not improved.