I’m using an LSTM model for a regression task on sequential data, but it tends to overfit; a learning rate scheduler and regularization haven’t fixed it so far. I’ve also tried different loss functions and optimizers, with no luck. Any tips on how to fix this?
Model:
import time
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, bidirectional=False):
        super(LSTMModel, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=bidirectional)
        if bidirectional:
            hidden_size *= 2
        # self.dropout = nn.Dropout(p=0.3)
        self.fc1 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        batch_size = x.shape[0]
        # Zero initial hidden and cell states, one per layer
        h = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device).requires_grad_()
        c = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device).requires_grad_()
        out, (h, c) = self.lstm(x, (h, c))
        # print(out[:, -1, :])
        out = self.fc1(out[:, -1, :])  # regress on the last time step only
        return out
input_size = X_train.shape[2]
hidden_size = 50
output_size = 1
num_layers = 2
model = LSTMModel(input_size, hidden_size, output_size, num_layers)
model = model.cuda()
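As a quick sanity check I run a throwaway forward pass on a dummy batch shaped like my data (100 time steps, 26 features) to confirm the output shape; this is just a smoke test, not part of training:

# Dummy batch of 8 sequences, 100 steps, 26 features
dummy = torch.randn(8, 100, 26).cuda()
with torch.no_grad():
    print(model(dummy).shape)  # expected: torch.Size([8, 1])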
# Loss and optimizer
criterion = nn.MSELoss().cuda()
# criterion = nn.L1Loss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.7, patience=5, verbose=True)
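For reference, the regularization variant I’ve been considering looks roughly like this: dropout re-enabled (both between the stacked LSTM layers and before the linear head) plus L2 weight decay on Adam. It reuses the imports and hyperparameters above; LSTMModelReg / model_reg / optimizer_reg are just placeholder names, and the values are untuned guesses:

class LSTMModelReg(nn.Module):
    # Sketch only: same architecture as above, with dropout re-enabled and
    # applied both between the stacked LSTM layers and before the linear head.
    def __init__(self, input_size, hidden_size, output_size, num_layers, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(p=dropout)
        self.fc1 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)              # default zero initial states
        out = self.dropout(out[:, -1, :])  # dropout on the last time step
        return self.fc1(out)

model_reg = LSTMModelReg(input_size, hidden_size, output_size, num_layers).cuda()
# weight_decay adds L2 regularization to the Adam update
optimizer_reg = torch.optim.Adam(model_reg.parameters(), lr=1e-3, weight_decay=1e-4)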
def train():
    global patience
    current_patience = 0
    best_val_loss = float('inf')
    for epoch in range(num_epochs):
        start_time = time.time()
        totloss = 0
        losses = []
        for i, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.cuda(), labels.cuda()
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs.squeeze(), labels)
            loss.backward()
            loss = loss.item()
            totloss += loss
            losses.append(loss)
            # Clip gradients to prevent exploding gradients
            # nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
            optimizer.step()

        # Validate the model on the validation set every other epoch
        if (epoch + 1) % 2 == 0:
            model.eval()
            total_val_loss = 0.0
            with torch.no_grad():
                for i, (val_inputs, val_labels) in enumerate(valid_loader):
                    val_inputs, val_labels = val_inputs.cuda(), val_labels.cuda()
                    val_outputs = model(val_inputs)
                    # normalised_val_labels = (val_labels - min_value) / (max_value - min_value)
                    val_loss = criterion(val_outputs.squeeze(), val_labels)
                    total_val_loss += val_loss.item()
            # Update learning rate based on validation loss
            scheduler.step(val_loss)
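            # Not shown above: the epoch logging, best-model checkpoint and early
            # stopping that produce the "Best model saved." / "Early stopping"
            # lines in train.log. Roughly, it continues here (a sketch; the
            # checkpoint filename is a placeholder):
            avg_train_loss = totloss / len(train_loader)
            avg_val_loss = total_val_loss / len(valid_loader)
            print(f'Epoch [{epoch + 1}/{num_epochs}], Training Loss: {avg_train_loss:.4f}, '
                  f'Validation Loss: {avg_val_loss:.4f}')
            if avg_val_loss < best_val_loss:
                best_val_loss = avg_val_loss
                current_patience = 0
                torch.save(model.state_dict(), 'best_model.pt')  # placeholder path
                print('Best model saved.')
            else:
                current_patience += 1
                if current_patience >= patience:
                    print(f'Early stopping at epoch {epoch + 1} as the validation loss has not improved.')
                    break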
train.log:
Split files found. Loading data.
Data loaded successfully.
Shape of X_train: (630, 100, 26)
Shape of X_valid: (135, 100, 26)
Shape of X_test: (136, 100, 26)
Shape of y_train: (630,)
Shape of y_valid: (135,)
Shape of y_test: (136,)
Mean : 18.67, Standard Deviation : 7.22
Epoch [2/3000], Training Loss: 52.8145, Validation Loss: 62.8534
Best model saved.
Epoch [4/3000], Training Loss: 54.2834, Validation Loss: 57.7704
Best model saved.
Epoch [6/3000], Training Loss: 54.1239, Validation Loss: 59.5157
Epoch [8/3000], Training Loss: 52.4262, Validation Loss: 54.5567
Best model saved.
Epoch [10/3000], Training Loss: 53.2546, Validation Loss: 54.4743
Best model saved.
Epoch [12/3000], Training Loss: 51.1305, Validation Loss: 52.9499
Best model saved.
Epoch [14/3000], Training Loss: 51.1605, Validation Loss: 56.7411
Epoch [16/3000], Training Loss: 52.2155, Validation Loss: 55.3776
Epoch 00017: reducing learning rate of group 0 to 7.0000e-03.
Epoch [18/3000], Training Loss: 51.3516, Validation Loss: 57.6882
Epoch [20/3000], Training Loss: 50.6309, Validation Loss: 53.5289
Epoch [22/3000], Training Loss: 49.7696, Validation Loss: 72.6590
Epoch [24/3000], Training Loss: 49.9311, Validation Loss: 54.8420
Epoch [26/3000], Training Loss: 49.9528, Validation Loss: 55.5299
Epoch [28/3000], Training Loss: 49.4594, Validation Loss: 67.5311
Epoch [30/3000], Training Loss: 48.2770, Validation Loss: 54.2181
Epoch [32/3000], Training Loss: 50.1973, Validation Loss: 54.9921
Epoch [34/3000], Training Loss: 49.3237, Validation Loss: 54.6661
Epoch 00035: reducing learning rate of group 0 to 4.9000e-03.
Epoch [36/3000], Training Loss: 49.5121, Validation Loss: 54.4325
Epoch [38/3000], Training Loss: 48.8125, Validation Loss: 53.
...
Epoch [576/3000], Training Loss: 32.1501, Validation Loss: 63.1680
Epoch [578/3000], Training Loss: 31.3433, Validation Loss: 63.1682
Epoch [580/3000], Training Loss: 31.3847, Validation Loss: 63.1684
Epoch [582/3000], Training Loss: 32.1502, Validation Loss: 63.1682
Epoch [584/3000], Training Loss: 31.6402, Validation Loss: 63.1681
Epoch [586/3000], Training Loss: 31.6374, Validation Loss: 63.1682
Epoch [588/3000], Training Loss: 31.5430, Validation Loss: 63.1683
Epoch [590/3000], Training Loss: 31.7007, Validation Loss: 63.1686
Epoch [592/3000], Training Loss: 31.4951, Validation Loss: 63.1684
Epoch [594/3000], Training Loss: 31.7434, Validation Loss: 63.1682
Epoch [596/3000], Training Loss: 31.5290, Validation Loss: 63.1685
Epoch [598/3000], Training Loss: 32.3261, Validation Loss: 63.1685
Epoch [600/3000], Training Loss: 32.1046, Validation Loss: 63.1685
Epoch [602/3000], Training Loss: 31.5515, Validation Loss: 63.1685
Epoch [604/3000], Training Loss: 31.6082, Validation Loss: 63.1684
Epoch [606/3000], Training Loss: 31.9585, Validation Loss: 63.1682
Epoch [608/3000], Training Loss: 31.4290, Validation Loss: 63.1681
Epoch [610/3000], Training Loss: 31.8012, Validation Loss: 63.1681
Epoch [612/3000], Training Loss: 31.6929, Validation Loss: 63.1681
Early stopping at epoch 612 as the validation loss has not improved.